www.elsevier.com/locate/fss

**Neural networks for soft decision making**

Hisao Ishibuchi ∗ , Manabu Nii

Department of Industrial Engineering, Osaka Prefecture University, Gakuen-cho 1-1, Sakai, Osaka 599-8531, Japan

Received November 1998

Abstract

This paper discusses various techniques for soft decision making by neural networks. Decision making problems are described as choosing an action from possible alternatives using available information. In the context of soft decision making, a single action is not always chosen. When it is difficult to choose a single action based on available information, the decision is withheld or a set of promising actions is presented to human users. The ability to handle uncertain information is also required in soft decision making. In this paper, we handle decision making as a classification problem where an input pattern is classified as one of given classes. Class labels in the classification problem correspond to alternative actions in decision making. In this paper, neural networks are used as classification systems, which eventually could be implemented as a part of decision making systems. First we focus on soft decision making by trained neural networks. We assume that the learning of a neural network has already been completed. When a new pattern cannot be classified as a single class with high certainty by the trained neural network, the classification of such a new pattern is rejected. After briefly describing rejection methods based on crisp outputs from the trained neural network, we propose an interval-arithmetic-based rejection method with interval input vectors, and extend it to the case of fuzzy input vectors. Next we describe the learning of neural networks for possibility analysis. The aim of possibility analysis is to present a set of possible classes of a new pattern to human users. Then we describe the learning of neural networks from training patterns with uncertainty. Such training patterns are denoted by interval vectors and fuzzy vectors. Finally we examine the performance of various soft decision making methods described in this paper by computer simulations on commonly used data sets in the literature. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Neural networks; Soft decision; Reject option; Interval arithmetic; Fuzzy arithmetic

1. Introduction

Decision making is described as choosing an action from possible alternatives based on input information. We assume that the input information is denoted by an n-dimensional real vector x = (x1, x2, ..., xn) where x ∈ ℝⁿ. We also assume that we have c alternative actions, which are denoted by Ω = {ω1, ω2, ..., ωc}. In this case, our decision making problem is to choose a single action from the c alternatives Ω = {ω1, ω2, ..., ωc} based on the input vector x = (x1, x2, ..., xn). For designing a decision support system, we assume that we have m cases in our database. That is, we have m pairs of previous input vectors and corresponding actions chosen by human experts. We

∗ Corresponding author. Tel.: +81-722-54-9350; fax: +81-722-54-9915. E-mail address: hisaoi@ie.osakafu-u.ac.jp (H. Ishibuchi).

0165-0114/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0165-0114(99)00022-6


H. Ishibuchi, M. Nii / Fuzzy Sets and Systems 115 (2000) 121–140

denote those pairs as (xp, ω(xp)), p = 1, 2, ..., m where xp = (xp1, xp2, ..., xpn) is an n-dimensional vector and ω(xp) is one of the c alternative actions (i.e., ω(xp) ∈ {ω1, ω2, ..., ωc}). Thus our problem is to design a decision support system using the given data set (xp, ω(xp)), p = 1, 2, ..., m.

In the context of hard decision making, our problem is described as dividing the n-dimensional input space ℝⁿ into c disjoint decision regions R1, R2, ..., Rc where R1 ∪ R2 ∪ · · · ∪ Rc = ℝⁿ and Rj ∩ Rk = ∅ for j ≠ k. Since the data set (xp, ω(xp)), p = 1, 2, ..., m is obtained as a result of human decision making, it may include inappropriate actions (i.e., ω(xp) is not always an appropriate action). It may include uncertain and missing values. The available information on a new case x = (x1, x2, ..., xn) may also include uncertain and missing values. Thus hard decision making is not always the best strategy. Hard decision making may lead to the choice of an inappropriate action near the decision boundary in the input space. In the context of soft decision making of this paper, the choice of a single action is withheld when the input vector is near the decision boundary. Exceptional handling such as additional manual inspection will be applied to such input vectors.

In this paper, we use a multi-layer feedforward neural network [23] as a decision support system for soft decision making. Our neural network is supposed to withhold the decision making when the choice of a single action is difficult based on available information. In this case, a set of promising actions is shown by the neural network to human users. In the context of soft decision making, the neural network also requires the ability to handle uncertain information because the given data set as well as the available information on a new case may include uncertain and missing values. Uncertain information is denoted by interval vectors and fuzzy vectors. Interval vectors and fuzzy vectors are used as inputs to the neural network.

The option to withhold the decision making corresponds to the reject option in classification problems. So we handle decision making as a classification problem, in which x = (x1, x2, ..., xn) and Ω = {ω1, ω2, ..., ωc} are viewed as an n-dimensional pattern vector in the pattern space ℝⁿ and the set of class labels, respectively. In a classification system with the reject option, a new pattern is not always classified as one of the c classes. Its classification may be rejected when it cannot be assigned to a single class with high certainty. Chow [5] showed the optimum rejection rule for statistical pattern classification where the probability of pattern vectors from each class is given. Chow [6] discussed the tradeoff between the error rate and the rejection rate. The reject option was also discussed for nearest neighbor classification [3], fuzzy classification [9,16], and neural networks [2,7,12,17,19,21]. The selection of multiple classes for a rejected pattern is referred to as "class-selective rejection" in Ha [11].

In this paper, we first discuss the rejection of the classification of a new pattern by a trained neural network. We assume that the trained neural network is given. The classification of the new pattern is performed by presenting it to the trained neural network and calculating the corresponding output vector. We mention two rejection methods based on the output vector from the trained neural network: one method is based on the largest output value, and the other on the difference between the largest and the second largest output value. These rejection methods were used in Cordella et al. [7]. We propose rejection methods for interval input vectors and fuzzy input vectors, which represent uncertain information on new patterns. Next we describe the learning of neural networks for possibility analysis. A neural network is trained to identify the possible area of each class in the pattern space using a learning algorithm in Ishibuchi et al. [12,17,19]. The trained neural network can present a set of possible classes to human users for a new pattern. That is, the trained neural network can perform the class-selective rejection. Then we describe the learning of neural networks by interval input vectors and fuzzy input vectors. A learning algorithm by interval input vectors was proposed in Ishibuchi and Tanaka [18], and extended to the case of fuzzy input vectors in Ishibuchi et al. [13,14]. It is shown that those learning algorithms are effective not only to handle uncertain information but also to avoid the overfitting of neural networks to training data. Finally we examine the performance of various soft decision making methods described in this paper by computer simulations on commonly used test problems. We use the iris data in Fisher [8] and the Australian credit approval data in Quinlan [22]. By computer simulations, we show that the reject option is effective for decreasing error rates. It is also shown

that the classification performance of neural networks can be improved by using interval input vectors and fuzzy input vectors in the classification phase as well as in the learning phase.

2. Reject options based on output values

Since many soft decision making methods have been proposed for classification problems in the literature, we also discuss soft decision making as a classification problem, in which the c alternative actions Ω = {ω1, ω2, ..., ωc} are viewed as class labels. We assume that we have m labeled patterns (xp, ω(xp)), p = 1, 2, ..., m where xp = (xp1, xp2, ..., xpn) and ω(xp) ∈ {ω1, ω2, ..., ωc}. Our classification problem is to design a classification system using the m labeled patterns (xp, ω(xp)), p = 1, 2, ..., m. Thus our problem is to classify a new pattern (i.e., a new case x = (x1, x2, ..., xn) in the decision making problem) by the trained neural network. In this section (also in Sections 3 and 4), we assume that the learning of the neural network has already been completed. The learning of neural networks for soft decision making will be discussed in Sections 5–7.

We use a three-layer feedforward neural network with n input units and c output units. Our neural network is the standard back-propagation network in Rumelhart et al. [23]. The number of hidden units, which can be arbitrarily specified, is denoted by nH. The input–output relation of each unit is written as follows when the n-dimensional input vector x = (x1, x2, ..., xn) is presented to the neural network:

Input unit i: oi = xi, i = 1, 2, ..., n, (1)

Hidden unit j: oj = f( Σ_{i=1}^{n} wji · oi + θj ), j = 1, 2, ..., nH, (2)

Output unit k: ok = f( Σ_{j=1}^{nH} wkj · oj + θk ), k = 1, 2, ..., c, (3)

where wji and wkj are connection weights, and θj and θk are biases. As the activation function f(·) at the hidden and output units, we use the sigmoid function f(x) = 1/{1 + exp(−x)} as in Rumelhart et al. [23]. The neural network can be trained by the standard back-propagation algorithm [23] when the training data (xp, ω(xp)), p = 1, 2, ..., m do not include uncertain or missing values.

In the context of hard decision making, a new pattern x = (x1, x2, ..., xn) is usually classified by a single winner output unit. The lth output unit is the winner when the following inequality condition holds:

ok < ol for k = 1, 2, ..., c (k ≠ l), (4)

where o = (o1, o2, ..., oc) is the output vector from the trained neural network corresponding to the input vector x = (x1, x2, ..., xn). When this inequality condition holds, x is classified as Class ωl. We illustrate the classification by a single winner output unit in Fig. 1, where o1(x), o2(x) and o3(x) are output values from the first, second and third output units, respectively. In Fig. 1, the one-dimensional input space is classified by the largest output value.

Fig. 1. Hard decision making.

For improving the classification reliability of neural networks (i.e., for decreasing error rates), Cordella et al. [7] employed two rejection methods. One method is based on the largest output ol in (4). When the largest output ol is smaller than a prespecified threshold value (say βmax), the classification of the input vector x = (x1, x2, ..., xn) is rejected as follows:

Rejection method 1: Reject the classification of x if

o∗ = max{o1, o2, ..., oc} < βmax. (5)

This rejection method is illustrated in Fig. 2, where the input x is classified only when the largest output is not smaller than the threshold value βmax.

Fig. 2. Rejection method based on the largest output.

The other rejection method in Cordella et al. [7] is based on the difference between the largest output o∗ and the second largest output o∗∗. When the difference (o∗ − o∗∗) is smaller than a pre-specified threshold value (say βdifference), the classification of the input vector x = (x1, x2, ..., xn) is rejected as follows:

Rejection method 2: Reject the classification of x if

o∗ − o∗∗ < βdifference. (6)

This rejection method is illustrated in Fig. 3, where the input x is classified only when the difference between the largest output and the second largest output is not smaller than the threshold value βdifference.

Fig. 3. Rejection method based on the difference between the largest output and the second largest output.
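The hard classification rule (4) and the two reject options (5) and (6) are easy to express in code. The following Python sketch is illustrative only; it is not the authors' implementation, and the threshold values beta_max = 0.5 and beta_diff = 0.2 are arbitrary choices for the example:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, w_hid, b_hid, w_out, b_out):
    # Eqs. (1)-(3): inputs are passed through unchanged; hidden and
    # output units apply the sigmoid to a weighted sum plus a bias.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hid, b_hid)]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(w_out, b_out)]

def classify_with_reject(o, beta_max=0.5, beta_diff=0.2):
    # Winner-take-all rule (4) combined with reject options (5) and (6).
    # Returns the index of the winning class, or None when rejected.
    ranked = sorted(range(len(o)), key=lambda k: o[k], reverse=True)
    o_best, o_second = o[ranked[0]], o[ranked[1]]
    if o_best < beta_max:              # Rejection method 1, Eq. (5)
        return None
    if o_best - o_second < beta_diff:  # Rejection method 2, Eq. (6)
        return None
    return ranked[0]
```

A pattern is thus rejected either because no output is confident enough, or because the top two outputs are too close, which is exactly the near-boundary situation the text describes.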

3. Reject options based on interval outputs

In the previous section, we assumed that no uncertainty was included in the input vector x = (x1, x2, ..., xn) to be classified by the trained neural network. In this section, we discuss the classification of uncertain input vectors. We assume that the available information on a new case x = (x1, x2, ..., xn) in the decision making problem is represented by an interval vector X = (X1, X2, ..., Xn). Each interval Xi denotes the range of possible values of xi. For example, when we do not know the exact value of x1 but we know that x1 is between 0.2 and 0.8, this information is denoted by the interval X1 = [0.2, 0.8]. Our problem in this section is to classify the interval vector X = (X1, X2, ..., Xn) by the trained neural network.

For classifying the interval vector X = (X1, X2, ..., Xn), we present X to the trained neural network. The input–output relation of each unit can be extended to the case of the interval vector as follows [18]:

Input unit i: Oi = Xi, i = 1, 2, ..., n, (7)

Hidden unit j: Oj = f( Σ_{i=1}^{n} wji · Oi + θj ), j = 1, 2, ..., nH, (8)

Output unit k: Ok = f( Σ_{j=1}^{nH} wkj · Oj + θk ), k = 1, 2, ..., c, (9)
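The interval extension (7)–(9) can be computed with elementary interval arithmetic: multiplying [a, b] by a weight w gives [wa, wb] for w ≥ 0 and [wb, wa] for w < 0, interval sums add endpoint-wise, and the monotone sigmoid maps [lo, hi] to [f(lo), f(hi)]. A minimal Python sketch (the weights and intervals used in the checks are made up for illustration):

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def interval_unit(w_row, intervals, bias):
    # One unit of Eqs. (8)-(9): interval-weighted sum plus bias, then
    # the sigmoid.  A weight w maps [a, b] to [wa, wb] when w >= 0 and
    # to [wb, wa] when w < 0; the monotone sigmoid maps [lo, hi] to
    # [sigmoid(lo), sigmoid(hi)].
    lo = hi = bias
    for w, (a, b) in zip(w_row, intervals):
        lo += min(w * a, w * b)
        hi += max(w * a, w * b)
    return (sigmoid(lo), sigmoid(hi))

def interval_forward(X, w_hid, b_hid, w_out, b_out):
    # Eqs. (7)-(9): propagate an interval input vector X layer by layer.
    hidden = [interval_unit(row, X, b) for row, b in zip(w_hid, b_hid)]
    return [interval_unit(row, hidden, b) for row, b in zip(w_out, b_out)]
```

A degenerate interval [x, x] reproduces the crisp forward pass of (1)–(3), which is the consistency property the rejection analysis below relies on.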

where real numbers and intervals are denoted by lowercase letters (i.e., oi, oj, ok, xi, wji, wkj, θj, θk) and uppercase letters (i.e., Oi, Oj, Ok, Xi), respectively. The calculation of the input–output relation is performed by interval arithmetic [1]. That is, the interval output vector O = (O1, O2, ..., Oc) is calculated by applying interval arithmetic to (7)–(9).

For classifying the interval vector X = (X1, X2, ..., Xn), we have to specify a single winner output unit for the interval output vector O = (O1, O2, ..., Oc). By directly extending the inequality condition in (4) to the case of the interval output vector, we have the following condition:

Ok < Ol for k = 1, 2, ..., c (k ≠ l). (10)

When the lth output unit satisfies the inequality condition in (10) for the interval vector X = (X1, X2, ..., Xn), X is classified as Class ωl. On the other hand, the classification of X is rejected when no output unit satisfies the inequality condition in (10). We use the following definition of the inequality between intervals:

Ok < Ol ⇔ ok < ol for ∀ok ∈ Ok and ∀ol ∈ Ol. (11)

The inequality Ok < Ol between intervals holds only when the corresponding inequality ok < ol holds for all real numbers in the intervals. This definition can be rewritten using the lower and upper limits of the intervals as follows:

Ok < Ol ⇔ ok^U < ol^L, (12)

where ok^U and ol^L are the upper limit of the interval Ok and the lower limit of the interval Ol, respectively. We denote intervals by their lower and upper limits as

X = [xL, xU] = {x | xL ≤ x ≤ xU, x ∈ ℝ}. (13)

Using a simple numerical example, we illustrate the classification of interval vectors. We assume that we have 45 labeled patterns in the two-dimensional input space [0, 1] × [0, 1] as shown in Fig. 4. First we trained a three-layer feedforward neural network with two input units, five hidden units and three output units by the standard back-propagation algorithm [23]. The class boundary generated by the trained neural network is shown in Fig. 4.

Fig. 4. Training patterns and the class boundary generated by the trained neural network.

Our problem is to classify new patterns with uncertainty by the trained neural network. Let us assume that the second input value of a new pattern is missing while its first input value is known as x1 = 0.4. Since the input space of the classification problem in Fig. 4 is [0, 1] × [0, 1], we know that the range of possible values of the second input x2 of the new pattern is [0, 1]. Thus we can denote the information on the new pattern by the interval vector X = (X1, X2) = ([0.4, 0.4], [0, 1]). This interval vector is depicted in Fig. 5.

Fig. 5. Interval vector ([0.4, 0.4], [0, 1]).
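The interval comparison (11)–(12) and the winner condition (10) reduce to endpoint tests. A small Python sketch (the interval outputs fed to it in the checks are hypothetical values, not outputs of a real trained network):

```python
def interval_less(Ok, Ol):
    # Eq. (12): [ok_L, ok_U] < [ol_L, ol_U] iff ok_U < ol_L, i.e. the
    # pointwise inequality of Eq. (11) holds for every pair of points.
    return Ok[1] < Ol[0]

def interval_winner(O):
    # Condition (10): class l wins only if its interval output
    # dominates every other one; otherwise return None (rejected).
    for l, Ol in enumerate(O):
        if all(interval_less(Ok, Ol) for k, Ok in enumerate(O) if k != l):
            return l
    return None
```

Overlapping interval outputs therefore never produce a winner, which is precisely how missing or uncertain inputs trigger the reject option.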

For classifying the new pattern, we presented the interval vector to the trained neural network. The corresponding interval output vector O = (O1, O2, O3) is shown in Fig. 6. From Fig. 6, we can see that the interval output O1 from the first output unit satisfies the inequality relation in (10). Thus we can classify the new pattern as Class ω1 while its second input is missing.

Fig. 6. Interval output vector corresponding to the interval input vector in Fig. 5.

Let us consider another new pattern with uncertainty. The exact values of the first input x1 and the second input x2 of the new pattern are unknown. Available information on the new pattern is that x1 is in the interval [0.75, 0.9] and x2 is in the interval [0.4, 0.7]. Thus we denote the available information on the new pattern by the interval vector X = (X1, X2) = ([0.75, 0.9], [0.4, 0.7]). This interval vector is depicted in Fig. 7.

Fig. 7. Interval vector ([0.75, 0.9], [0.4, 0.7]).

For classifying the new pattern, we presented the interval vector to the trained neural network. The corresponding interval output vector O = (O1, O2, O3) was calculated as shown in Fig. 8. Since no interval output satisfies the inequality condition in (10), the classification of the new pattern is rejected.

Fig. 8. Interval output vector corresponding to the interval input vector in Fig. 7.

While the classification of the interval input X = (X1, X2) = ([0.75, 0.9], [0.4, 0.7]) in Fig. 7 was rejected, we can see from Fig. 8 that X is not classified as Class ω1 because the interval output O1 is smaller than the other interval outputs O2 and O3. That is, Class ω1 can be removed from the set of possible classes of the new pattern. The inequality between interval outputs is used for performing the following class-selective rejection:

Rejection method 3: If the inequality Ok < Ol holds for interval outputs Ok and Ol, the interval vector X is not classified as Class ωk. If no interval output satisfies the inequality Ok < Ol for the interval output Ok, include Class ωk in the set of possible classes. Examine the inclusion of each class in the set of possible classes in this manner. If the set of possible classes includes only a single class, the interval vector X is classified as that class. Otherwise the classification of X is rejected and the set of possible classes is presented to human users.
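Rejection method 3 can be sketched as a scan over the interval outputs: Class k survives unless some other interval output dominates it in the sense of (12). The interval values in the check below are invented, chosen only to mimic the Fig. 8 situation where one class is removed and two remain:

```python
def possible_classes(O):
    # Rejection method 3 (class-selective rejection): Class k stays in
    # the set of possible classes unless some other interval output
    # strictly dominates O_k, i.e. Ok_U < Ol_L as in Eq. (12).
    possible = []
    for k, Ok in enumerate(O):
        dominated = any(Ok[1] < Ol[0] for l, Ol in enumerate(O) if l != k)
        if not dominated:
            possible.append(k)
    return possible
```

A singleton result means the pattern is classified despite its uncertainty; a larger set is what gets presented to the human user.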

In the case of Figs. 7 and 8, the classification of the interval vector X = (X1, X2) = ([0.75, 0.9], [0.4, 0.7]) is rejected, and Class ω2 and Class ω3 are presented to human users as possible classes.

This rejection method is theoretically supported by a basic feature of interval arithmetic called the inclusion monotonicity [1]. Let us consider the classification of a new pattern x = (x1, x2, ..., xn) using uncertain information denoted by an interval vector X = (X1, X2, ..., Xn). We assume that each interval Xi shows the possible range of xi (i.e., xi ∈ Xi). If the exact value of xi is known, Xi has no width (i.e., Xi = [xi, xi]). On the other hand, when available information on xi involves large uncertainty, the width of Xi is large. Let us denote the output vector from the trained neural network for the input vector x = (x1, x2, ..., xn) as o(x) = (o1(x), o2(x), ..., oc(x)). We also denote the interval output vector for the interval input vector X = (X1, X2, ..., Xn) as O(X) = (O1(X), O2(X), ..., Oc(X)). From the inclusion monotonicity of interval arithmetic, we can see that the following relation holds between o(x) and O(X):

xi ∈ Xi for i = 1, 2, ..., n ⇒ ok(x) ∈ Ok(X) for k = 1, 2, ..., c. (14)

Now let us assume that the interval vector X is classified as Class ωl by our rejection method. That is,

Ok < Ol for k = 1, 2, ..., c (k ≠ l). (15)

From (14) and (15), the following relation holds from the definition of the inequality between intervals in (11):

ok(x) < ol(x) for k = 1, 2, ..., c (k ≠ l). (16)

Thus the input vector x is also classified as Class ωl. That is, the classification result based on the available information X with uncertainty is the same as that based on the complete information x when X can be classified by our rejection method.

When the classification of the interval vector X is rejected, the set of possible classes is presented by our rejection method. Let us assume that the exact input vector x is classified by the trained neural network as Class ωl, i.e., the inequality relation in (16) holds. Since ok(x) ∈ Ok(X), we can see that Ol(X) < Ok(X) does not hold when ok(x) < ol(x). This means that Class ωl is included in the set of possible classes. In other words, the classification result of the exact input vector x is not excluded from the set of possible classes by our rejection method when xi ∈ Xi for i = 1, 2, ..., n. That is, the set of possible classes obtained from the available information X with uncertainty always includes the classification result of the exact input vector x.

4. Reject options based on fuzzy outputs

In this section, we assume that available information on a new pattern x = (x1, x2, ..., xn) is given as a fuzzy vector X = (X1, X2, ..., Xn). For example, when we do not know the exact value of x1 but we know that x1 is small, this information is denoted by the fuzzy number X1 = small. In Fig. 9, we show a typical set of linguistic values. Our problem in this section is to classify the fuzzy vector X = (X1, X2, ..., Xn) by the trained neural network.

Fig. 9. Membership functions of five linguistic values (S: small, MS: medium small, M: medium, ML: medium large, L: large).

The classification of the fuzzy vector X = (X1, X2, ..., Xn) is performed by presenting it to the trained neural network. The input–output relation of each unit can be written for the fuzzy input vector X as follows [13,14]:

Input unit i: Oi = Xi, i = 1, 2, ..., n, (17)

Hidden unit j: Oj = f( Σ_{i=1}^{n} wji · Oi + θj ), j = 1, 2, ..., nH, (18)

Output unit k: Ok = f( Σ_{j=1}^{nH} wkj · Oj + θk ), k = 1, 2, ..., c. (19)

The input–output relation in (17)–(19) is defined by fuzzy arithmetic [20] as in fuzzified neural networks [4,15]. The numerical calculation of the fuzzy output vector O = (O1, O2, ..., Oc) is performed by interval arithmetic on the α-cuts (i.e., level sets) of the fuzzy input vector X = (X1, X2, ..., Xn).

The α-cut of a fuzzy number X is defined as follows:

[X]α = {x | μX(x) ≥ α, x ∈ ℝ} for 0 < α ≤ 1, (20)

where μX(·) denotes the membership function of X. Since the α-cut of the fuzzy output vector O = (O1, O2, ..., Oc) is an interval vector, the inequality condition in (10) can be used for determining the winner output unit for the α-cut of O. That is, we classify the fuzzy input vector X = (X1, X2, ..., Xn) as Class ωl when the following condition holds:

[Ok]α < [Ol]α for k = 1, 2, ..., c (k ≠ l). (21)

When no output unit satisfies the inequality condition in (21), the classification of X is rejected. In the same manner as in the previous section, we have the following rejection method:

Rejection method 4: If no fuzzy output satisfies the inequality [Ok]α < [Ol]α for the fuzzy output Ok, include Class ωk in the set of possible classes. Examine the inclusion of each class in the set of possible classes in this manner. If the set of possible classes includes only a single class, the fuzzy vector X is classified as that class. Otherwise the classification of X is rejected and the set of possible classes is presented to human users.

Let us consider the classification of a new pattern x = (x1, x2) by the trained neural network in Fig. 4 of the previous section. We assume that the exact value of each input xi is unknown but linguistic information on xi is given as "x1 is large and x2 is medium", where the membership functions of "large" and "medium" are shown in Fig. 9. That is, our problem is to classify the fuzzy vector X = (X1, X2) = (large, medium) by the trained neural network.

First the fuzzy vector X was presented to the trained neural network for calculating the corresponding fuzzy output vector O = (O1, O2, O3). The fuzzy output vector calculated from (17)–(19) is shown in Fig. 10. Next our rejection method with α = 0.5 was applied to the fuzzy output vector. The classification of X is rejected because the fuzzy output vector O = (O1, O2, O3) in Fig. 10 does not satisfy the inequality condition in (21) at the level of α = 0.5. Then Class ω2 and Class ω3 were obtained as possible classes. Thus the classification of the new pattern x = (x1, x2) was rejected.

Fig. 10. Membership function of each fuzzy output.

The classification result by our rejection method depends on the choice of a value of α. For example, if we specify α as α = 0.9, the inequality condition in (21) is satisfied for l = 3 by the fuzzy output vector O = (O1, O2, O3) in Fig. 10. That is, the new pattern is classified as Class ω3 at the level of α = 0.9.
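The α-cut procedure can be sketched for the special case of triangular fuzzy outputs (an assumption made here for brevity; in the paper the fuzzy outputs come out of the fuzzified network (17)–(19), and need not be triangular):

```python
def alpha_cut_tri(tri, alpha):
    # α-cut (20) of a triangular fuzzy number (left, peak, right):
    # the interval where membership is at least alpha.
    left, peak, right = tri
    return (left + alpha * (peak - left), right - alpha * (right - peak))

def classify_fuzzy(O_tri, alpha):
    # Rule (21): cut every fuzzy output at level alpha, then apply the
    # interval winner condition; None means the input is rejected.
    cuts = [alpha_cut_tri(t, alpha) for t in O_tri]
    for l, (lo_l, _) in enumerate(cuts):
        if all(hi_k < lo_l for k, (_, hi_k) in enumerate(cuts) if k != l):
            return l
    return None
```

With the invented outputs O = [(0.0, 0.1, 0.2), (0.1, 0.3, 0.8), (0.3, 0.7, 0.9)] the call classifies at α = 0.9 but rejects at α = 0.5, mirroring the example in the text.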

As we have explained by the above example, the classification result of the fuzzy vector X = (X1, X2, ..., Xn) depends on the specification of α. That is, a fuzzy vector can be classified by the inequality relation in (21) for a high value of α even when its classification is rejected for a low value of α. For example, the fuzzy vector X = (X1, X2) = (large, medium) is classified as Class ω3 for α = 0.9 while its classification is rejected for α = 0.5.

Let us define αmin as the minimum value of α for which a fuzzy vector can be classified. That is, αmin is defined for the fuzzy vector X = (X1, X2, ..., Xn) from the inequality condition in (21) as follows:

αmin = min{α | [Ok]α < [Ol]α for ∃l and k = 1, 2, ..., c (k ≠ l)}, (22)

where the minimization is performed over some discrete values of α (e.g., α = 0.01, 0.02, ..., 1.00). The α-cut of each fuzzy output Ok satisfies the inclusion relation [Ok]α ⊆ [Ok]β for α ≥ β, where 0 < α ≤ 1 and 0 < β ≤ 1. Thus we have the following relation:

α ≥ β and [Ok]β < [Ol]β ⇒ [Ok]α < [Ol]α. (23)

That is, if the inequality holds for a particular level β, it always holds for any levels higher than β. Therefore we can see that if the fuzzy input vector X = (X1, X2, ..., Xn) can be classified at a particular level β, it can always be classified at any levels higher than β. From (23), the fuzzy vector can be classified for any values of α larger than αmin, i.e., in the interval [αmin, 1]. When αmin is small, the fuzzy vector can be classified in a large range of α. We use the width of this range (i.e., 1 − αmin) as a certainty grade of the classification. That is, we propose the following classification method with a certainty grade:

Classification with a certainty grade: Find Class ωl that satisfies the inequality condition in (21) for α = 1.0. If no class satisfies the inequality condition, reject the classification of the fuzzy vector. Otherwise classify the fuzzy vector as Class ωl with the certainty grade of (1 − αmin).

Let us illustrate this classification method for the fuzzy vector X = (X1, X2) = (large, medium) using the corresponding fuzzy output vector O = (O1, O2, O3) in Fig. 10. The fuzzy vector X is classified as Class ω3 by the inequality condition in (21) with α = 1.0. The value of αmin is calculated as αmin = 0.58 from (22) by examining 100 discrete values of α (i.e., α = 0.01, 0.02, ..., 1.00). Thus we have the certainty grade 0.42 for the classification of X as ω3. Since the fuzzy vector X = (X1, X2) = (large, medium) is near the class boundary (see Fig. 4), the certainty grade is not large. We also examined the classification of another fuzzy vector X = (X1, X2) = (small, small) by the same trained neural network. The membership function of the linguistic value "small" is given in Fig. 9. This fuzzy vector was classified as Class ω1 with the certainty grade 1.0. The high certainty grade was obtained because the fuzzy vector X = (X1, X2) = (small, small) is far from the class boundary (see Fig. 4).

Let us denote the set of possible classes determined by our rejection method as Ω(X, α). In general, the set of possible classes Ω(X, α) of X satisfies the following inclusion relation:

Ω(X, α) ⊆ Ω(X, β) if α ≥ β, (24)

where 0 < α ≤ 1 and 0 < β ≤ 1. This means that the larger the value of α is, the smaller the number of possible classes is. For example, the set of possible classes Ω(X, α) was obtained for the fuzzy vector X = (X1, X2) = (large, medium) from the fuzzy output vector O = (O1, O2, O3) in Fig. 10 as follows:

Ω(X, α) = {ω2, ω3} when α = 0.5, (25)

Ω(X, α) = {ω3} when α = 0.9. (26)
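The αmin scan (22) and the certainty grade 1 − αmin can be sketched as follows, again assuming triangular fuzzy outputs for simplicity. The 0.01, 0.02, ..., 1.00 grid follows the text; the concrete output numbers in the check are invented:

```python
def alpha_cut_tri(tri, alpha):
    # α-cut (20) of a triangular fuzzy number (left, peak, right).
    left, peak, right = tri
    return (left + alpha * (peak - left), right - alpha * (right - peak))

def classify_fuzzy(O_tri, alpha):
    # Winner test (21) on the α-cuts; None means "rejected at this α".
    cuts = [alpha_cut_tri(t, alpha) for t in O_tri]
    for l, (lo_l, _) in enumerate(cuts):
        if all(hi_k < lo_l for k, (_, hi_k) in enumerate(cuts) if k != l):
            return l
    return None

def alpha_min(O_tri, levels=None):
    # Eq. (22): smallest grid level at which some class wins;
    # None when even alpha = 1.0 gives no winner.
    levels = levels or [i / 100.0 for i in range(1, 101)]
    for a in levels:
        if classify_fuzzy(O_tri, a) is not None:
            return a
    return None

def classify_with_grade(O_tri):
    # Classification with a certainty grade: decide at alpha = 1.0
    # and attach the grade 1 - alpha_min.
    label = classify_fuzzy(O_tri, 1.0)
    if label is None:
        return None
    return label, 1.0 - alpha_min(O_tri)
```

The ascending scan is justified by relation (23): once the winner condition holds at some level, it holds at every higher level, so the first success is αmin.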

5. Learning for possibility analysis

In the previous sections, we assumed that a trained neural network had already been given for the classification of a new pattern. In this section, we discuss the learning of neural networks for classification problems with class overlaps. In Fig. 11, we show an example of such a classification problem. We can see from Fig. 11 that the hard decision making is not appropriate in this classification problem because there is a large overlapping area.

Fig. 11. Classification problem with a large overlapping area.

Our problem in this section is to train a neural network for possibility analysis. The task of the neural network is to present possible classes of a new pattern to human users. As in the previous sections, we use a three-layer feedforward neural network with n input units, nH hidden units and c output units. Let us assume that we have m labeled patterns (xp, ω(xp)), p = 1, 2, ..., m from c classes where xp = (xp1, xp2, ..., xpn) and ω(xp) ∈ {ω1, ω2, ..., ωc}.

When the input vector xp is presented to the neural network, the output vector is calculated by (1)–(3) in the same manner as in the standard back-propagation network [23]. Let us denote the c-dimensional output vector corresponding to the input vector xp as op = (op1, op2, ..., opc). The target vector tp = (tp1, tp2, ..., tpc) for the output vector op is specified as follows:

tpk = 1 if xp ∈ Class ωk, and tpk = 0 otherwise, k = 1, 2, ..., c. (28)

In the standard back-propagation algorithm [23], the following cost function is used for the input vector xp:

ep = Σ_{k=1}^{c} epk, (29)

where

epk = (tpk − opk)²/2. (30)

This cost function is minimized in the learning of the neural network. For the possibility analysis, we define the cost function epk of the kth output unit for the input vector xp as follows [12,17,19]:

epk = (tpk − opk)²/2 if xp ∈ Class ωk, and epk = α(s) · (tpk − opk)²/2 otherwise, (31)

where s is the number of iterations of the learning algorithm (i.e., s is the number of epochs), and α(s) is a decreasing function such that 0 < α(s) ≤ 1 and α(s) → 0 when s → ∞. In computer simulations of this paper, we use the following decreasing function:

α(s) = 1 / {1 + (s/2000)³}. (32)

By the decreasing function α(s), the squared error at the kth output unit for the input vector xp is discounted if xp does not belong to Class ωk. As a result, the relative importance of the input vectors from Class ωk becomes larger and larger at the kth output unit during the learning. Thus we can expect that the following relation is satisfied after enough iterations of the learning algorithm derived from the cost function in (31):

ok(xp) ≈ tpk for ∀xp from Class ωk, k = 1, 2, ..., c, (33)

where ok(xp) denotes the output from the kth output unit for the input vector xp. Because the target tpk is tpk = 1 for the input vector xp from Class ωk, the above relation in (33) means the following:

ok(xp) ≈ 1 for ∀xp from Class ωk, k = 1, 2, ..., c. (34)

The possibility area of each class is defined by the output value from the corresponding output unit after the learning of the neural network using the cost function in (31). Let RPos_k be the possibility area of Class ωk.

12. xp ∈ RPos k ∀xp from class !k . 13. we also show the possibility area of each class obtained by (35) with ÿPos = 0:5. In Fig. : : : . Hard decision making by the standard back-propagation algorithm. we also depict the hard decision making by the trained neural network. 1] in Fig. 2.. k = 1. Let us consider a three-class classiÿcation problem on a single-dimensional pattern space [0.g. 13. Ishibuchi. 13. First we trained a three-layer feedforward neural network with a single input unit. 14. (36) Fig. respectively.25 and 0. M. In Fig. In Fig. c: This means that all the given patterns from each class are covered by the corresponding possibility area in the pattern space. we . 14. ÿPos = 0:5). 12. From this ÿgure. we depict the output from each output unit after 10 000 iterations of the learning using the standard back-propagation algorithm. the following relation holds from (35).H.9. We illustrate the learning for the possibility analysis using a simple numerical example. 14. The learning rate and the momentum constant were speciÿed as 0. We used the same parameters as in the above simulation by the standard back-propagation algorithm. Next we trained the neural network using the cost function in (31). Nii / Fuzzy Sets and Systems 115 (2000) 121–140 131 Fig. (35) where ÿPos is a pre-speciÿed threshold value (e. we depict the output from each output unit after 10 000 epochs of the learning. Possibility area of each class obtained by the learning for the possibility analysis. We deÿne RPos as follows: k RPos = {x |ok (x) ¿ ÿPos . Single-dimensional classiÿcation problem. In Fig. When the relation in (34) is satisÿed by the trained neural network. Fig. x ∈ k n }. ÿve hidden units and three output units by the standard back-propagation algorithm using the cost function in (30).

The classification of a new pattern x = (x1, x2, ..., xn) is performed by the neural network trained for the possibility analysis as follows:

Rejection method 5: If ok(x) ≥ βPos, include Class ωk in the set of possible classes of x. Examine the inclusion of each class in the set of possible classes in this manner. If the set of possible classes includes only a single class, the new pattern x is classified as that class. Otherwise the classification of x is rejected and the set of possible classes is presented to human users.

Using this rejection method, we classified the one-dimensional pattern space [0, 1] by the trained neural network in Fig. 14. The classification result is shown in Fig. 15. We also applied this rejection method to the classification problem in Fig. 11. First a three-layer feedforward neural network with two input units, five hidden units and two output units was trained by the learning algorithm based on the cost function in (31) for 10 000 epochs. Then the two-dimensional pattern space [0, 1] × [0, 1] was classified using the above rejection method by the trained neural network. The classification result is shown in Fig. 16. From this figure, we can see that the overlapping area was correctly identified by our rejection method.

Fig. 15. Classification result by the rejection method based on the possibility analysis.
Fig. 16. Classification result for the two-dimensional problem in Fig. 11.

6. Learning from interval inputs

In this section, we discuss the learning of neural networks from interval vectors. Interval vectors are obtained from experts' knowledge. For example, let us assume that a human expert has the following knowledge for a two-dimensional classification problem: "If x1 is in [0, 0.4] and x2 is in [0.5, 0.8] then x = (x1, x2) belongs to Class ω1". This knowledge is represented as an interval vector ([0, 0.4], [0.5, 0.8]), to which Class ω1 is assigned. Interval vectors are also used for denoting incomplete patterns with missing inputs, as we have described in Section 3. Since real numbers can be viewed as a special case of intervals whose lower and upper limits are the same, a real vector xp = (xp1, xp2, ..., xpn) can be denoted as an interval vector Xp = (Xp1, Xp2, ..., Xpn) where Xpi = [xpi, xpi] for i = 1, 2, ..., n. That is, real vectors as well as interval vectors can be represented in a single framework as interval vectors.

So we assume in this section that m labeled interval vectors (Xp, ω(Xp)), p = 1, 2, ..., m are given for the learning of neural networks, where ω(Xp) ∈ {ω1, ω2, ..., ωc}. Our problem in this section is to train neural networks by the given interval vectors (Xp, ω(Xp)). As in the previous sections, we use a three-layer feedforward neural network with n input units, nH hidden units, and c output units. In the learning phase, each interval vector Xp = (Xp1, Xp2, ..., Xpn) is presented to the neural network. The corresponding interval output vector Op = (Op1, Op2, ..., Opc) is calculated by interval arithmetic as in (7)–(9) of Section 3. The learning of the neural network is to drive the interval output vector Op toward the target vector tp defined in (28).
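Rejection method 5 amounts to simple thresholding of the trained network's outputs. A minimal sketch (the function names and the list representation are ours, not the paper's):

```python
def possible_classes(outputs, beta_pos=0.5):
    """Set of possible classes of a pattern x: every class whose output
    o_k(x) reaches the threshold beta_pos, as in Eq. (35)."""
    return [k for k, o in enumerate(outputs) if o >= beta_pos]

def classify_or_reject(outputs, beta_pos=0.5):
    """Return (class, None) when the possible class is unique; otherwise
    reject and return (None, possible classes) for the human user."""
    possible = possible_classes(outputs, beta_pos)
    if len(possible) == 1:
        return possible[0], None
    return None, possible
```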

In the learning of the neural network, we use the following cost function as a measure of the difference between Op and tp:

   ep = Σ_{k=1}^{c} (oLpk − tpk)²/2 + Σ_{k=1}^{c} (oUpk − tpk)²/2,   (37)

where oLpk and oUpk are the lower limit and the upper limit of the interval output Opk from the kth output unit, respectively (i.e., Opk = [oLpk, oUpk]). A back-propagation type algorithm can be derived from this cost function [18].

Let us illustrate the learning of neural networks from interval vectors using a simple example. Let us assume that the six interval vectors in Fig. 17 are given for a classification problem with the two-dimensional pattern space [0, 1] × [0, 1]. A three-layer feedforward neural network with two input units, five hidden units and two output units was trained from these interval vectors using the cost function in (37). The learning was iterated 1000 times (i.e., 1000 epochs) with the learning rate 0.25 and the momentum constant 0.9. The class boundary in Fig. 17 was obtained from the trained neural network by hard decision making. From this figure, we can see that all the six interval vectors are correctly classified. For comparison, we trained the same neural network by the standard back-propagation algorithm using the four vertices of each interval vector. The class boundary generated by this neural network is shown in Fig. 18. While the trained neural network can correctly classify the four vertices of each interval vector, the class boundary violates one interval vector in Fig. 18. This is because the inside of each interval vector was not used in the learning of the neural network in Fig. 18. On the contrary, the entire area of each interval vector was used in the learning in Fig. 17.

We show another example to illustrate the learning from interval vectors. We assume that the 20 labeled patterns in Fig. 19, which are denoted by closed and open circles, are given for the learning of neural networks. Let us also consider the situation where a human expert has the following knowledge: "If x1 is not smaller than 0.8, x = (x1, x2) belongs to Class 2". Since the domain of each input of this classification problem is the unit interval [0, 1], this knowledge is denoted as an interval vector ([0.8, 1.0], [0.0, 1.0]) in our approach. The labeled patterns in Fig. 19 are also denoted as interval vectors, so we have 21 labeled interval vectors. First we applied the standard back-propagation algorithm to the same neural network using only the 20 labeled patterns. In Fig. 19, we show the class boundary obtained from the trained neural network, which correctly classifies all the given labeled patterns. Next we trained the neural network from the 21 labeled interval vectors using the cost function in (37). The class boundary generated by the trained neural network is shown in Fig. 20. From this figure, we can see that the class boundary, which correctly classifies all the given labeled patterns, follows the expert knowledge: "If x1 is not smaller than 0.8, x = (x1, x2) belongs to Class 2".

Fig. 19. Training patterns and the class boundary obtained by the standard back-propagation algorithm.
Fig. 20. Class boundary obtained by the learning from the given training patterns and additional knowledge.
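The forward pass on an interval vector and the cost in (37) can be sketched with ordinary interval arithmetic: the sign of each weight decides which interval endpoint feeds which endpoint of the net input, and the monotone sigmoid maps endpoints to endpoints. A sketch under our own naming (the paper's Eqs. (7)–(9) define the actual propagation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def interval_layer(intervals, weights, biases):
    """Propagate input intervals through one sigmoid layer.

    intervals -- (lo, hi) pairs, one per input
    weights   -- weights[j][i]: weight from input i to unit j
    biases    -- one bias per unit j
    """
    out = []
    for wj, bj in zip(weights, biases):
        lo = hi = bj
        for (xl, xu), w in zip(intervals, wj):
            if w >= 0:           # positive weight: lower endpoint feeds
                lo += w * xl     # the lower limit of the net input
                hi += w * xu
            else:                # negative weight: the endpoints swap
                lo += w * xu
                hi += w * xl
        # The sigmoid is monotone, so it maps endpoints to endpoints.
        out.append((sigmoid(lo), sigmoid(hi)))
    return out

def interval_cost(interval_outputs, targets):
    """Eq. (37): both endpoints of each interval output O_pk are driven
    toward the crisp target t_pk."""
    return sum((lo - t) ** 2 / 2.0 + (hi - t) ** 2 / 2.0
               for (lo, hi), t in zip(interval_outputs, targets))
```

A degenerate input interval [x, x] reproduces the crisp forward pass, which is why real vectors and interval vectors can share one learning framework.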

Fig. 17. Class boundary obtained by the learning from interval vectors.
Fig. 18. Class boundary obtained by the learning from the four vertices of each interval vector.

7. Learning from fuzzy inputs

In this section, we discuss the learning of neural networks from fuzzy input vectors. Fuzzy vectors are obtained from human experts as linguistic knowledge. Fuzzy vectors are also used for representing uncertain information, as we have described in Section 4. These fuzzy vectors can be used in the learning of neural networks in a similar manner to the case of interval vectors in the previous section.

We assume that m labeled fuzzy vectors (Xp, ω(Xp)), p = 1, 2, ..., m are given, where Xp is an n-dimensional fuzzy vector (i.e., Xp = (Xp1, Xp2, ..., Xpn)) and ω(Xp) is the class label of Xp (i.e., ω(Xp) ∈ {ω1, ω2, ..., ωc}). We use a three-layer feedforward neural network with n input units, nH hidden units, and c output units. In the learning phase, each fuzzy vector Xp = (Xp1, Xp2, ..., Xpn) is presented to the neural network. The corresponding fuzzy output vector Op = (Op1, Op2, ..., Opc) is calculated by fuzzy arithmetic as in (17)–(19) of Section 4. In the learning of the neural network, the following cost function is used as a measure of the difference between the fuzzy output vector Op and the non-fuzzy target vector tp = (tp1, tp2, ..., tpc):

   ep = Σ_α { Σ_{k=1}^{c} ([Opk]αL − tpk)²/2 + Σ_{k=1}^{c} ([Opk]αU − tpk)²/2 },   (38)

where [Opk]αL and [Opk]αU are the lower limit and the upper limit of the α-cut [Opk]α of the fuzzy output Opk from the kth output unit, respectively (i.e., [Opk]α = [[Opk]αL, [Opk]αU]). In computer simulations of this paper, we used three discrete values of α (i.e., α = 0.25, 0.50, 0.75) in the cost function in (38). A back-propagation type learning algorithm can be derived from this cost function [13,14].

Let us illustrate the learning of neural networks from fuzzy vectors using the classification problem in Fig. 19 of the previous section. We assume that the following linguistic rules are given for this problem:

   If x1 is small then x = (x1, x2) belongs to Class ω1.
   If x1 is medium or medium large or large and x2 is small then x = (x1, x2) belongs to Class ω2.
   If x1 is large then x = (x1, x2) belongs to Class ω2.

The membership function of each linguistic value is given in Fig. 9 of Section 4.
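Since each α-cut of a fuzzy input is an interval, the cost in (38) can be evaluated by interval propagation at a few α levels. A sketch with symmetric triangular fuzzy numbers and our own helper names; we render (38) here as a plain sum over α ∈ {0.25, 0.50, 0.75}, which is an assumption about how the α levels enter the total cost:

```python
def tri_alpha_cut(center, spread, a):
    """Alpha-cut of a symmetric triangular fuzzy number: an interval that
    shrinks from [center - spread, center + spread] at a = 0 down to the
    single point {center} at a = 1."""
    half = (1.0 - a) * spread
    return (center - half, center + half)

def fuzzy_cost(output_cuts, targets, alphas=(0.25, 0.50, 0.75)):
    """Sketch of Eq. (38): squared errors of the lower and upper limits of
    each alpha-cut of each fuzzy output, summed over the alpha levels.

    output_cuts -- dict: alpha level -> list of (lo, hi) output intervals
    """
    cost = 0.0
    for a in alphas:
        for (lo, hi), t in zip(output_cuts[a], targets):
            cost += (lo - t) ** 2 / 2.0 + (hi - t) ** 2 / 2.0
    return cost
```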

Since real numbers and intervals can be viewed as special cases of fuzzy numbers, the given patterns in Fig. 19 and the three fuzzy vectors are handled in a single framework as fuzzy vectors in the learning of the neural network. As the membership function of "medium or medium large or large", we use a trapezoidal membership function that includes "medium", "medium large" and "large". That is, we generate three fuzzy vectors from the above three linguistic rules as (small, [0, 1]), (medium or medium large or large, small) and (large, [0, 1]), where the interval [0, 1] is used for an input that does not appear in the corresponding linguistic rule.

In Fig. 21, we show the class boundary generated by the trained neural network. The neural network was trained by the learning algorithm derived from the cost function in (38). From this figure, we can see that the class boundary, which correctly classifies all the given patterns, follows the given three linguistic rules.

Fig. 21. Class boundary obtained by the learning from the given training patterns and three linguistic rules.

Next we show that the fuzzification of input vectors is effective for avoiding the overfitting of neural networks to training data. Let us consider the classification problem in Fig. 22. The class boundary of this figure was generated after the learning of a three-layer feedforward neural network with two input units, five hidden units and three output units. The standard back-propagation algorithm was iterated 5000 times (i.e., 5000 epochs) using the learning rate 0.25 and the momentum constant 0.9. From this figure, we can observe the overfitting of the neural network to a training pattern from Class 3 at the center of the two-dimensional pattern space.

Fig. 22. Training patterns and the class boundary obtained by the standard back-propagation algorithm.

We also trained the same neural network using fuzzified training patterns. Each pattern xp = (xp1, xp2) was fuzzified into a triangular fuzzy vector Xp = (Xp1, Xp2) by adding a spread δtraining to each xpi as shown in Fig. 23. The learning of the neural network was performed after fuzzifying all the training patterns in Fig. 22 with δtraining = 0.1. The class boundary generated by the trained neural network is shown in Fig. 24; fuzzification of some training patterns is also illustrated in this figure. From this figure, we can see that the overfitting to the training patterns was avoided by the use of the fuzzified training patterns. The effect of the fuzzification of training patterns on the overfitting of neural networks is also discussed in the next section using real-world classification problems.

Fig. 23. Fuzzification of a real number xpi into a triangular fuzzy number Xpi by adding a spread δtraining.
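The fuzzification of Fig. 23 attaches a symmetric triangular spread to each crisp input, after which any α-cut of the fuzzified pattern is an ordinary interval vector. A small sketch (function names are ours):

```python
def fuzzify(pattern, spread):
    """Turn a crisp pattern into a triangular fuzzy vector by giving
    every input x_i the same spread (delta_training in Fig. 23)."""
    return [(x, spread) for x in pattern]

def alpha_cut_vector(fuzzy_pattern, a):
    """Interval vector obtained by cutting every fuzzy input at level a."""
    return [(c - (1.0 - a) * s, c + (1.0 - a) * s) for c, s in fuzzy_pattern]
```

At α = 1 the cut reproduces the original crisp pattern, so fuzzified learning strictly generalizes the crisp case.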

Fig. 24. Class boundary obtained by the learning from fuzzified training patterns.

8. Performance evaluation

8.1. Test problems

In previous sections, we have described several methods for soft decision making. In this section, we report simulation results on real-world data sets for evaluating the performance of each method. We used the iris data and the Australian credit approval data because they have often been used for evaluating the performance of various classification methods in the literature. Both data sets are available from the UC Irvine database (available via anonymous ftp from ftp.ics.uci.edu in directory /pub/machine-learning-databases). Each attribute value in these data sets was normalized into the unit interval [0, 1] in our computer simulations before it was used in the learning of neural networks.

The iris data set, which has been often used in the literature since Fisher's study [8], is a three-class classification problem with four attributes. This data set consists of 150 labeled patterns (50 patterns from each of three classes). The performance of various fuzzy and non-fuzzy classification methods was evaluated by Grabisch and Nicolas [10] and Weiss and Kulikowski [24] using the leaving-one-out procedure on the iris data. We also used the leaving-one-out procedure for evaluating the performance of soft decision making methods. In the leaving-one-out procedure for the iris data, 149 patterns were used for training a neural network and the other pattern was used as test data for evaluating the performance of the trained neural network. This procedure was iterated 150 times so that each of the 150 patterns was used as test data just once. We calculated the average error rate on test data by iterating the leaving-one-out procedure five times using different initial values of the connection weights and biases of the neural network. We used three-layer feedforward neural networks with four input units and three output units for the iris data. The number of hidden units was tuned in each simulation. In the learning of neural networks, the learning rate and the momentum constant were specified as 0.25 and 0.9, respectively.

The Australian credit approval data set is a two-class classification problem with 14 attributes. This data set, which consists of 690 labeled patterns, was used in Quinlan [22] for evaluating his C4.5 algorithm. As in Quinlan [22], we applied the 10-fold cross-validation to the credit data for evaluating the performance of soft decision making methods. In the 10-fold cross-validation for the credit data, first the 690 labeled patterns were divided into ten subsets of the same size (i.e., 69 patterns). Then nine subsets were used as training data and the other subset was used as test data. The 10-fold cross-validation consists of 10 iterations of this training-and-testing procedure so that each of the ten subsets is used as test data just once. We calculated the average error rate on test data by iterating the 10-fold cross-validation 20 times using different partitions of the credit data set into ten subsets. We used three-layer feedforward neural networks with 14 input units and two output units for the credit data. The number of hidden units was tuned in each simulation. In the learning of neural networks, the learning rate and the momentum constant were specified as 0.1 and 0.9, respectively.

8.2. Simulation results

First, we show that interval inputs and fuzzy inputs worked well for avoiding the overfitting of neural networks. Using the iris data and the credit data, we compared three types of inputs to neural networks: real numbers, intervals and fuzzy numbers.
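The two evaluation protocols above can be sketched generically; the fold construction and the callback interface below are our own illustrative choices, not the paper's code.

```python
def k_fold_indices(n, k):
    """Partition indices 0..n-1 into k folds of (nearly) equal size.
    k = 10 with n = 690 gives the ten 69-pattern subsets of the credit
    data; k = n gives the leaving-one-out procedure."""
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)
    return folds

def cross_validate(n, k, train_and_test):
    """Each fold serves as test data exactly once; the remaining patterns
    train the classifier. Returns the average of the per-fold results
    produced by the caller-supplied train_and_test(train, test)."""
    results = []
    for fold in k_fold_indices(n, k):
        test = set(fold)
        train = [i for i in range(n) if i not in test]
        results.append(train_and_test(train, fold))
    return sum(results) / len(results)
```

Repeating the whole procedure (five times for leaving-one-out, twenty times for 10-fold) with different initial weights or partitions averages out run-to-run variance, as the text describes.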

In our computer simulations on the iris data, an interval input Xpi was generated from each input value xpi by adding 0.01 to xpi as Xpi = [xpi − 0.01, xpi + 0.01]. A fuzzy input Xpi was generated from xpi by adding the spread of 0.05 to xpi (i.e., δtraining = 0.05 in Fig. 23 of Section 7). The same neural network with 10 hidden units was trained by the original training data, the interval data, and the fuzzy data. The performance of the neural network trained by each type of training data was evaluated by the average error rate on test data using hard decision making. That is, we did not use intervals or fuzzy numbers in the classification phase of test data. Average error rates on test data are summarized in Fig. 25. From this figure, we can observe the deterioration of the performance of the neural network by too many iterations of the learning in the case of real number training data. Such deterioration is not severe in the cases of interval training data and fuzzy training data. That is, interval inputs and fuzzy inputs worked well for avoiding the overfitting of the neural network in the learning for the iris data. This observation is much clearer in the application to the credit data, as we show in the following.

We also performed the same comparison using the credit data. Interval inputs with the radius of 0.05 and fuzzy inputs with the spread of 0.30 were generated for the learning of the neural network with 15 hidden units. Simulation results are summarized in Fig. 26. We can see from this figure that interval inputs and fuzzy inputs worked well for avoiding the overfitting of the neural network in the learning for the credit data.

Fig. 25. Average error rates on test data during the learning for iris data from three types of training data.
Fig. 26. Average error rates on test data during the learning for credit data from three types of training data.

Next we show the comparison of the three rejection methods described in Sections 2 and 3. Those rejection methods are based on the largest output value, the difference between the largest and the second largest output value, and interval inputs. Since the rejection method based on fuzzy inputs is almost the same as the interval input method, we only report simulation results by the three rejection methods. In our computer simulations on the iris data, we first trained the neural network with five hidden units by the standard back-propagation algorithm using training data for 250 epochs. Then each rejection method was used for classifying test data by the trained neural network. In the rejection method based on the largest output value, we examined ten threshold values βmax = 0.50, 0.55, ..., 0.95. We also examined ten threshold values βdifference = 0.05, 0.10, ..., 0.50 in the rejection method based on the difference between the largest and the second largest output value. In the rejection method based on interval inputs, each input xpi in test data was transformed into an interval input Xpi = [xpi − r, xpi + r], where r is the radius of Xpi; we examined ten values of the radius: r = 0.01, 0.02, ..., 0.10. The evaluation of each rejection method was performed by iterating the leaving-one-out procedure five times. Simulation results are summarized in the rejection-error space in Fig. 27. From this figure, we can see that each rejection method could decrease the error rate on test data.
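The three compared rejection rules can each be stated in a few lines. A sketch (function names are ours; the thresholds correspond to βmax, βdifference and the radius r of the text):

```python
def reject_by_max(outputs, beta_max):
    """Reject when even the largest output value is below beta_max."""
    return max(outputs) < beta_max

def reject_by_difference(outputs, beta_difference):
    """Reject when the largest and the second largest output values
    are closer than beta_difference."""
    top = sorted(outputs, reverse=True)
    return (top[0] - top[1]) < beta_difference

def reject_by_interval(interval_outputs):
    """Interval-input rule: accept only if some class's lower limit
    exceeds the upper limits of all other classes; otherwise reject."""
    for k, (lo, _) in enumerate(interval_outputs):
        if all(lo > hi for j, (_, hi) in enumerate(interval_outputs) if j != k):
            return False
    return True
```

Sweeping the threshold (or the radius r) traces out the rejection-error tradeoff curve of Figs. 27 and 28: larger thresholds reject more patterns and leave fewer errors among the accepted ones.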

We also compared the three rejection methods by the 10-fold cross-validation on the credit data. In our computer simulations on the credit data, we first trained the neural network with 15 hidden units by the learning algorithm for interval training data with the radius 0.1 for 50 epochs. Then each rejection method was used for classifying test data by the trained neural network. Each rejection method was examined using various threshold values as in the above computer simulations. Simulation results are summarized in Fig. 28. From this figure, we can see that each rejection method could decrease the error rate on test data.

Fig. 27. Rejection-error tradeoff of each rejection method for iris data.
Fig. 28. Rejection-error tradeoff of each rejection method for credit data.

Next we examined the classification of test data with missing inputs by the rejection method based on interval inputs. In our computer simulation, we first trained the neural network with five hidden units by the standard back-propagation algorithm using training data for 250 epochs. Then we generated test patterns with missing inputs. From each test pattern xp = (xp1, xp2, xp3, xp4) of the iris data, we generated 14 patterns with missing inputs: (?, xp2, xp3, xp4), (xp1, ?, xp3, xp4), ..., (?, ?, ?, xp4), ..., (xp1, ?, ?, ?), where "?" denotes a missing input. These patterns were classified by the trained neural network using the rejection method based on interval inputs. Average classification rates on test data with missing inputs were evaluated by applying the leaving-one-out procedure to the iris data five times. Average classification results on these test patterns with missing inputs are summarized in Fig. 29. From this figure, we can see that the rejection rate increases as the number of missing inputs increases. We can also see that some test patterns could be correctly classified even if they involved missing inputs. It should be noted that the error rate never increases as the number of missing inputs increases. This feature was theoretically shown using the inclusion monotonicity of interval arithmetic in Section 3.

Fig. 29. Classification results on test data with missing inputs (iris data).

Finally, we examined the classification performance of neural networks that were trained by the iris data for the possibility analysis described in Section 5. Each test pattern was classified by the trained neural network using the rejection method based on the possibility analysis.
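Representing a missing input by the whole input domain is what makes the interval-based rejection applicable here: widening an input can only enlarge the output intervals (inclusion monotonicity), so rejections may increase but an accepted answer cannot turn into an error. A sketch of the encoding (names are ours):

```python
def complete_with_intervals(pattern, domain=(0.0, 1.0)):
    """Encode a pattern with missing inputs as an interval vector:
    a known value x becomes the degenerate interval [x, x], and a
    missing input (None, printed as '?') becomes the whole normalized
    input domain [0, 1]."""
    return [domain if x is None else (x, x) for x in pattern]
```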

The threshold value was specified as βPos = 0.5 (Rejection method 5 in Section 5). The classification performance on test data was examined by applying the leaving-one-out procedure to the iris data five times. Average classification results are summarized in Fig. 30. From this figure, we can see that the error rate decreased by the learning for the possibility analysis while the rejection rate increased. All the test patterns from Class ω1 were correctly classified by the trained neural network after 10 000 epochs. After 10 000 epochs, the error rate was 1.33% and the rejection rate was 27.86%. Class ω2 and Class ω3 were suggested by the trained neural network as possible classes for all the rejected patterns, each of which was actually from Class ω2 or Class ω3.

Fig. 30. Average error and rejection rates on test data during the learning for the possibility analysis (iris data).

9. Conclusion

In this paper, we described various soft decision making methods using neural networks. In the context of soft decision making of this paper, an input pattern is not always classified as a single class. Some methods are capable of suggesting a set of possible classes when the classification of a new pattern is rejected. Some methods are also capable of handling uncertain data as interval inputs and fuzzy inputs in the learning phase as well as in the classification phase. As we showed by computer simulations, soft decision making methods could decrease error rates while they increased rejection rates at the same time. The tradeoff between error rates and rejection rates was examined by computer simulations in this paper. By computer simulations on commonly used data sets, we also showed that the use of fuzzy inputs and interval inputs in the learning phase was effective for avoiding the overfitting of neural networks, and that interval inputs could be employed for handling missing inputs in the classification phase.

References

[1] G. Alefeld, J. Herzberger, Introduction to Interval Computations, Academic Press, New York, 1983.
[2] N.P. Archer, S. Wang, Fuzzy set representation of neural network classification boundary, IEEE Trans. Systems Man Cybernet. 21 (1991) 735–742.
[3] Y. Baram, Soft nearest neighbor classification, Proc. IEEE Internat. Conf. on Neural Networks, Houston, 1997, pp. 1469–1473.
[4] J.J. Buckley, Y. Hayashi, Fuzzy neural networks: a survey, Fuzzy Sets and Systems 66 (1994) 1–13.
[5] C.K. Chow, An optimum character recognition system using decision function, IRE Trans. Electron. Comput. 6 (1957) 247–254.
[6] C.K. Chow, An optimum recognition error and reject tradeoff, IEEE Trans. Inform. Theory 16 (1970) 41–46.
[7] L.P. Cordella, C. De Stefano, F. Tortorella, M. Vento, A method for improving classification reliability of multilayer perceptrons, IEEE Trans. Neural Networks 6 (1995) 1140–1147.
[8] R.A. Fisher, The use of multiple measurements in taxonomic problems, Ann. Eugenics 7 (1936) 179–188.
[9] C. Frelicot, Learning rejection thresholds for a class of fuzzy classifiers for possibilistic clustered noisy data, Proc. 7th IFSA World Congress III, Prague, 1997, pp. 111–116.
[10] M. Grabisch, J.-M. Nicolas, Classification by fuzzy integral: performance and tests, Fuzzy Sets and Systems 65 (1994) 255–271.
[11] T.M. Ha, The optimum class-selective rejection rule, IEEE Trans. Pattern Anal. Machine Intell. 19 (1997) 608–615.
[12] H. Ishibuchi, R. Fujioka, H. Tanaka, Possibility and necessity pattern classification using neural networks, Fuzzy Sets and Systems 48 (1992) 331–340.
[13] H. Ishibuchi, R. Fujioka, H. Tanaka, Neural networks that learn from fuzzy if–then rules, IEEE Trans. Fuzzy Systems 1 (1993) 85–97.
[14] H. Ishibuchi, K. Morioka, I.B. Turksen, Learning by fuzzified neural networks, Internat. J. Approx. Reasoning 13 (1995) 327–358.
[15] H. Ishibuchi, T. Nakashima, Fuzzy classification with reject options by fuzzy if–then rules, Proc. 7th IEEE Internat. Conf. on Fuzzy Systems, Anchorage, 1998, pp. 1452–1457.
[16] H. Ishibuchi, H. Tanaka, An architecture of neural networks for input vector of fuzzy numbers, Proc. 1st IEEE Internat. Conf. on Fuzzy Systems, San Diego, 1992, pp. 1293–1300.
[17] H. Ishibuchi, H. Tanaka, Approximate pattern classification using neural networks, in: R. Lowen, M. Roubens (Eds.), Fuzzy Logic: State of the Art, Kluwer Academic, Dordrecht, 1993, pp. 225–236.
[18] H. Ishibuchi, H. Tanaka, An extension of the BP-algorithm to interval input vectors, Proc. IEEE Internat. Joint Conf. on Neural Networks, Singapore, 1991, pp. 1588–1593.
[19] H. Ishibuchi, M. Nii, H. Tanaka, Possibilistic fuzzy classification using neural networks, Proc. IEEE Internat. Conf. on Neural Networks, Houston, 1997, pp. 1433–1438.
[20] A. Kaufmann, M.M. Gupta, Introduction to Fuzzy Arithmetic, Van Nostrand Reinhold, New York, 1985.
[21] G. Purushothaman, N.B. Karayiannis, Quantum neural networks (QNN's): inherently fuzzy feedforward neural networks, IEEE Trans. Neural Networks 8 (1997) 679–693.
[22] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, 1993.
[23] D.E. Rumelhart, J.L. McClelland, and the PDP Research Group, Parallel Distributed Processing, MIT Press, Cambridge, 1986.
[24] S.M. Weiss, C.A. Kulikowski, Computer Systems That Learn, Morgan Kaufmann, San Mateo, 1991.
