
Proceedings of 1993 International Joint Conference on Neural Networks

Learning from Incomplete Training Data with Missing Values and Medical Application

Hisao ISHIBUCHI, Akihiro MIYAZAKI, Kitaek KWON and Hideo TANAKA


Department of Industrial Engineering, University of Osaka Prefecture
Gakuencho 1-1, Sakai, Osaka 593, JAPAN

Abstract — A neural-network-based classification system is constructed to handle incomplete data with missing attribute values, and is applied to a medical diagnosis problem. In our approach, unknown values are represented by intervals; therefore incomplete data with missing attribute values are transformed into interval data. A learning algorithm for multi-class classification problems of interval input vectors is derived. The proposed approach is applied to the medical diagnosis of hepatic diseases, and its performance is compared with that of a rule-based fuzzy classification system.

I. Introduction

In many application tasks of neural networks, the ability to deal with missing or uncertain input values is crucial [1]. A few attempts have been made to deal with such inputs in neural networks. Ahmad and Tresp [1] discussed Bayesian techniques for extracting class probabilities from incomplete data with missing or noisy inputs. Ishibuchi et al. [2,3] extended the back-propagation algorithm [4] to the case of interval input vectors and fuzzy input vectors.

This paper proposes an approach for constructing a neural-network-based classification system that can handle incomplete data with missing attribute values. In our approach, unknown attribute values are represented by intervals. For example, if the pattern space of a particular classification problem is the $n$-dimensional unit cube $[0,1]^n$, each unknown attribute value is represented by the unit interval $[0,1]$, which includes all the possible values of that attribute. Since real numbers can be viewed as a special case of intervals (i.e., degenerate intervals), known inputs are also represented by intervals (e.g., 0.3 is represented by $[0.3, 0.3]$). In this manner, incomplete data with missing attribute values are transformed into complete interval data.

In this paper, first an architecture of multilayer feedforward neural networks is defined for interval input vectors. Next, a learning algorithm for multi-class classification problems of interval input vectors is derived by extending the back-propagation algorithm [4]. Last, the proposed approach is applied to the medical diagnosis of hepatic diseases in Tanaka et al. [5]. The constructed neural-network-based classification system is compared with the rule-based fuzzy classification system in [5].

II. Incomplete Data with Missing Values

Let us assume that $m$ samples are given from $c$ classes, and that each sample has $n$ attributes. In conventional classification problems, the $n$ attribute values of each sample are completely known and given by a real vector. In the classification problems discussed in this paper, some of the $n$ attribute values of each sample are known, but the others are unknown (i.e., missing). The unknown attributes are not always the same over the given $m$ samples (i.e., it is possible that the $i$-th attribute value of one sample is known while that of another sample is unknown). Let us denote the $n$ attribute values of the $p$-th sample by an $n$-dimensional vector $x_p = (x_{p1}, x_{p2}, \ldots, x_{pn})$. It should be noted that some elements of $x_p$ may be missing. For example, $x_p = (0.5, 0.2, ?, 0.7)$, where the value of the third attribute is missing.

For simplicity, let us assume that the pattern space is the $n$-dimensional unit cube $[0,1]^n$. This means that the set of possible values of each attribute is the unit interval $[0,1]$. In this case, the true value of an unknown attribute is known to lie in the unit interval; therefore we can represent each missing value by $[0,1]$. Since real numbers can be viewed as a special case of intervals (i.e., degenerate intervals), known attribute values can also be represented by intervals.

In this manner, the $n$ attribute values of the $p$-th sample are represented by an $n$-dimensional interval vector $X_p = (X_{p1}, X_{p2}, \ldots, X_{pn})$, where

$$X_{pi} = \begin{cases} [x_{pi}, x_{pi}] & \text{if } x_{pi} \text{ is known}, \\ [0, 1] & \text{if } x_{pi} \text{ is unknown}. \end{cases} \tag{1}$$

That is, the incomplete data $x_p$, $p = 1, 2, \ldots, m$, with missing attribute values are transformed into the complete interval data $X_p$, $p = 1, 2, \ldots, m$. Therefore our problem is reduced to the multi-class classification of the interval vectors $X_p$.
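As a concrete illustration of the transformation in (1), the following minimal Python sketch (ours, not the paper's; encoding missing values as None or NaN is an assumption, since the paper simply writes "?") converts an incomplete sample into an interval vector:

```python
import math

def to_interval_vector(x, lo=0.0, hi=1.0):
    """Transform an incomplete sample into an interval vector per Eq. (1).

    x: attribute values scaled to [lo, hi]; a missing value is encoded
    here as None or NaN (an assumption; the paper writes '?').
    Returns (lower, upper) pairs: known values become the degenerate
    interval [x_i, x_i], missing values the whole interval [lo, hi].
    """
    intervals = []
    for v in x:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            intervals.append((lo, hi))   # unknown attribute
        else:
            intervals.append((v, v))     # known attribute
    return intervals

# The example from Section II: the third attribute is missing.
print(to_interval_vector([0.5, 0.2, None, 0.7]))
# -> [(0.5, 0.5), (0.2, 0.2), (0.0, 1.0), (0.7, 0.7)]
```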

III. Learning of Neural Networks

A. Neural Network Architecture

In this paper, we denote real numbers and intervals by lowercase letters and uppercase letters, respectively. An interval is represented by its lower limit and upper limit as $A = [a^L, a^U]$, where $a^L$ and $a^U$ are the lower limit and upper limit of the interval $A$, respectively. Therefore the interval vector $X_p$ is represented as

$$X_p = (X_{p1}, \ldots, X_{pn}) = ([x_{p1}^L, x_{p1}^U], \ldots, [x_{pn}^L, x_{pn}^U]). \tag{2}$$

Let us define the input-output relation of a three-layer feedforward neural network with $n$ input units, $n_H$ hidden units and $c$ output units when the interval input vector $X_p$ is presented. By extending the standard back-propagation network to the case of the interval input vector $X_p$, the input-output relation of each unit is defined by interval arithmetic [6] as follows:

Input units:
$$O_{pi} = [o_{pi}^L, o_{pi}^U] = X_{pi} = [x_{pi}^L, x_{pi}^U], \quad i = 1, 2, \ldots, n. \tag{3}$$

Hidden units:
$$O_{pj} = [o_{pj}^L, o_{pj}^U] = [f(net_{pj}^L), f(net_{pj}^U)], \quad j = 1, 2, \ldots, n_H, \tag{4}$$
$$net_{pj}^L = \sum_{\substack{i=1 \\ w_{ji} \ge 0}}^{n} w_{ji}\, o_{pi}^L + \sum_{\substack{i=1 \\ w_{ji} < 0}}^{n} w_{ji}\, o_{pi}^U + \theta_j, \tag{5}$$
$$net_{pj}^U = \sum_{\substack{i=1 \\ w_{ji} \ge 0}}^{n} w_{ji}\, o_{pi}^U + \sum_{\substack{i=1 \\ w_{ji} < 0}}^{n} w_{ji}\, o_{pi}^L + \theta_j. \tag{6}$$

Output units:
$$O_{pk} = [o_{pk}^L, o_{pk}^U] = [f(net_{pk}^L), f(net_{pk}^U)], \quad k = 1, 2, \ldots, c, \tag{7}$$
$$net_{pk}^L = \sum_{\substack{j=1 \\ w_{kj} \ge 0}}^{n_H} w_{kj}\, o_{pj}^L + \sum_{\substack{j=1 \\ w_{kj} < 0}}^{n_H} w_{kj}\, o_{pj}^U + \theta_k, \tag{8}$$
$$net_{pk}^U = \sum_{\substack{j=1 \\ w_{kj} \ge 0}}^{n_H} w_{kj}\, o_{pj}^U + \sum_{\substack{j=1 \\ w_{kj} < 0}}^{n_H} w_{kj}\, o_{pj}^L + \theta_k, \tag{9}$$

where $w_{ji}$ and $w_{kj}$ are weights, $\theta_j$ and $\theta_k$ are biases, and $f(x) = 1/\{1 + \exp(-x)\}$. The extension of the activation function $f(\cdot)$ to the case of an interval input $Net$ is illustrated in Fig. 1.

[Fig. 1. The activation function $f(\cdot)$ extended to an interval input.]

It should be noted that the input-output relation in (3)-(9) is the same as that of the standard back-propagation network when a real input vector is presented. This is clear from the fact that $o_{pk}^L = o_{pk}^U$, $k = 1, 2, \ldots, c$, when $x_{pi}^L = x_{pi}^U$, $i = 1, 2, \ldots, n$.

Let us denote the interval output vector corresponding to the interval input vector $X$ by $O(X) = (O_1(X), \ldots, O_c(X))$. Because $O(X)$ is calculated by (3)-(9) based on interval arithmetic (i.e., the input-output relation of each unit satisfies the inclusion monotonicity of intervals [6]), the following proposition holds.

[Proposition 1]
For any pair of $n$-dimensional interval vectors $X$ and $Y$ such that $X \subseteq Y$ (i.e., $X_i \subseteq Y_i$, $i = 1, 2, \ldots, n$), the following inclusion relation holds:
$$O_k(X) \subseteq O_k(Y), \quad k = 1, 2, \ldots, c. \tag{10}$$

This proposition means that the inclusion relation between interval input vectors is preserved in the corresponding interval output vectors. From this proposition, we have the following proposition.

[Proposition 2]
Let us denote the true attribute values of the $p$-th sample by an $n$-dimensional real vector $\hat{x}_p$ ($\hat{x}_p \in X_p$). Then the real output vector $o(\hat{x}_p)$ corresponding to $\hat{x}_p$ is included in the interval output vector $O(X_p)$ corresponding to $X_p$, i.e.,
$$o_k(\hat{x}_p) \in O_k(X_p), \quad k = 1, 2, \ldots, c, \tag{11}$$
where $o_k(\hat{x}_p)$ is the real output from the $k$-th output unit when the real input vector $\hat{x}_p$ is presented to the neural network defined by (3)-(9).

This proposition means that the interval output vector calculated from incomplete attribute values (i.e., from the interval input vector $X_p$) always includes the real output vector calculated from the true attribute values (i.e., from the real vector $\hat{x}_p$).

B. Learning Algorithm

Let us define the target vector $t_p = (t_{p1}, \ldots, t_{pc})$ corresponding to the interval input vector $X_p$ as
$$t_{pk} = \begin{cases} 1 & \text{if the sample } p \text{ is from Class } k, \\ 0 & \text{otherwise.} \end{cases} \tag{12}$$

We define a cost function for the $p$-th training pattern $(X_p, t_p)$ as
$$e_p = \sum_{k=1}^{c} \frac{(t_{pk} - o_{pk}^L)^2}{2} + \sum_{k=1}^{c} \frac{(t_{pk} - o_{pk}^U)^2}{2}, \tag{13}$$
where $O_{pk} = [o_{pk}^L, o_{pk}^U]$ is the interval output from the $k$-th output unit calculated by (3)-(9).

In the same manner as the back-propagation algorithm [4], the weights $w_{kj}$ and $w_{ji}$ are changed by the following rules:
$$\Delta w_{kj}(t+1) = \eta \left( -\frac{\partial e_p}{\partial w_{kj}} \right) + \alpha\, \Delta w_{kj}(t), \tag{14}$$
$$\Delta w_{ji}(t+1) = \eta \left( -\frac{\partial e_p}{\partial w_{ji}} \right) + \alpha\, \Delta w_{ji}(t), \tag{15}$$
where $\eta$ is a learning constant and $\alpha$ is a momentum constant. The derivatives $\partial e_p / \partial w_{kj}$ and $\partial e_p / \partial w_{ji}$ are calculated from (3)-(9) in (16)-(21); because the lower and upper limits in (5)-(6) and (8)-(9) depend on the signs of the weights, the derivatives are derived separately for the cases $w_{kj} \ge 0$, $w_{kj} < 0$, $w_{ji} \ge 0$ and $w_{ji} < 0$.
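Since the closed-form derivatives (16)-(21) branch on the signs of the weights, a compact way to sketch the learning rules (13)-(15) is to approximate the gradients numerically. The following is our construction, built on the interval_forward sketch above, not the paper's derivation; the defaults for eta and alpha are the values used in Section IV.C:

```python
import numpy as np

def cost(params, x_lo, x_hi, t):
    """Eq. (13): squared errors of both interval limits against the target t_p."""
    o_lo, o_hi = interval_forward(params, x_lo, x_hi)
    return 0.5 * np.sum((t - o_lo) ** 2) + 0.5 * np.sum((t - o_hi) ** 2)

def numeric_grads(params, x_lo, x_hi, t, eps=1e-6):
    """Central-difference gradients of e_p; a slow stand-in for Eqs. (16)-(21)."""
    grads = []
    for P in params:                       # params = [W1, th1, W2, th2]
        G = np.zeros_like(P)
        for idx in np.ndindex(*P.shape):
            saved = P[idx]
            P[idx] = saved + eps
            up = cost(params, x_lo, x_hi, t)
            P[idx] = saved - eps
            down = cost(params, x_lo, x_hi, t)
            P[idx] = saved
            G[idx] = (up - down) / (2.0 * eps)
        grads.append(G)
    return grads

def update(params, momenta, x_lo, x_hi, t, eta=0.075, alpha=0.9):
    """Eqs. (14)-(15): one momentum step; momenta start as zeros_like(params)."""
    for P, V, G in zip(params, momenta, numeric_grads(params, x_lo, x_hi, t)):
        V[...] = eta * (-G) + alpha * V    # Delta w(t+1)
        P += V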
IV. Application to Medical Diagnosis

A. Data of Hepatic Diseases

We applied the proposed approach to the medical diagnosis of hepatic diseases in Tanaka et al. [5]. The classification problem of hepatic diseases has five classes (Class 1: Healthy person, Class 2: Hepatoma, Class 3: Acute hepatitis, Class 4: Chronic hepatitis, Class 5: Liver cirrhosis), 568 persons and 20 attributes (i.e., 20 medical inspections of each person). We applied the proposed method to these data under the same conditions as Tanaka et al. [5]. That is, we used the following conditions in the computer simulations:

(1) The given data were divided into two parts: the data of 468 persons were used for training the neural networks, and those of the remaining 100 persons were used for testing the trained neural networks.

(2) The seven attributes in Table 1, which had been selected in [5], were used in the learning of the neural networks. Each attribute value was pre-processed and transformed into one of the pre-specified discrete values in the unit interval $[0,1]$, as shown in Table 1. For example, if the attribute value of SP is in the interval $[5.6, 6.5]$, it is converted into 0.25. The numbers of missing values are also shown in Table 1, from which we can see that many attribute values are missing.

B. Classification Rules

In Section III, we proposed a learning algorithm for three-layer feedforward neural networks. Because the output vector from the trained neural network is an interval vector, we should specify classification rules based on the interval output vector. In the computer simulations, we employed the following heuristic classification procedure.

[Classification of the input vector $X_p$]

Step 1: Calculate the interval output vector $O_p$ from the trained neural network corresponding to the interval input vector $X_p$.

Step 2: Calculate the index set $\Psi$ as follows:
$$\Psi = \{\, k \mid o_{pk}^U \ge 0.5,\; k = 1, 2, \ldots, c \,\}. \tag{22}$$

Step 3: (1) If $\Psi$ has a single element, say $k^*$, then classify $X_p$ as Class $k^*$.
(2) If $\Psi$ has more than one element, then reject the classification of $X_p$.
(3) If $\Psi$ has no element, then classify $X_p$ as Class $h$ such that
$$o_{ph}^U = \max\{\, o_{pk}^U \mid k = 1, 2, \ldots, c \,\}. \tag{23}$$
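The classification procedure maps directly to code. Here is a minimal sketch (names ours), using the upper output limits as in our reading of (22) and (23):

```python
def classify(params, x_lo, x_hi, threshold=0.5):
    """Steps 1-3: classify an interval input vector or reject it.

    Returns a class index (0-based) or None when the classification
    is rejected because Psi in Eq. (22) has more than one element.
    """
    _, o_hi = interval_forward(params, x_lo, x_hi)               # Step 1
    psi = [k for k in range(len(o_hi)) if o_hi[k] >= threshold]  # Step 2, Eq. (22)
    if len(psi) == 1:                                            # Step 3 (1)
        return psi[0]
    if len(psi) > 1:                                             # Step 3 (2): reject
        return None
    return int(np.argmax(o_hi))                                  # Step 3 (3), Eq. (23)
```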
C. Simulation Results

We trained a neural network with 7 input units, 30 hidden units and 5 output units. In the computer simulations, the initial values of the weights and biases were randomly specified in $[-1, 1]$, and the learning constant $\eta$ and the momentum constant $\alpha$ were specified as $\eta = 0.075$ and $\alpha = 0.9$. Ten trials were performed with different initial weights and biases. The following average performance of the trained neural network after 800 epochs was obtained:
Classification rate: 62.1%
Error rate: 15.2%
Reject rate: 22.7%

From the comparison of these results with the following results by the rule-based fuzzy classification system in Tanaka et al. [5], we can see that the error rate was significantly decreased from 23% to 15.2% by the proposed method.

Classification rate: 62%
Error rate: 23%
Reject rate: 15%

In order to examine the relation between the performance of the trained neural network and the number of iterations of the learning algorithm, we monitored the average error rate in the learning of the neural network every 40 epochs. The simulation results are summarized in Fig. 2. From this figure, we can see that the performance for the test data was almost the same after 200 epochs, whereas that for the training data was gradually improved.

[Fig. 2. Average error rate versus the number of iterations (epochs), 0-2000, for the training data and the test data.]

V. Conclusion

We proposed a neural-network-based approach to the multi-class classification problem of incomplete data with missing values. In the proposed approach, missing attribute values were represented by intervals; therefore our problem was reduced to the multi-class classification problem of interval vectors. We proposed an architecture and a learning algorithm of neural networks for that problem. The proposed approach was applied to the medical diagnosis of hepatic diseases in Tanaka et al. [5], and its performance was compared with that of the rule-based fuzzy classification system.

References

[1] S. Ahmad and V. Tresp: "Classification with Missing and Uncertain Inputs", Proc. of IEEE-ICNN'93, pp. 1949-1954 (1993).
[2] H. Ishibuchi and H. Tanaka: "An Extension of the BP-Algorithm to Interval Input Vectors", Proc. of IJCNN'91-Singapore, pp. 1594-1599 (1991).
[3] H. Ishibuchi, H. Okada and H. Tanaka: "An Architecture of Neural Network for Input Vectors of Fuzzy Numbers", Proc. of FUZZ-IEEE'92, pp. 1293-1300 (1992).
[4] D. E. Rumelhart, J. L. McClelland and the PDP Research Group: Parallel Distributed Processing (Vol. 1), MIT Press, Cambridge (1986).
[5] H. Tanaka, H. Ishibuchi and N. Matsuda: "Fuzzy Expert System Based on Rough Sets and Its Application to Medical Diagnosis", Int. J. General Systems, Vol. 21, pp. 83-97 (1992).
[6] G. Alefeld and J. Herzberger: Introduction to Interval Computations, Academic Press, New York (1983).

Table 1. The seven attributes used in the learning of the neural networks: actual attribute values, the discrete input values after pre-processing, and the numbers of missing values in the training and test data.

| Attribute | Actual attribute values -> input values after pre-processing | Missing (training) | Missing (test) |
| SP     | 0-5.5 -> 0.00; 5.6-6.5 -> 0.25; 6.6-7.5 -> 0.50; 7.6-8.5 -> 0.75; 8.6-inf -> 1.00 | 2 | 4 |
| I1     | 0-4 -> 0.00; 5-6 -> 0.33; 7-9 -> 0.67; 10-inf -> 1.00 | 0 | 2 |
| ChE    | 0-100 -> 0.00; 101-150 -> 0.20; 151-200 -> 0.40; 201-250 -> 0.60; 251-500 -> 0.80; 501-inf -> 1.00 | 3 | 3 |
| GPR    | 0-25 -> 0.00; 26-100 -> 0.20; 101-200 -> 0.40; 201-500 -> 0.60; 501-1000 -> 0.80; 1001-inf -> 1.00 | 0 | 1 |
| Lympho | 0-20.0 -> 0.00; 20.1-40.0 -> 0.33; 40.1-60.0 -> 0.67; 60.1-inf -> 1.00 | 6 | 3 |
| Al-%   | 0-2.5 -> 0.00; 2.6-3.7 -> 0.33; 3.8-5.0 -> 0.67; 5.1-inf -> 1.00 | 27 | 5 |
| AFP    | 0-20 -> 0.00; 21-100 -> 0.25; 101-200 -> 0.50; 201-1000 -> 0.75; 1001-inf -> 1.00 | 95 | 45 |
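To illustrate the pre-processing in Table 1, here is a small helper (our construction, not the paper's); the SP thresholds below are the ones listed in the table, and the in-text example (SP in [5.6, 6.5] -> 0.25) checks out:

```python
import bisect

def discretize(value, boundaries, levels):
    """Map a raw attribute value to its discrete input value as in Table 1.

    boundaries: upper ends of all ranges but the last;
    levels: the corresponding input values, one per range.
    """
    return levels[bisect.bisect_left(boundaries, value)]

# SP row of Table 1: ranges 0-5.5, 5.6-6.5, 6.6-7.5, 7.6-8.5, 8.6-inf.
SP_BOUNDS = [5.5, 6.5, 7.5, 8.5]
SP_LEVELS = [0.00, 0.25, 0.50, 0.75, 1.00]

print(discretize(6.0, SP_BOUNDS, SP_LEVELS))  # SP in [5.6, 6.5] -> 0.25
```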
