A New Pruning Technique for the Fuzzy
ARTMAP Neural Network and Its Application
to Medical Decision Support

Shahrul N.Y., Lakhmi Jain, C.P. Lim*

Knowledge-Based Intelligent Engineering Systems (KES) Centre,
School of Electrical and Information Engineering, University of South Australia, Australia
Yaasn001@postgrads.unisa.edu.au, Lakhmi.Jain@unisa.edu.au, cp.lim@unisa.edu.au
* School of Electrical and Electronic Engineering, University of Science Malaysia, Malaysia

Abstract This paper describes a neural network-based classification tool that can be deployed for data-based decision support tasks. In particular, the Fuzzy ARTMAP (FAM) network is investigated, and a new pruning technique is proposed. The pruning technique is applied successively to eliminate rarely activated nodes in the category layer of FAM. Three data sets with different characteristics are used to analyze its effectiveness. In addition, a benchmark medical problem is used to evaluate its applicability as a decision support tool for medical diagnosis. The experiments show that the pruning technique is able to improve classification performance, as compared with that of the original FAM network as well as other machine learning methods. More importantly, the pruning technique yields more stable performance with fewer nodes, and results in a more parsimonious FAM network for undertaking data classification and decision support tasks.

Keywords: Fuzzy ARTMAP, pruning, classification, decision support, medical diagnosis

1. Introduction

Advances in computer technologies have opened up the possibility for everyone to use computerized decision support systems (DSSs) in various activities. Traditionally, DSSs were mainly used for supporting decision making in management problems. Currently, they are widely used as a strategic tool by many organizations in different domains.
A DSS normally consists of several building components that are integrated together. The model or solver component [7] is one of the important modules of a DSS. In this component, the decision is generated by the DSS based on the knowledge and information gathered from a problem domain. There are various techniques that can be used to develop this component. This paper focuses on the use of neural networks to design and develop a data-based classification and decision making model for DSSs. Many types of neural networks have been used in a variety of applications, especially for recognition and classification purposes. One of them is the Fuzzy ARTMAP (FAM) [5] network. FAM has several salient properties as a learning system as compared with other types of neural networks. One of the main advantages of FAM is its ability to construct its network structure automatically, with both online and offline learning capabilities. FAM has been known to produce good classification performance with simple training steps in many applications. For example, Wee et al. [12] used FAM to classify rice grain images. They demonstrated that FAM, as compared with the Multi-Layer Perceptron (MLP) network with back-propagation learning, can yield a high classification rate.
Moreover, FAM has been identified as an incremental learning model. This is because FAM has the ability to learn incoming patterns on a one-by-one basis as the patterns become available for learning [8]. This feature makes FAM suitable for performing data classification tasks in on-line learning environments. During on-line learning, new input patterns can be learned by the FAM network without re-training using both old and new data patterns [13]. In addition, FAM is a transparent learning model, since it is easy to explain why an input pattern x produces a particular output y [2]. Here, transparency refers to the ability of the designer to explain why the network responds to a specific input pattern in a particular way. Despite these advantages, there are issues that must be considered during the development and implementation of FAM as a pattern classification tool. One of these is that FAM is a growing network: it keeps adding nodes to its structure incrementally. As such, the network complexity increases with time as more and more nodes are created. Several researchers have proposed methods to remedy this problem. One approach is to allow some training error in FAM, e.g. Micro ARTMAP (µAM) [9] and Bayesian ARTMAP (BAM) [11]. This paper introduces a new pruning technique to reduce the number of nodes in FAM, in an attempt to reduce the FAM network complexity.

The organization of this paper is as follows. In Section 2, we discuss the dynamics of FAM and the problem of excess nodes in FAM. Section 3 explains the data sets used, cross validation, and the setting of FAM parameters, as well as the experimental results and discussion. Section 4 presents an analysis of the effectiveness of FAM in a medical diagnosis application. Section 5 gives the conclusions and suggestions for further work.

2. Fuzzy ARTMAP (FAM)

FAM was introduced by Carpenter et al. in 1992 [5]. FAM is a supervised version of ART that consists of two blocks of unsupervised Fuzzy ART, ARTa and ARTb, and a map field. The map field links ARTa and ARTb together. Normally, ARTa receives the stream of input patterns and ARTb receives the stream of target outputs associated with the input patterns. FAM is able to process both analogue and binary input patterns.
Fig. 1 shows the ART architecture. It consists of two layers of nodes, i.e., the recognition/category layer and the input layer. The input layer comprises nodes corresponding to the number of input dimensions with complement coding [5]. The recognition layer is where all the nodes or categories are created in an incremental manner. These nodes are essential because they represent prototypes of the input patterns, and are used to recall a predicted output during the test phase. In the F0 layer, the original input vector a goes through a normalization technique called complement coding [5]. Weights $w_{ji}^{a}$ represent the weight vectors between the F2 (recognition/category) layer and the F1 layer, where j indexes the nodes or categories in F2 and i indexes the nodes in F1, while $\rho_a$ refers to the vigilance parameter of ARTa.
Fig. 1 also illustrates the architecture of the map field, i.e., the connection between F2 of ARTa and F2 of ARTb. Weights $w_{jk}^{ab}$ characterize the weight vectors that connect F2 of ARTa with the map field, where k indexes the categories in the map field. The number of categories in the map field and in F2 of ARTb is the same. The output vector of the map field is $X^{ab}$. During the training phase, an input vector and its desired output vector are presented to ARTa and ARTb, respectively. The ARTa and ARTb modules classify the input and desired output vectors into categories. Then, the map field module uses a map field vigilance parameter to determine whether the ARTa winning category is linked to the correct target category in ARTb, and corrections are made accordingly.
The first step in the learning phase is the computation of the category choice in the Fuzzy ART module. The Category Choice Function (CCF), $T_j$, for each input I and F2 node j is defined in Equation (1), where the fuzzy AND operator $\wedge$ is defined in Equation (2) and the norm $|\cdot|$ in Equation (3).
$$T_j = \frac{|I \wedge w_j|}{\alpha + |w_j|} \qquad (1)$$

$$(I \wedge w)_i \equiv \min(I_i, w_i) \qquad (2)$$

$$|I| \equiv \sum_{i=1}^{M} I_i \qquad (3)$$
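As a sketch of how Equations (1)–(3) can be computed in practice (Python with NumPy; variable names are illustrative, with I the complement-coded input and W the matrix of category weights):

```python
import numpy as np

def category_choice(I, W, alpha=0.0001):
    """Eq. (1): T_j = |I ^ w_j| / (alpha + |w_j|) for every F2 node j.

    I: complement-coded input, shape (M,)
    W: category weights, one row per F2 node, shape (N, M)
    """
    fuzzy_and = np.minimum(I, W)            # Eq. (2), applied row-wise
    return fuzzy_and.sum(axis=1) / (alpha + W.sum(axis=1))  # Eqs. (1), (3)
```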

[Figure omitted: block diagram showing the F0 complement-coding layer, the F1 input layer, and the F2 recognition layer of Fuzzy ART, and the map field F^ab connecting F2 of ARTa and F2 of ARTb via the weights $w_{jk}^{ab}$.]

Fig. 1 Fuzzy ART architecture (right) and Map Field architecture (left)

The next step is a competitive process to find the maximum value of $T_j$, using Equation (4). Only the one node in F2 with the highest value of $T_j$ is selected. Resonance is said to occur if the vigilance test (Equations (5) and (6)) is satisfied. When the Jth category is chosen, $y_J = 1$ and $y_j = 0$ for $j \neq J$. If $V_J$ does not meet the vigilance test, the choice function $T_J$ is set to zero for the duration of the input presentation, and a new index J is chosen using Equation (4). All these operations occur simultaneously in ARTa and ARTb.

$$T_J = \max\{T_j : j = 1, \ldots, N\} \qquad (4)$$

$$V_j = \frac{|I \wedge w_j|}{|I|} \qquad (5)$$

$$V_j \geq \rho \qquad (6)$$
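The search cycle defined by Equations (4)–(6) can be sketched as follows. This is a simplified, illustrative Python version that masks out nodes failing the vigilance test, not a full ART implementation:

```python
import numpy as np

def select_category(I, W, rho, alpha=0.0001):
    """Return the index of the first resonating F2 node, or None."""
    T = np.minimum(I, W).sum(axis=1) / (alpha + W.sum(axis=1))  # Eq. (1)
    norm_I = I.sum()                                            # Eq. (3)
    active = np.ones(len(T), dtype=bool)
    while active.any():
        J = int(np.argmax(np.where(active, T, -np.inf)))        # Eq. (4)
        V = np.minimum(I, W[J]).sum() / norm_I                  # Eq. (5)
        if V >= rho:                                            # Eq. (6)
            return J           # resonance: node J is the winner
        active[J] = False      # T_J "set to zero" for this presentation
    return None                # no winner: the caller creates a new node
```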

The next stage is to find the value of $X^{ab}$. If the Jth node of F2 in ARTa is active and the Kth node of F2 in ARTb is active, the value of $X^{ab}$ is given by Equation (7). When both ARTa and ARTb are active, and $X^{ab} \neq 0$, the weight vector between the F2 layer of ARTa and the map field $F^{ab}$ is set according to Equation (8). If $X^{ab} = 0$, then the vigilance parameter of ARTa is increased so that the competitive process starts again, and a new category that satisfies Equation (5) is found. If there is no winner, a new node is created and the input vector is assigned as its weight. Learning then ensues using Equation (9), whereby the weight vector $w_J$ of ARTa is updated to encode the input pattern.

$$X^{ab} = y^{b} \wedge w_{J}^{ab} \qquad (7)$$

$$w_{jk}^{ab} = \begin{cases} 1, & j = J,\ k = K \\ 0, & \text{otherwise} \end{cases} \qquad (8)$$

$$w_{J}^{(new)} = \beta \left( I \wedge w_{J}^{(old)} \right) + (1 - \beta)\, w_{J}^{(old)} \qquad (9)$$
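Equation (9) can be written directly as a one-line update; with fast learning (β = 1, as used in the experiments, see Table 2) it reduces to the fuzzy AND of the input and the old weight. A minimal sketch:

```python
import numpy as np

def update_weight(I, w_J, beta=1.0):
    """Eq. (9): w_J(new) = beta*(I ^ w_J(old)) + (1 - beta)*w_J(old).

    With fast learning (beta = 1) this is simply w_J(new) = I ^ w_J(old).
    """
    return beta * np.minimum(I, w_J) + (1.0 - beta) * w_J
```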

In the test phase, Fuzzy ARTa receives an input vector. The category choice and category match computations are the same as in the training phase. The output of the map field, $X^{ab}$, related to the Jth category of ARTa is then defined by Equation (10). A link traced from the map field to ARTb leads to the predicted target output.

$$X^{ab} = w_{J}^{ab} \qquad (10)$$

2.1 The Proposed Pruning Technique

In the pruning stage, every node j in the recognition layer is assigned a parameter, $\Omega_j$. The pruning process only occurs during the training phase of FAM. Each time a new learning epoch begins, the value of $\Omega_j$ is set to zero. During the learning epoch, $\Omega_j$ changes to one when the associated node learns an input pattern. This value remains at one irrespective of the number of times the node becomes the winning node and has its weight updated. At the end of each learning epoch, any node with $\Omega_j$ equal to zero is pruned. These steps are repeated for each subsequent learning epoch until the learning phase is completed.
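A minimal sketch of this pruning loop follows. The network interface (`num_nodes`, `learn` returning the index of the node that encoded the pattern, and `remove_nodes`) is an illustrative assumption, not the authors' code or a standard FAM API:

```python
import numpy as np

def train_with_pruning(net, X, Y, epochs):
    """Train a FAM-like network, pruning unused F2 nodes after each epoch."""
    for _ in range(epochs):
        omega = np.zeros(net.num_nodes(), dtype=bool)   # Omega_j := 0
        for x, y in zip(X, Y):
            j = net.learn(x, y)          # node j learns this pattern
            if j < len(omega):
                omega[j] = True          # Omega_j := 1, and it stays 1
            else:                        # a node created during this epoch
                omega = np.append(omega, True)
        dead = np.flatnonzero(~omega)    # nodes that learned nothing
        net.remove_nodes(dead)           # prune them before the next epoch
```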

3. Experiments with Benchmark Problems

To analyze the effectiveness of the new pruning technique, three artificial data sets are used: the Gaussian 2-Dimensional, Concentric, and Clouds data sets from the ELENA project databases [10]. Fig. 4 and Fig. 6 give graphical illustrations of the Gaussian 2-D and Concentric data, and the Clouds data, respectively. The Gaussian 2-D data set demonstrates a densely overlapped data distribution. The Concentric data set has nested classes without overlapping. The Clouds data set shows intersection of the class distributions, and has a high degree of nonlinearity in the class boundaries. The details of the data sets are summarized in Table 1.

Table 1 Characteristics of the ELENA data sets

Data set       Instances/Size   Attributes   Classes
Gaussian 2-D   1000             2            2
Concentric     1000             2            2
Clouds         1000             2            2

In this work, 10-fold cross validation is used, where each data set is divided into 10 mutually exclusive subsets of equal size, designated as G1, G2, ..., G10. FAM is then trained and tested 10 times. The Percentage of Correct Classification (PCC) is defined as the number of correct classifications divided by the number of data samples available in the data set, as in Equation (11). Note that $\sigma(x,y)_t = 1$ for a correct prediction and $\sigma(x,y)_t = 0$ otherwise, and n refers to the number of data samples tested.

$$PCC = \frac{100}{|G|} \sum_{k=1}^{10} NCC_k \qquad (11)$$

$$NCC_k = \sum_{t=1}^{n} \sigma(x, y)_t \qquad (12)$$

where $|G|$ denotes the total number of samples in the data set.
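A sketch of this evaluation in Python (assuming the per-fold predictions and targets have already been collected as label arrays):

```python
import numpy as np

def pcc(fold_predictions, fold_targets, total_size):
    """Eqs. (11)-(12): PCC = 100/|G| * sum of per-fold correct counts.

    fold_predictions, fold_targets: lists of 10 per-fold label arrays.
    total_size: |G|, the number of samples in the whole data set.
    """
    ncc = [int(np.sum(np.asarray(p) == np.asarray(t)))    # NCC_k, Eq. (12)
           for p, t in zip(fold_predictions, fold_targets)]
    return 100.0 * sum(ncc) / total_size                  # Eq. (11)
```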

The standard deviation (STD) is also calculated to reflect the spread of the FAM results. The other FAM parameters are set as in Table 2.
Table 2 FAM parameter settings

Parameter              Value
Baseline vigilance     0.5
Vigilance of ARTb      1
Learning rate, β       1
Choice, α              0.0001
[Figure omitted: scatter plots of Class '0' and Class '1' samples.]

Fig. 4 The Gaussian 2D data (left) and Concentric data (right)

[Figure omitted: scatter plot of Class '0' and Class '1' samples.]

Fig. 6 The illustration of Clouds data



3.1 Results and Discussion

The FAM classification results, for both the original and pruned versions, are shown in Table 3. The pruning technique is able to improve the accuracy rate by 5.1 to 6.3 percentage points, as compared with the original FAM. For example, the classification rate for Concentric increased by 6.3% to 97.6%. More importantly, the pruning technique is able to remove 34 to 136 nodes, as compared with the original FAM. This shows the effectiveness of the proposed pruning technique in producing a compact FAM network structure with improved classification performance.

Table 3 Classification results

                           Nodes                 PCC (%)
Data set       FAM         Average    STD        Average    STD
Gaussian 2-D   Original    933.9      3.21       58.60      0.050
               Pruned      899.4      1.27       63.70      0.042
Concentric     Original    967.9      79.39      91.30      0.043
               Pruned      831.8      42.67      97.60      0.011
Clouds         Original    960.8      11.47      80.40      0.054
               Pruned      894.1      4.12       86.20      0.025

STD: standard deviation; PCC: percentage of correct classification

The lower STD values also suggest the stability of the performance of the pruned FAM network. In other words, the pruning technique makes the original FAM less sensitive to the order of data presentation. This makes FAM more suitable for on-line learning environments [4]. Note that the results for Gaussian 2-D are low. This is caused by the densely overlapped region in the data distribution: as illustrated in Fig. 4, there is no clear delineation between the boundaries of the two Gaussian 2-D classes. Nevertheless, the pruning technique is able to increase the classification performance with fewer nodes.

4. Application to Medical Diagnosis

In this work, the Wisconsin Breast Cancer (WBC) [3] data set is used to evaluate the applicability of FAM with the proposed pruning technique as a medical decision support tool. The WBC data set contains 699 records of visually assessed nuclear features of fine needle aspirates from patients, with 458 benign and 241 malignant cases of breast cancer. The same training and test procedures as in Section 3 were adopted to assess the FAM performance. Table 5 shows the classification results of FAM (both original and pruned versions). It can be clearly seen that while the original FAM produced the lowest classification rate of 91.88%, the pruning technique is able to improve its performance to 96.76%, which is the highest as compared with those of the other machine learning methods (Table 5). The results demonstrate the effectiveness of the pruning technique in producing a parsimonious FAM network with improved classification performance.

Table 5 The classification results for the WBC data set

Method                                 Classification Result (%)
C4.5 [8]                               94.74
Optimized-LVQ [8]                      96.70
Supervised Fuzzy Clustering [1]        95.57
Perceptron Decision Tree (FAT) [9]     96.45
Fuzzy ARTMAP (Original)                91.88
Fuzzy ARTMAP (Pruned)                  96.76

5. Summary

In this paper, we have introduced a new pruning technique to reduce the number of category nodes in the FAM network. The results obtained indicate that the pruning technique can improve the classification performance of FAM with a more compact network structure. The results are also more stable (in terms of STD) as compared with those from the original FAM. The applicability of the proposed approach as a decision support tool for medical diagnosis is also demonstrated using the WBC problem. The results again positively demonstrate that the pruned FAM network, as compared with other machine learning methods, is able to produce high performance with a less complex network structure. The proposed pruning technique can be incorporated into other FAM-based networks, e.g. Micro ARTMAP and Bayesian ARTMAP. In addition, other pruning strategies can be implemented and compared with the proposed technique. More benchmark and real data sets can also be used to further ascertain the performance and stability of the pruned network. All these constitute the directions of further work for this research.

References

1. Abonyi, J. and Szeifert, F. (2003) Supervised fuzzy clustering for the identification of fuzzy classifiers. Pattern Recognition Letters, Vol. 24, pp. 2195–2207.
2. Anagnostopoulos, G.C. et al. (2002) Reducing generalization error and category proliferation in ellipsoid ARTMAP via tunable misclassification error tolerance: boosted ellipsoid ARTMAP. International Joint Conference on Neural Networks, Vol. 3, pp. 2650–2655.
3. Blake, C.L. and Merz, C.J. UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, University of California at Irvine. Cited 15 March 2008.
4. Bouchachia, A., Gabrys, B. and Sahel, Z. (2007) Overview of some incremental learning algorithms. IEEE International Fuzzy Systems Conference, pp. 1–6.
5. Carpenter, G.A., Grossberg, S., Markuzon, N., Reynolds, J.H. and Rosen, D.B. (1992) Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Transactions on Neural Networks, Vol. 3, pp. 698–713.
6. Carpenter, G.A., Grossberg, S., Markuzon, N., Reynolds, J.H. and Rosen, D.B. (1992) Fuzzy ARTMAP: An adaptive resonance architecture for incremental learning of analog maps. International Joint Conference on Neural Networks, Vol. 3, pp. 309–314.
7. Dong, C.S.J. and Loo, G.S.L. (2001) Flexible web-based decision support system generator (FWDSSG) utilising software agents. 12th International Workshop on Database and Expert Systems Applications, pp. 892–897.
8. Le, Q., Anagnostopoulos, G.C., Georgiopoulos, M. and Ports, K. (2005) An experimental comparison of semi-supervised ARTMAP architectures, GCS and GNG classifiers. IEEE International Joint Conference on Neural Networks, Vol. 5, pp. 3121–3126.
9. Sanchez, E.G., Dimitriadis, Y.A., Cano-Izquierdo, J.M. and Lopez-Coronado, J. (2002) µARTMAP: use of mutual information for category reduction in Fuzzy ARTMAP. IEEE Transactions on Neural Networks, Vol. 13, pp. 58–69.
10. Verleysen, M., Bodt, E.D. and Wertz, V. UCL Neural Network Group, http://www.dice.ucl.ac.be/neural-nets/Research/Projects/ELENA/elena.htm, Université catholique de Louvain. Cited 15 March 2008.
11. Vigdor, B. and Lerner, B. (2007) The Bayesian ARTMAP. IEEE Transactions on Neural Networks, Vol. 18, pp. 1628–1644.
12. Wee, C.Y., Paramesran, R., Takeda, F., Tsuzuki, T., Kadota, H. and Shimanouchi, S. Classification of rice grains using Fuzzy ARTMAP neural network. Asia-Pacific Conference on Circuits and Systems, Vol. 2, pp. 223–226.
13. Zhong, M., Rosander, B., Georgiopoulos, M., Anagnostopoulos, G.C., Mollaghasemi, M. and Richie, S. (2006) Experiments with Safe ARTMAP and comparisons to other ART networks. International Joint Conference on Neural Networks, pp. 720–727.
