
2007 International Symposium on Information Technology Convergence

Classification of Symbolic Objects using Adaptive
Auto-Configuring RBF Neural Networks
T. N. Nagabhushan, Hanseok Ko, Junbum Park
Department of Electronics & Computer Engineering
Korea University
Anam-dong, Seongbuk-gu, Seoul 136-713, Korea
nagabhushan@ispl.korea.ac.kr, hsko@korea.ac.kr

S. K. Padma, Y. S. Nijagunarya
Department of Information Science & Engineering
S J College of Engineering, Mysore 570 006, India
skp@sjce.ac.in, nijagunarya@yahoo.com

Abstract— Symbolic data represent a general form of classical
data, and there has been highly focused research on the analysis
of symbolic data in recent years. Since many future applications
involve such a general form of data, there is a need to explore
novel methods for analyzing it. In this paper we present two
simple, novel approaches for the classification of symbolic data.1
In the first step, we show the representation of symbolic data
in binary form and then use a simple Hamming-distance measure
to obtain the clusters from the binarised symbolic data. This
gives the class label and the number of samples in each cluster.
In the second part we pick a specific percentage of significant
data samples in each cluster and use them to train the Adaptive
Auto-configuring RBF neural network. The training automatically
builds an optimal architecture for the presented samples. The
complete data set has been used to test the generalization property
of the RBF network. We demonstrate the proposed approach on
the soybean benchmark data set and the results are discussed.
It is found that the proposed neural network works well for
symbolic data, opening up further investigations for data mining
applications.
Key words: Auto-configuring neural networks, Incremental
learning, RBF, Significant patterns.

I. INTRODUCTION
In conventional data analysis the objects are numerical
vectors. Clustering of such numerical vectors is achieved by
minimizing the intra-cluster dissimilarity and maximizing the
inter-cluster dissimilarity, and many different approaches have
been devised to handle such data [1] [2]. Symbolic objects are
extensions of classical data types. The main distinction between
these two forms of data is that in the case of classical data the
objects are more "individualized", whereas in the symbolic
framework they are more "unified" by relationships. Symbolic
objects are defined as the logical conjunction of events linking
values and variables. For example, e1 = [Color = (white, blue)]
and e2 = [height = (1.5-2.0)]: the event e1 states that the color
is either white or blue, whereas the event e2 states that the
height has a value between 1.5 and 2.0.
1 This research was supported by the Ministry of Information & Communication (MIC), Korea under the IT Foreign Specialist Programme (ITFSIP)
supervised by the Institute of Information Technology Advancement (IITA).


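To make this definition concrete, a minimal Python sketch of the two events above is given here; the dictionary representation and the satisfies helper are purely illustrative and are not the binary encoding introduced later in the paper.

```python
# A symbolic object as a conjunction of events: each variable is linked either
# to a set of admissible categorical values or to a numeric interval.
e1_e2 = {
    "color": {"white", "blue"},   # event e1: Color = (white, blue)
    "height": (1.5, 2.0),         # event e2: height = (1.5-2.0)
}

def satisfies(sample, symbolic_object):
    """Check whether a classical, single-valued sample matches every event."""
    for variable, constraint in symbolic_object.items():
        value = sample[variable]
        if isinstance(constraint, set):
            if value not in constraint:           # categorical event
                return False
        else:
            low, high = constraint                # interval event
            if not (low <= value <= high):
                return False
    return True

print(satisfies({"color": "blue", "height": 1.8}, e1_e2))   # True
```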
In general, symbolic data have both quantitative (numeric or
interval) and qualitative attributes. There exist three types of
symbolic objects, namely assertion, hoard and synthetic objects.
Clustering and classification of such data require specialized
schemes, and most of the reported procedures use similarity and
dissimilarity measures. The similarity and dissimilarity between
two symbolic objects are defined in terms of three components,
namely position, span and content [1] [2]. In the context of
mining important information from large, complex data types
such as multimedia data, it becomes imperative to develop
methods that have generalization ability. Most of the data
generated today closely resembles symbolic data. This work
presents some novel methods for the classification of symbolic
objects using machine learning techniques.
Analysis of symbolic data has been explored and expanded
by several researchers, such as Edwin Diday and K. C.
Gowda [1], Ichino [3] and D. S. Guru [4]. All of them have
viewed the analysis of symbolic objects from different
mathematical frameworks and reported good results, but none
of the available techniques has the ability to provide good
generalization for test samples. In other words, neural
computing techniques have not been tried on symbolic objects,
except in a recent work by S. Mitra [5]. In that paper the
authors take the benchmark dataset from the UCI machine
learning repository, propose schemes to find the clusters with
respect to medoids, train the samples using a fuzzy radial basis
function neural network, and report very good results. The only
drawback of the proposed scheme is the fixed architecture of
the network: the medoids serve as the fixed optimal centers for
the RBF neural network. While this method works well for
small data sets, for larger data sets the algorithm incurs an
additional computational burden since it involves the
calculation of fuzzy membership functions.
In our work, we propose a very simple approach to the
classification of symbolic objects. The main contributions
of this work are:


1) Conversion of symbolic data into homogeneous binary strings.
2) Using the binary form of the symbolic data sets, we determine the similarity between the objects through a Hamming-distance measure and, using this similarity index, we obtain the clusters.
3) We then compute the medoids of the clusters of the binarised symbolic data and use them to select significant training patterns.

The Adaptive Auto-configuring RBF neural network proposed in our earlier work [6] has been used for synthesizing the architecture. The entire procedure is very simple and can easily be applied to larger data sets. The learning and generalization features of the proposed techniques are illustrated on the standard benchmark soybean dataset from the UCI machine learning repository. The next section introduces the representation of symbolic data and the proposed similarity measure.

II. A NEW SIMILARITY MEASURE FOR SYMBOLIC OBJECTS

It is well known that symbolic objects occur in a generalized form in several applications; predominantly, symbolic objects are to be used in all image processing applications. There exist many methods for clustering symbolic objects. In our work we propose a simple approach to compute the similarity between symbolic objects through a homogeneous binary format for both quantitative and qualitative features. Our new method does not require the traditional formulae; instead it uses a simple representation of each symbolic object in the form of a concatenated binary string.

We compute the similarity indices for the binary equivalents of the symbolic objects as follows. Let n be the number of bits in each binary string and let N be the number of samples. Consider two strings x1 = a1 a2 a3 ... an and x2 = b1 b2 b3 ... bn. The similarity between x1 and x2 is given by

S(x1, x2) = (1/n) Σ_{i=1}^{n} c_i, where c_i = 1 if a_i = b_i and c_i = 0 otherwise.   (1)

Equation 1 constitutes a new measure which defines the similarity between two samples x1 and x2 as the ratio of the number of matching bits in corresponding positions of the two strings to the total number of bits in the strings. It is seen from the above equation that the computation of similarity values is a simple, direct approach when compared to traditional methods. We employ the agglomerative clustering algorithm to cluster the symbolic objects using the similarity indices computed with Equation 1. The above procedure has been applied to the soybean data, and the class labels obtained are in concurrence with the benchmark dataset.
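As an illustration of the similarity measure of Equation 1 and of the clustering step, the sketch below may help; the Python implementation, the complete-linkage agglomerative clustering and the 0.4 distance threshold are illustrative assumptions, since the paper does not fix those details.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def similarity(x1, x2):
    """Equation 1: fraction of positions where the two binary strings agree."""
    x1, x2 = np.asarray(x1), np.asarray(x2)
    return float(np.mean(x1 == x2))

# X is assumed to be an (N, n) 0/1 matrix of binarised symbolic objects;
# random data is used here only to make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(20, 105))

# Condensed pairwise dissimilarity (1 - similarity) for the agglomerative step.
N = len(X)
dissim = [1.0 - similarity(X[i], X[j]) for i in range(N) for j in range(i + 1, N)]

# Agglomerative clustering; linkage method and cut threshold are illustrative choices.
Z = linkage(dissim, method="complete")
labels = fcluster(Z, t=0.4, criterion="distance")
print(labels)   # cluster label for every binarised object
```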

III. SELECTION OF SIGNIFICANT PATTERNS

It is known that the training patterns control the dynamics of the neural network architecture: the larger the number of patterns, the longer the training time and the larger the size of the generated network. If the training samples are structured and compact, a neural network can learn fast; on the other hand, if the training patterns are unstructured and noisy, training a neural network is laborious and sometimes frustrating. In many real-life situations the training patterns have high dimensions besides being voluminous, and many of them are significant while many are not. In classification problems the neural network defines a decision boundary, and while generating this boundary it often uses all the patterns; training all of them would result in an over-fitting architecture, and that too at the expense of more resources. The non-significant patterns, which may be outliers, often consume the maximum number of training cycles during learning and therefore need to be removed to improve the learning performance. Even though incremental learning algorithms offer a better training procedure, they too end up with a large number of training cycles. Therefore it becomes imperative to take a look at the input patterns themselves and choose those which make sense for good learning and generalization. The samples which aid in good learning and generalization are called informative patterns, significant patterns or representative patterns.

Theoretical procedures to compute upper bounds on the number of training samples needed for a specified level of generalization are available; a widely used result is the Vapnik-Chervonenkis (VC) dimension [7] [8] [9]. Experimental results have shown that acceptable generalization performance can be obtained with training sets smaller than that specified by the VC dimension [10].

A. Problem Definition

Given a set V of input vectors, the problem is to obtain a subset of V such that the vectors in the subset achieve the desired generalization level. Let V = {v1, v2, ..., vn} be the n vectors which constitute the samples in the input space, with each vi in R^N. The task is to select k samples from the given training set to obtain a subset Vk = {v1, v2, ..., vk}, where k <= n.

In this work we propose a simple approach to select significant patterns with respect to the medoid in each class: we employ the farthest-neighbour concept with respect to the medoid within a cluster and select a specified number of samples for training the network. Let the entire set of pattern vectors v1, v2, ..., vn belong to c classes. For each class j (a short code sketch of this selection is given after the list):
a) Determine the medoid Mj of class j.
b) Calculate the distance between each of the input samples in the class and the medoid Mj.
c) Choose a known percentage of samples which are the farthest from the medoid; these are the samples that represent this class in the training set. Also choose the medoid as one of the samples, in addition to those chosen.
d) Repeat steps [a] through [c] for all the classes in the given input set.
e) Append all the selected samples from all the classes to form the training set VT.

The above procedure is applied to three benchmark datasets; the patterns derived for the soybean data are shown in Table II.
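The per-class selection of steps a) through e) might look like the following sketch; using the Hamming dissimilarity of Section II as the distance, and the function and variable names, are assumptions made only for illustration.

```python
import numpy as np

def hamming_dissimilarity(x1, x2):
    """1 - (similarity of Equation 1): fraction of bit positions that differ."""
    return float(np.mean(np.asarray(x1) != np.asarray(x2)))

def select_significant(X, labels, fraction):
    """For every class keep the given fraction of samples farthest from the
    class medoid, plus the medoid itself (steps a-e of Section III)."""
    X, labels = np.asarray(X), np.asarray(labels)
    selected = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        # pairwise distances inside the class
        D = np.array([[hamming_dissimilarity(X[i], X[j]) for j in idx] for i in idx])
        m = int(np.argmin(D.sum(axis=1)))          # medoid: closest to all class members
        order = np.argsort(-D[m])                  # farthest-from-medoid first
        k = max(1, int(round(fraction * len(idx))))
        chosen = set(idx[order[:k]].tolist()) | {int(idx[m])}
        selected.extend(sorted(chosen))
    return np.array(selected)

# Usage (X_bin and y are assumed to hold the binarised patterns and class labels):
# train_idx = select_significant(X_bin, y, fraction=0.70)
```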

IV. TRAINING WITH THE INCREMENTAL LEARNING ADAPTIVE RBF NEURAL NETWORK

As mentioned earlier, there exist many versions of incremental algorithms for synthesizing RBF networks; each one is complex in its own way and bears its own benefits. We have modified the incremental learning algorithm proposed by Fritzke [11] and used it for training on the significant samples. The modified algorithm has the ability to adapt its learning parameters, which control the movement of the RBF centers in the input space [6]. Training is conducted with the full dataset as well as with the selected significant patterns. The algorithm for training the network, along with the notation used, is given in the following sections, and a compact code sketch of one adaptation step follows Table I.

A. Notations
The following notation is used in the algorithm presented below:
  x     : input pattern
  d     : desired output
  σ     : width of an RBF unit
  η     : learning rate
  ε1    : center adaptation parameter for the BMU
  ε2    : center adaptation parameter for non-BMU units
  O     : actual output
  V     : activation of the hidden (RBF) layer
  E     : error
  x_j   : center of RBF unit j
  W     : weights between output and hidden neurons
  BMU   : best matching unit

B. Algorithm for the Adaptive RBF (ARBF) Network
1) Select two random vectors from the input space as the initial RBF centers and connect them by an edge. Set their widths equal to the distance between them,
   σ = ||x_1 - x_2||,   (2)
   where x_1 and x_2 are the coordinates of the two selected RBF units.
2) For a given training pattern (x, d), calculate the output of the RBF network using
   V_j = exp(-||x - x_j||^2 / σ_j^2),   (3)
   O = Σ_j W_j V_j,   (4)
   and calculate the error using
   E = d - O.   (5)
   The squared error is accumulated at the output units,
   err <- err + E^2,   (6)
   and the insertion of a new RBF unit is based on the squared error accumulated across all the output units.
3) Set ε1 using Table I. For every input pattern that is presented, find the best matching unit (BMU) using
   BMU = arg min_j ||x - x_j||,   (7)
   and move the BMU ε1 times its current distance towards the input pattern:
   x_bmu(new) = x_bmu(old) + ε1 (x - x_bmu(old)).   (8)
   Move all the immediate neighbours of the BMU ε2 times their current distance towards the input pattern:
   x_n(new) = x_n(old) + ε2 (x - x_n(old)).   (9)
4) Update the weights between the hidden and output units using
   W(new) = W(old) + η (d - O) V,   (10)
   where η is the learning rate, a small value between 0 and 1.
5) The widths of the RBF units are computed using 'age' information. For each input presented, compute the BMU and the next immediate BMU and connect them by an edge with an associated age variable; when an edge is created, its age is set to zero. The age of all edges emanating from the BMU is increased at every adaptation step. Edges exceeding an age limit A_max are deleted, and so are nodes left with no emanating edges.
6) An RBF unit is inserted between the unit which has accumulated the maximum error and any of its neighbours. Its width is set to the mean distance between the neighbouring units, and its weights are set to small random values.
7) The learning rate is decremented linearly by a small value during the convergence cycle.
8) Repeat steps 2 to 7 until the classification error for all the patterns falls below a set value.

TABLE I. Lookup table for choosing ε1: nineteen classification-error ranges, from errors above 10% down to very small errors, are mapped to ε1 values of 0.32, 0.30, 0.28, 0.26, 0.24, 0.22, 0.21, 0.20, 0.19, 0.18, 0.17, 0.16, 0.15, 0.14, 0.13, 0.12, 0.11, 0.10 and 0.005.
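The following is a compact sketch of one ARBF adaptation step as reconstructed in Equations (3)-(10) above; it assumes Gaussian activations, a vector-valued desired output and a precomputed neighbour list, and it deliberately omits the edge-age bookkeeping and the unit insertion of steps 5 and 6.

```python
import numpy as np

def arbf_step(x, d, centers, widths, W, neighbors, eps1, eps2, eta):
    """One ARBF adaptation step following Equations (3)-(10).

    centers: (m, n) RBF centers, widths: (m,) RBF widths, W: (outputs, m)
    hidden-to-output weights, neighbors: dict mapping a unit index to the
    indices of units joined to it by an edge.
    """
    # Equations (3)-(4): hidden activations and network output
    V = np.exp(-np.sum((centers - x) ** 2, axis=1) / widths ** 2)
    O = W @ V
    E = d - O                                    # Equation (5)

    # Equation (7): best matching unit (closest center to the input)
    bmu = int(np.argmin(np.linalg.norm(centers - x, axis=1)))

    # Equations (8)-(9): move the BMU and its immediate neighbours towards x
    centers[bmu] += eps1 * (x - centers[bmu])
    for j in neighbors.get(bmu, []):
        centers[j] += eps2 * (x - centers[j])

    # Equation (10): output-weight update with learning rate eta
    W += eta * np.outer(E, V)

    # squared error, accumulated by the caller to decide where to insert units
    return float(np.sum(E ** 2))
```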
V. EXPERIMENTS

Since most of the improvements in RBF network construction have been illustrated with well-defined benchmark datasets from the UCI machine learning repository [12], we have also used the soybean dataset for our experiments and comparison. The dataset is briefly described below.

A. Dataset
SOYBEAN LARGE data: The soybean data is available in two forms, small and large. We have used the large dataset, which has more samples. The dataset has 307 instances belonging to 19 classes, and each pattern has 35 attributes. Out of the 307 samples, 41 have missing attributes; after deleting these 41 samples, the dataset has 266 patterns belonging to 15 classes. We have used the binarised representation of the actual dataset: after binarisation each pattern is a 105-attribute vector consisting of the binary equivalents of the original data. Therefore we have trained and tested data having 105 attributes belonging to 15 classes.

B. Training Set Generation
We have investigated two approaches for selecting significant training patterns. In the first approach the Euclidean norm is computed with respect to the mean of the samples in a class; in the second approach the medoid is used as the reference point and samples are picked with respect to the medoid. A percentage of the samples is then picked for training, and this number is progressively increased to study the optimality of the generated architecture. The results from both approaches are the same; we have tabulated results using the medoid only, as the medoid is an actual sample in the input space whereas the mean is a non-existent sample. Table II shows the number of samples selected for training.

TABLE II. Soybean data samples selected for training
  Percentage   10   20   30   40   50   60   70   75   80   90  100
  Number       42   69   95  122  148  175  202  219  228  255  266

C. Learning Characteristics
The adaptive RBF algorithm is trained with the different proportions of significant patterns shown in Table II; each set of significant patterns is used to synthesize an optimal RBF architecture, so the proposed study yields 11 different RBF architectures. It is evident from the algorithm that the RBF units are subjected to movement in the input space during the entire learning phase. Insertion and movement of RBF units are carefully controlled by the adaptation parameters ε1, ε2 and η; these parameters help to synthesize optimal network architectures. Table III shows the number of RBF units generated and the epochs taken for the various significant-pattern sets, and Figures 1 and 2 show the learning curves for the 70% significant samples.

D. Testing and Generalization
We have used all 266 patterns of the soybean dataset to test the classification accuracy of the generated RBF architectures. Table IV shows the generalization produced by the networks. It can be seen that the 70% significant samples yielded the best results when compared with the remaining sets of patterns.

TABLE III. Epochs and RBF units for the different compositions of significant patterns
  %           100   90   80   75   70   60   50   40   30   20   10
  #           266  255  228  219  202  175  148  122   95   69   42
  Epochs     2783 2225 2539 2501 2407 2221 1850 1511 1306 1276  853
  RBF units    68   60   67   59   68   62   53   47   44   39   28

Fig. 1 and Fig. 2. Learning curves for the 70% significant-pattern set of the soybean data: epochs versus error and epochs versus the number of RBF units.

TABLE IV. Generalization (%) achieved by the networks trained with each significant-pattern subset (# = 266, 255, 228, 219, 202, 175, 148, 122, 95, 69 and 42).

It is seen that the generalization levels are poor in the lower half of Table IV. This can be attributed to the small number of patterns present in some of the clusters; picking significant patterns from a small number of samples does not yield good results.

VI. CONCLUSIONS
In this research work we have shown that symbolic data can be classified using an auto-configuring RBF network with better generalization. Specifically for this dataset, the 70% significant patterns picked using the farthest-neighbour principle with respect to the medoid yielded good results both in terms of network size and training time. More applications in multimedia data mining are under investigation.

REFERENCES
[1] K. C. Gowda and E. Diday, "Symbolic clustering using a new dissimilarity measure," Pattern Recognition, vol. 24, no. 6, pp. 567-578, 1991.
[2] K. C. Gowda and T. V. Ravi, "Divisive clustering of symbolic objects using the concepts of both similarity and dissimilarity," Pattern Recognition, vol. 28, no. 8, pp. 1277-1282, 1995.
[3] M. Ichino and H. Yaguchi, "Generalized Minkowski metrics for mixed feature-type data analysis," IEEE Transactions on Systems, Man and Cybernetics, vol. 24, no. 4, 1994.
[4] D. S. Guru, B. B. Kiranagi and P. Nagabhushan, "Multivalued type proximity measure and concept of mutual similarity value useful for clustering symbolic patterns," Pattern Recognition Letters, vol. 25, pp. 1203-1213, 2004.
[5] K. Mali and S. Mitra, "Symbolic classification, clustering and fuzzy radial basis function network," Fuzzy Sets and Systems, vol. 152, pp. 553-564, 2005.
[6] T. N. Nagabhushan and S. K. Padma, "Adaptive learning in incremental learning RBF networks," ICONIP 2004, 2004.
[7] Y. Abu-Mostafa, "The Vapnik-Chervonenkis dimension: Information versus complexity in learning," Neural Computation, vol. 1, no. 3, pp. 312-317, 1989.
[8] E. Baum and D. Haussler, "What size net gives valid generalization?" Advances in Neural Information Processing Systems, vol. 1, pp. 81-90, 1989.
[9] M. Opper, "Learning and generalization in a two-layer neural network: The role of the Vapnik-Chervonenkis dimension," Physical Review Letters, 1994.
[10] D. Cohn and G. Tesauro, "Can neural networks do better than the Vapnik-Chervonenkis bounds?" Advances in Neural Information Processing Systems, 1991.
[11] B. Fritzke, "Supervised learning with growing cell structures," Advances in Neural Information Processing Systems, vol. 6, pp. 255-262, 1994.
[12] C. Blake and C. Merz, "UCI repository of machine learning databases," 1998.