Appl Intell (2006) 25:243–251

DOI 10.1007/s10489-006-0105-0

An instance-based learning approach based on grey relational structure
Chi-Chun Huang · Hahn-Ming Lee


© Springer Science + Business Media, LLC 2006

Abstract In instance-based learning, the ‘nearness’ between two instances, which is used for pattern classification, is generally determined by a similarity function, such as the Euclidean metric or the Value Difference Metric (VDM). However, Euclidean-like similarity functions are normally only suitable for domains with numeric attributes. The VDM metrics are mainly applicable to domains with symbolic attributes, and their complexity increases with the number of classes in a specific application domain. This paper proposes an instance-based learning approach to alleviate these shortcomings. Grey relational analysis is used to precisely describe the entire relational structure of all instances in a specific domain. By using the grey relational structure, new instances can be classified with high accuracy. Moreover, the total number of classes in a specific domain does not affect the complexity of the proposed approach. Forty classification problems are used for performance comparison. Experimental results show that the proposed approach yields higher performance than other methods that adopt one or both of the above similarity functions. The proposed method also yields higher performance than some other classification algorithms.

Keywords Instance-based learning · Grey relational analysis · Grey relational structure · Pattern classification

C.-C. Huang (✉) · H.-M. Lee
Department of Information Management, National Kaohsiung Marine University, NKMU Kaohsiung, Taiwan 811, R.O.C.
e-mail: cchuang@mail.nkmu.edu.tw

H.-M. Lee
Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan
e-mail: hmlee@mail.ntust.edu.tw

1 Introduction

In recent years, different learning algorithms, such as instance-based learning (IBL) [1, 2, 17, 23, 33, 34], rule induction [3, 14, 28, 36], decision trees [11, 13, 30], decision tables [16, 27] and neural networks [6, 24] have been investigated to solve classification problems. As a comparable alternative, IBL is quite straightforward to understand and can yield excellent performance [1, 2], i.e., high classification accuracy. In IBL, a training set of labeled instances is first collected by the learning system. A new, unseen instance is then classified according to its ‘nearest’ training instance or instances [7, 12]. Based on the nearest-neighbor rule [7, 12], instance-based learning methods include nearest-neighbor classifiers [4, 7, 15], similarity-based induction (learning), lazy learners, memory-based learning and case-based reasoning systems [34, 38].
In general, the ‘nearness’ between two instances in IBL is determined by a similarity function. In the machine learning literature, two metrics, the Euclidean metric [1] and the Value Difference Metric (VDM) [34], are widely used for IBL. Euclidean-like similarity functions are normally suitable for domains with numeric attributes. The VDM is mainly applicable to domains with symbolic attributes; however, the number of training instances with different values for each attribute in a specific application domain should be determined prior to learning, such that the complexity of using the VDM increases with the number of classes [39]. In this paper, an instance-based learning approach based on grey relational structure (GRS) is proposed to alleviate these shortcomings. Here, grey relational analysis (GRA) [8–10, 29] is used to precisely describe the entire relational structure for all instances in a specific application domain. Some properties of GRA, including wholeness, asymmetry and normality, are helpful for the learning tasks (as stated in Section 3).
By using the so-called grey relational structure, new instances in a specific domain can be classified with high accuracy. Moreover, the total number of classes in a specific domain does not affect the complexity of the proposed approach. Forty classification problems are used for performance comparison. Experimental results have shown that the proposed approach yields higher performance than other methods that adopt one or both of the above-mentioned similarity functions, i.e., the Euclidean metric and the Value Difference Metric (VDM). Moreover, the proposed method can yield higher performance than some other classification algorithms.
The rest of this paper is structured as follows. Some similarity functions used for IBL are introduced in Section 2. The concept of grey relational analysis is reviewed in Section 3. In Section 4, an instance-based learning approach based on grey relational structure is presented. In Section 5, experiments performed on forty datasets are reported. Finally, Section 6 gives our conclusions.

2 Similarity functions in IBL

This section reviews some similarity functions used for IBL. In most IBL systems, the Euclidean similarity function has been adopted. Let x and y be two instances with n attributes, denoted as x = (x(1), x(2), ..., x(n)) and y = (y(1), y(2), ..., y(n)). The Euclidean metric is defined as follows.

$$\mathrm{Eu}(x, y) = \sqrt{\sum_{i=1}^{n} [x(i) - y(i)]^{2}}. \tag{1}$$

An alternative similarity function, the Manhattan distance function, is defined as follows.

$$\mathrm{Ma}(x, y) = \sum_{i=1}^{n} |x(i) - y(i)|, \tag{2}$$

where x and y are two instances with n attributes, denoted as x = (x(1), x(2), ..., x(n)) and y = (y(1), y(2), ..., y(n)).
Obviously, Euclidean-like distance functions are normally applicable to domains with numeric attributes. In general, prior to learning, normalization should be done for each numeric attribute, i.e., each distance for numeric attribute i is divided by the maximal difference or by the standard deviation of attribute i. In this manner, the problem of one attribute with a relatively larger range of values than other attributes dominating the distance can be avoided. Accordingly, the distance of each attribute can be bounded between zero and one. To deal with each symbolic attribute i, the distance between two values a and b of attribute i will be set to zero if a and b are the same (i.e., the distance is minimal); otherwise the distance will be set to one (i.e., the distance is maximal). Meanwhile, the distance will also be set to one if a or b is unknown (i.e., missing). This setting makes the distance depend heavily on symbolic attributes and biases it strongly against missing attribute values. A heterogeneous similarity function is thus defined for domains with both numeric and symbolic attributes. This distance function, also known as HOEM [39], was adopted in IB1, IB2 and IB3 [1].
In [34], another well-known distance function, namely the Value Difference Metric (VDM), was proposed to determine the ‘nearness’ between instances. For each symbolic attribute s of instances, the VDM between two values a and b is defined as follows.

$$\mathrm{vdm}(a, b) = \sum_{i=1}^{C} \left| \frac{N_{a,i}}{N_{a}} - \frac{N_{b,i}}{N_{b}} \right|^{k}, \tag{3}$$

where N_a is the number of training instances with value a for attribute s, and N_{a,i} is the number of training instances with value a for attribute s and output class i; C is the number of output classes in a specific domain and k is usually set to 1 or 2. In Eq. (3), the two ratios are estimates of the conditional probabilities of output class i given the attribute values a and b, respectively.
Generally, the VDM is mainly suitable for domains with symbolic attributes. Some discretization methods were thus incorporated with the VDM for dealing with numeric attributes, i.e., numeric attributes were discretized into symbolic attributes for the learning tasks.
In [39], three variants of VDM, namely the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM) and the Windowed Value Difference Metric (WVDM), were proposed. The HVDM uses the VDM and the above-mentioned normalized distance (i.e., each distance for numeric attribute i is divided by 4 standard deviations of attribute i) to handle symbolic and numeric attributes, respectively. As for the IVDM and the WVDM, different discretization methods were incorporated with the original version of VDM. These similarity functions are useful for applications with both numeric and symbolic input attributes. Further details regarding these three similarity functions are given in [39]. Similar to VDM, however, their complexity increases with the number of classes in the learning tasks. In addition, the PEBLS learning system [32] introduces a variant of VDM, called the modified VDM (MVDM). MVDM incorporates VDM with a scheme that adjusts the weight of each attribute for learning. A review of various similarity functions can be found in [39].

3 Grey relational analysis

Since its inception in 1984 [8], grey relational analysis [8–10] has been applied to a wide variety of application
domains. As a measurement method, grey relational analysis is used to determine the relationships among a referential observation and the compared observations based on calculating the grey relational coefficient (GRC) and the grey relational grade (GRG). Consider a set of m + 1 observations {x_0, x_1, x_2, ..., x_m}, where x_0 is the referential observation and x_1, x_2, ..., x_m are the compared observations. Each observation x_e includes n attributes and is represented as x_e = (x_e(1), x_e(2), ..., x_e(n)). The grey relational coefficient can be calculated as

$$\mathrm{GRC}(x_0(p), x_i(p)) = \frac{\Delta_{\min} + \zeta \Delta_{\max}}{|x_0(p) - x_i(p)| + \zeta \Delta_{\max}}, \tag{4}$$

where Δ_min = min_{∀j} min_{∀k} |x_0(k) − x_j(k)|, Δ_max = max_{∀j} max_{∀k} |x_0(k) − x_j(k)|, ζ ∈ [0, 1] (usually ζ = 0.5), i = j = 1, 2, ..., m, and k = p = 1, 2, ..., n.
Here, GRC(x_0(p), x_i(p)) is considered as the similarity between x_0(p) and x_i(p). If GRC(x_0(p), x_1(p)) exceeds GRC(x_0(p), x_2(p)) then the similarity between x_0(p) and x_1(p) is larger than that between x_0(p) and x_2(p); otherwise the former is smaller than the latter. Moreover, if x_0 and x_i have the same value for attribute p, GRC(x_0(p), x_i(p)) (i.e., the similarity between x_0(p) and x_i(p)) will be one. By contrast, if x_0 and x_i differ greatly for attribute p, GRC(x_0(p), x_i(p)) will be close to zero. Similar methods for dealing with symbolic attributes will be detailed in Section 4. Notably, changing the value of ζ does not affect GRO and the performance of the proposed learning approach. Restated, the original version of GRC (as stated in [8]) is considered to determine the similarity between two instances in the proposed learning approach.
Accordingly, the grey relational grade between instances x_0 and x_i is expressed as

$$\mathrm{GRG}(x_0, x_i) = \frac{1}{n} \sum_{k=1}^{n} \mathrm{GRC}(x_0(k), x_i(k)), \tag{5}$$

where i = 1, 2, ..., m.
The primary characteristic of grey relational analysis is as follows.
If GRG(x_0, x_1) is larger than GRG(x_0, x_2) then the difference between x_0 and x_1 is smaller than that between x_0 and x_2; otherwise the former is larger than the latter.
According to the degree of GRG, the grey relational order (GRO) of observation x_0 can be stated as follows.

$$\mathrm{GRO}(x_0) = (y_{01}, y_{02}, \ldots, y_{0m}), \tag{6}$$

where GRG(x_0, y_01) ≥ GRG(x_0, y_02) ≥ ··· ≥ GRG(x_0, y_0m), y_0r ∈ {x_1, x_2, x_3, ..., x_m}, r = 1, 2, ..., m, and y_0a ≠ y_0b if a ≠ b.
The grey relational orders of observations x_1, x_2, x_3, ..., x_m can be similarly obtained as follows.

$$\mathrm{GRO}(x_q) = (y_{q1}, y_{q2}, \ldots, y_{qm}), \tag{7}$$

where q = 1, 2, ..., m, GRG(x_q, y_q1) ≥ GRG(x_q, y_q2) ≥ ··· ≥ GRG(x_q, y_qm), y_qr ∈ {x_0, x_1, x_2, ..., x_m}, y_qr ≠ x_q, r = 1, 2, ..., m, and y_qa ≠ y_qb if a ≠ b.
In Eq. (7), the GRG between observations x_q and y_q1 exceeds those between x_q and other observations (y_q2, y_q3, ..., y_qm). That is, the difference between x_q and y_q1 is the smallest.
Grey relational analysis meets four principal axioms, including (a) normality, (b) dual symmetry, (c) wholeness and (d) approachability [8–10, 29].

(a) normality: GRG(x_0, x_i) takes a value between zero and one;
(b) dual symmetry: if only two observations (x_0 and x_1) are made in the relational space, then GRG(x_0, x_1) = GRG(x_1, x_0);
(c) wholeness: if three or more observations are made in the relational space, then GRG(x_0, x_i) often does not equal GRG(x_i, x_0), ∀i; and
(d) approachability: GRG(x_0, x_i) decreases as the difference between x_0(p) and x_i(p) increases (other values in Eqs. (4) and (5) are held constant).

Based on these axioms, grey relational analysis offers some advantages. For example, it gives a normalized measuring function (normality), i.e., a method for measuring the similarities or differences among observations, to analyze the relational structure. Also, grey relational analysis yields whole relational orders (wholeness) over the entire relational space. As stated in the following section, these properties are useful for instance-based learning.

4 An instance-based learning algorithm based on grey relational structure

To build a successful instance-based learning model, the relationships among all instances in a specific application domain should be determined. Here, grey relational analysis is used to describe the relational structure of all instances and then new, unseen instances can be identified.
As mentioned above, if a set of m + 1 instances {x_0, x_1, x_2, ..., x_m} is given, the grey relational orders of each instance x_q (q = 0, 1, ..., m) can be expressed as follows.

$$\mathrm{GRO}(x_q) = (y_{q1}, y_{q2}, \ldots, y_{qm}), \tag{8}$$
where GRG(x_q, y_q1) ≥ GRG(x_q, y_q2) ≥ ··· ≥ GRG(x_q, y_qm), y_qr ∈ {x_0, x_1, x_2, ..., x_m}, y_qr ≠ x_q, r = 1, 2, ..., m, and y_qa ≠ y_qb if a ≠ b.
Here, a graphical structure, called k-level grey relational structure (k ≤ m), is defined as follows to describe the relationships among referential instance x_q and all other instances, where the total number of ‘nearest’ instances of referential instance x_q (q = 0, 1, ..., m) is restricted to k.

$$\mathrm{GRO}^{*}(x_q, k) = (y_{q1}, y_{q2}, \ldots, y_{qk}), \tag{9}$$

where GRG(x_q, y_q1) ≥ GRG(x_q, y_q2) ≥ ··· ≥ GRG(x_q, y_qk), y_qr ∈ {x_0, x_1, x_2, ..., x_m}, y_qr ≠ x_q, r = 1, 2, ..., k, and y_qa ≠ y_qb if a ≠ b.
That is, a directed graph, shown in Fig. 1, can be used to express the relational space, where each instance x_q (q = 0, 1, ..., m) as well as its k nearest instances (i.e., y_qr, r = 1, 2, ..., k) are represented by vertices, and each expression GRO*(x_q, k) is represented by k directed edges (i.e., x_q to y_q1, x_q to y_q2, ..., x_q to y_qk).
Here, the characteristics of the proposed k-level grey relational structure are described in detail. First, for each instance x_q, k instances (vertices) are connected by the inward edges from instance x_q. That is, these instances are the nearest neighbors (with small difference) of instance x_q, implying that they evidence the class label of instance x_q according to the nearest neighbor rule [7, 12]. Also, in the one-level grey relational structure, instance y_q1, with the largest similarity, is the nearest neighbor of instance x_q. Thus, a new, unseen instance can be classified according to its nearest instance in the one-level grey relational structure or its nearest instances in the k-level grey relational structure. Obviously, for classifying a new, unseen instance i, only k inward edges connected with instance i in the above k-level grey relational structure are needed. In other words, k nearest neighbors of each unseen instance are considered for the learning tasks (i.e., pattern classification).

Fig. 1 k-level grey relational structure (directed graph in which each instance x_0, x_1, ..., x_m is linked to its k nearest instances y_q1, y_q2, ..., y_qk)

Next, an instance-based learning algorithm for pattern classification based on the k-level grey relational structure is detailed. Assume that we have a training set T of m labeled instances, denoted by T = {x_1, x_2, ..., x_m}, where each instance x_e has n attributes and is denoted as x_e = (x_e(1), x_e(2), ..., x_e(n)). For classifying a new, unseen instance x_0, the proposed learning procedure is performed as follows.

Step 1. Calculate the grey relational coefficient (GRC) and the grey relational grade (GRG) between x_0 and x_i, for i = 1, 2, ..., m.
    If attribute p of each instance x_e is numeric, the value of GRC(x_0(p), x_i(p)) is calculated by Eq. (4).
    If attribute p of each instance x_e is symbolic, the value of GRC(x_0(p), x_i(p)) is calculated as
        GRC(x_0(p), x_i(p)) = 1, if x_0(p) and x_i(p) are the same,
        GRC(x_0(p), x_i(p)) = 0, if x_0(p) and x_i(p) are different.
    Accordingly, calculate the grey relational grade (GRG) between x_0 and x_i, for i = 1, 2, ..., m, by Eq. (5).
Step 2. Calculate the grey relational order (GRO) of x_0 based on the degree of GRG(x_0, x_i), where i = 1, 2, ..., m.
Step 3. Construct the k-level grey relational structure according to the above grey relational order (GRO) of x_0, where k ≤ m. Here, only k inward edges connected with instance x_0 are needed.
Step 4. Classify the new instance x_0 by considering the class labels of instances y_1, y_2, ..., y_k with the majority voting method [7], where y_1, y_2, ..., y_k are the vertices connected by k inward edges from x_0 in the k-level grey relational structure (i.e., instances y_1, y_2, ..., y_k are the nearest neighbors of instance x_0). Notably, the best choice of k used for pattern classification can be determined by cross validation [35].
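Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `symbolic` index set and the handling of the degenerate case Δ_max = 0 are hypothetical; numeric attributes are assumed to be comparably scaled already; Δ_min and Δ_max are taken over the numeric attributes only; and voting ties go to the label met first in grey relational order.

```python
from collections import Counter


def grey_relational_grades(x0, train, symbolic=frozenset(), zeta=0.5):
    """Return GRG(x0, xi) for every training instance xi (Eqs. 4 and 5)."""
    n = len(x0)
    numeric = [p for p in range(n) if p not in symbolic]
    # Delta_min and Delta_max range over all compared instances and all
    # numeric attributes (assumption: symbolic attributes are excluded).
    diffs = [abs(x0[p] - xi[p]) for xi in train for p in numeric]
    d_min = min(diffs, default=0.0)
    d_max = max(diffs, default=0.0)

    def grc(p, xi):
        if p in symbolic:
            # Symbolic rule of Step 1: 1 if equal, 0 otherwise.
            return 1.0 if x0[p] == xi[p] else 0.0
        if d_max == 0.0:
            # Assumption: every numeric value coincides with x0.
            return 1.0
        return (d_min + zeta * d_max) / (abs(x0[p] - xi[p]) + zeta * d_max)

    return [sum(grc(p, xi) for p in range(n)) / n for xi in train]


def classify(x0, train, labels, k=1, symbolic=frozenset(), zeta=0.5):
    """Classify x0 by majority vote among its k nearest training instances."""
    grades = grey_relational_grades(x0, train, symbolic, zeta)
    # Step 2: grey relational order, largest grade first.
    order = sorted(range(len(train)), key=lambda i: grades[i], reverse=True)
    # Step 3: keep the k edges leaving x0 in the k-level structure.
    neighbours = order[:k]
    # Step 4: majority vote over the neighbours' class labels.
    votes = Counter(labels[i] for i in neighbours)
    return votes.most_common(1)[0][0]


# Toy data: attribute 0 is numeric, attribute 1 is symbolic.
train = [(1.0, "red"), (1.2, "red"), (5.0, "blue"), (5.5, "blue")]
labels = ["A", "A", "B", "B"]
print(classify((1.1, "red"), train, labels, k=3, symbolic={1}))  # prints A
```

With k = 1 the sketch reduces to the one-level structure, in which the single instance with the largest GRG decides the class.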
As stated in Section 3, GRC(x_0(p), x_i(p)) can be treated as the similarity between x_0(p) and x_i(p). If x_0 and x_i have the same value for symbolic attribute p, GRC(x_0(p), x_i(p)) (i.e., the similarity between x_0(p) and x_i(p)) will be set to one. By contrast, if x_0 and x_i are different for symbolic attribute p, GRC(x_0(p), x_i(p)) will be set to zero. These settings are
similar to those used in [1]. As mentioned earlier, the similarity function presented here offers some advantages, including normality and wholeness (i.e., asymmetry). That is, this similarity function is appropriate for measuring the similarities or differences among observations and yields whole relational orders (wholeness) over the entire relational space in which all instances (or patterns) in a specific domain are treated as various vectors.
In some application domains, instances may contain missing attribute values (for example, some datasets in the experiments in Section 5 contain missing attribute values). In this paper, to handle domains that contain missing attribute values, a method presented in [20] for missing attribute value prediction is applied prior to learning (that is, domains with missing attribute values in the experiments in Section 5 are handled by using the prediction method first presented in [20]). In this missing attribute value prediction method, the nearest neighbors of an instance with missing attribute values can be determined. Accordingly, the valid attribute values derived from these nearest neighbors are used to predict those missing values. After predicting (estimating) missing attribute values with high accuracy, an imperfect dataset can be handled as a complete dataset in classification tasks. Finally, the proposed learning approach is applied for classification. Notably, any method used for dealing with missing attribute values probably biases the data.
Assume that we have a training set T of m labeled instances, denoted by T = {x_1, x_2, ..., x_m}, where each instance x_e has n attributes and is denoted as x_e = (x_e(1), x_e(2), ..., x_e(n)). For classifying a new, unseen instance x_0 in typical instance-based learning methods (in which the Euclidean distance is used as the similarity function), the Euclidean distance between x_0 and x_a (1 ≤ a ≤ m) is calculated without considering other training instances (i.e., without considering all x_i, i ≠ a, and 1 ≤ i ≤ m). By contrast, in the proposed learning approach, all training instances with n attributes will be considered (calculated) to determine the similarity (i.e., GRC and GRG) between x_0 and x_a (1 ≤ a ≤ m), i.e., the property of the wholeness axiom [8] of GRA in Section 3. This consideration is the main difference between Euclidean-based similarity functions and GRA. In other words, in the proposed learning approach, a whole relational order (i.e., GRO) that considers all training instances will be derived for classifying a new, unseen instance.
In addition, let m denote the number of compared instances and n be the number of attributes. The time for classifying a new, unseen instance (including the time for calculating the grey relational order) is O(mn + m log m). In addition, the time for discovering the best value for k in the proposed learning approach should also be included in the complexity of the proposed learning approach. The two parts are the overall time complexity of the proposed learning approach. As mentioned earlier, the complexity of using the VDM increases with the number of classes in a specific application domain, i.e., O(mnC), where C is the number of classes. This problem does not appear in the proposed learning approach.
In addition to dealing with classification tasks, the above k-level grey relational structure can be used for instance pruning or partial memory learning [21, 26, 40]. For example, an instance may not be connected by any inward edges from

Table 1 Average accuracy (%) of classification for the proposed approach and other methods with HOEM (with k-nn), HVDM (with k-nn) and IVDM (with k-nn) [39], respectively. (k) indicates the best value for k using cross-validation on each classification problem

Dataset             HOEM    HVDM    IVDM    Proposed approach (k)
Allbp               94.89   95.05   95.29   95.48 (13)
Allhyper            97.09   97.00   97.20   97.35 (11)
Allhypo             90.31   90.16   96.11   92.74 (7)
Allrep              96.14   96.31   98.25   97.18 (7)
Australian          81.30   81.72   80.52   81.87 (13)
Autos               74.90   79.79   80.19   76.45 (1)
Breast cancer       70.90   66.73   66.73   70.90 (7)
Breast-w            95.54   95.24   95.74   96.78 (5)
Cpu                 68.04   68.51   65.39   70.09 (3)
Crx                 80.12   81.06   80.27   80.82 (11)
Dis                 98.20   98.42   98.24   97.77 (3)
Echoi               81.09   80.32   79.36   80.94 (7)
Glass               69.45   72.63   70.69   74.13 (3)
Hepatitis           79.40   80.45   81.47   80.84 (7)
Hypothyroid         93.58   93.60   98.06   98.24 (1)
Ionosphere          87.22   86.40   91.14   91.37 (1)
Iris                94.67   94.67   94.67   95.33 (7)
Letter              96.25   96.01   96.10   95.30 (1)
Liver disorders     61.86   62.89   62.43   63.30 (19)
Mushroom            100.00  100.00  100.00  100.00 (1)
Pageblocks          96.17   96.24   96.34   96.43 (1)
Pimadiabetes        70.49   70.25   68.53   69.40 (17)
Satelliteimage      90.23   90.26   90.14   90.19 (5)
Satelliteimagetest  88.35   88.32   88.41   88.81 (1)
Segment             96.80   97.07   97.33   97.81 (1)
Shuttle             99.05   99.86   99.85   99.94 (1)
Shuttletest         98.88   98.76   98.87   98.99 (1)
Sick                87.01   86.86   96.84   92.75 (5)
Sickeuthyroid       68.30   68.41   95.08   88.60 (7)
Sonar               85.88   87.20   84.24   86.01 (1)
Soybean             90.51   90.98   92.03   89.31 (1)
Soybeansmall        100.00  100.00  100.00  100.00 (1)
Sponge              84.29   84.38   84.28   85.37 (1)
Tae                 63.71   60.99   60.33   63.51 (1)
Vehicle             70.01   70.90   69.53   70.86 (5)
Voting              93.57   95.17   95.17   93.57 (5)
Vowel               98.52   98.67   98.52   98.95 (1)
Wine                94.89   95.46   97.47   97.30 (13)
Yeast               53.37   53.72   53.21   53.44 (19)
Zoo                 95.45   95.34   96.43   96.04 (1)
Average             85.91   86.15   87.26   87.35
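For reference, the VDM of Eq. (3), on which the HVDM and IVDM baselines in Table 1 build, can be sketched as follows. The helper names are hypothetical, and both attribute values are assumed to occur in the training set. The count table holds one entry per (value, class) pair, which is where the dependence on the number of classes C comes from.

```python
from collections import defaultdict


def vdm_counts(values, labels):
    """Count N_a and N_{a,i} for one symbolic attribute."""
    n_a = defaultdict(int)    # N_a: instances with value a
    n_ai = defaultdict(int)   # N_{a,i}: instances with value a and class i
    for value, label in zip(values, labels):
        n_a[value] += 1
        n_ai[(value, label)] += 1
    return n_a, n_ai


def vdm(a, b, n_a, n_ai, classes, k=2):
    """Eq. (3): sum over classes of |N_{a,i}/N_a - N_{b,i}/N_b|^k."""
    return sum(
        abs(n_ai.get((a, c), 0) / n_a[a] - n_ai.get((b, c), 0) / n_a[b]) ** k
        for c in classes
    )


# Toy attribute with two values and two classes.
values = ["r", "r", "r", "b", "b"]
labels = ["A", "A", "B", "B", "B"]
n_a, n_ai = vdm_counts(values, labels)
print(vdm("r", "b", n_a, n_ai, classes=["A", "B"], k=1))
```

In the toy data, value "r" occurs with class A twice and class B once, while "b" occurs only with class B, so the two class distributions differ and the distance is positive; identical values always give zero.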

Table 2 The statistical analysis for the proposed approach and other learning methods with HOEM, HVDM and IVDM [39], respectively. A better or worse test result of X/Y under the column of method S indicates that the proposed learning approach performs better than method S in X cases

                                  HOEM    HVDM    IVDM    Proposed approach
Average accuracy                  85.91   86.15   87.26   87.35
Better or worse test (B/W test)   29/7    27/11   26/12   −
Wilcoxon test                     99.50   99.20   88.70   −
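The better or worse counts in Table 2 are paired comparisons of per-dataset accuracies. A minimal sketch follows, using five HOEM and proposed-approach rows copied from Table 1; the function name is an assumption, and ties are assumed to count for neither side.

```python
def better_worse(proposed, other):
    """Count datasets where the proposed approach is better / worse."""
    better = sum(p > o for p, o in zip(proposed, other))
    worse = sum(p < o for p, o in zip(proposed, other))
    return better, worse


# Rows from Table 1: Breast cancer, Crx, Dis, Glass, Letter.
hoem = [70.90, 80.12, 98.20, 69.45, 96.25]
proposed = [70.90, 80.82, 97.77, 74.13, 95.30]
print(better_worse(proposed, hoem))  # prints (2, 2); Breast cancer is a tie
```

Applied to all forty rows, the same count gives the X/Y entry of a column.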

Table 3 Average accuracy (%) of classification for the proposed approach and other classification algorithms

Dataset Baseline DecisionStump HyperPipes VFI 1R NaiveBayes DecisionTable C4.5 Proposed approach

Allbp 95.25 94.82 35.04 44.04 95.93 93.89 96.89 97.29 95.48
Allhyper 97.25 97.21 97.93 87.14 97.57 95.68 98.61 98.64 97.35
Allhypo 92.14 95.75 98.21 91.68 96.68 95.00 99.11 99.43 92.74
Allrep 96.89 96.89 96.89 93.07 96.82 93.96 99.21 99.25 97.18
Australian 55.51 85.51 44.93 86.81 85.51 76.67 85.07 85.51 81.87
Autos 32.74 44.93 62.95 58.93 62.90 58.02 80.05 82.52 76.45
Breast cancer 70.30 70.69 69.94 67.13 65.39 74.42 74.14 75.18 70.90
Breast-w 65.52 91.70 88.42 96.00 91.84 96.00 94.42 95.28 96.78
Cpu 57.90 65.10 61.24 55.57 69.86 68.98 67.93 66.05 70.09
Crx 55.51 85.51 60.43 84.78 85.51 77.68 85.51 85.94 80.82
Dis 98.39 98.39 48.61 55.21 98.25 95.14 98.75 99.14 97.77
Echoi 42.58 64.63 70.91 74.14 63.71 86.34 79.19 84.25 80.94
Glass 35.50 44.87 50.97 57.90 58.35 50.39 69.18 66.71 74.13
Hepatitis 79.38 81.29 65.16 83.88 80.00 82.58 83.87 83.23 80.84
Hypothyroid 95.23 97.38 95.51 48.61 97.91 97.85 99.08 99.24 98.24
Ionosphere 64.10 82.61 92.60 94.02 82.04 82.36 89.75 90.90 91.37
Iris 33.33 66.67 93.33 96.67 93.33 96.00 93.33 95.33 95.33
Letter 4.07 7.09 22.25 61.23 17.25 64.11 71.24 88.23 95.30
Liver disorders 57.98 61.17 44.62 59.76 57.40 55.96 56.23 66.38 63.30
Mushroom 51.80 88.68 99.77 99.88 98.52 95.75 100.00 100.00 100.00
Pageblocks 89.77 93.13 91.39 87.01 93.55 90.08 95.85 96.00 96.43
Pimadiabetes 65.11 72.01 35.03 64.84 72.28 75.78 74.48 74.09 69.40
Satelliteimage 24.17 44.74 48.00 71.52 59.82 79.55 82.57 86.11 90.19
Satelliteimagetest 23.50 41.70 58.30 72.70 58.05 79.05 80.80 83.25 88.81
Segment 14.29 28.53 75.50 77.45 63.98 80.30 91.69 97.10 97.81
Shuttle 78.41 86.94 84.15 78.27 94.69 91.52 99.75 99.96 99.94
Shuttletest 79.16 86.77 87.61 82.68 94.65 92.88 99.70 99.90 98.99
Sick 93.89 96.75 93.86 61.93 96.54 92.57 97.54 98.82 92.75
Sickeuthyroid 90.74 94.44 90.74 46.38 94.91 84.00 97.28 97.79 88.60
Sonar 53.38 71.60 59.57 55.76 63.93 65.93 72.55 74.07 86.01
Soybean 13.03 26.06 89.90 82.44 39.41 90.55 83.39 88.93 89.31
Soybeansmall 35.50 57.00 100.00 97.50 83.50 97.50 100.00 98.00 100.00
Sponge 16.07 41.25 80.36 75.89 44.82 84.46 73.04 66.07 85.37
Tae 34.42 35.75 47.04 52.96 42.42 53.63 50.96 53.04 63.51
Vehicle 25.77 39.59 38.55 53.55 52.70 44.45 67.96 73.28 70.86
Voting 61.38 95.62 38.62 90.34 95.62 90.57 94.49 97.00 93.57
Vowel 9.09 17.58 36.67 60.20 34.34 67.78 67.07 80.81 98.95
Wine 39.93 59.51 91.08 95.46 76.93 97.22 91.08 94.41 97.30
Yeast 31.20 40.70 35.85 50.27 40.30 58.09 56.88 54.39 53.44
Zoo 40.64 60.45 94.09 94.09 73.36 95.18 89.18 92.09 96.04
Average 55.02 67.78 69.40 73.69 74.26 81.20 84.70 86.59 87.35
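The accuracies in Tables 1 and 3 are averages over ten runs of ten-fold cross validation (Section 5). A sketch of that protocol is given below. The function and parameter names are hypothetical, and the random unstratified split and the way the ten runs are averaged are assumptions, since those details are not specified in the text.

```python
import random


def repeated_kfold_accuracy(instances, labels, classify_fn,
                            folds=10, repeats=10, seed=0):
    """Average accuracy (%) over `repeats` runs of `folds`-fold cross validation.

    classify_fn(x0, train_instances, train_labels) must return a class label.
    """
    rng = random.Random(seed)
    m = len(instances)
    run_accuracies = []
    for _ in range(repeats):
        indices = list(range(m))
        rng.shuffle(indices)
        # Split the shuffled indices into `folds` parts of near-equal size.
        parts = [indices[f::folds] for f in range(folds)]
        correct = 0
        for part in parts:
            held_out = set(part)
            train_idx = [i for i in indices if i not in held_out]
            train_x = [instances[i] for i in train_idx]
            train_y = [labels[i] for i in train_idx]
            # Each part is used once for testing, the rest for training.
            correct += sum(
                classify_fn(instances[i], train_x, train_y) == labels[i]
                for i in part
            )
        run_accuracies.append(100.0 * correct / m)
    return sum(run_accuracies) / repeats
```

Any classifier with the three-argument signature above can be plugged in as `classify_fn`, for example a wrapper that fixes the value of k.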

Table 4 The statistical analysis for the proposed approach and various other classification algorithms. A better or worse test result of X/Y under the column of method S indicates that the proposed learning approach performs better than method S in X cases

Baseline DecisionStump HyperPipes VFI 1R NaiveBayes DecisionTable C4.5 Proposed approach

Average accuracy 55.02 67.78 69.40 73.69 74.26 81.20 84.70 86.59 87.35
Better or worse test (B/W test) 37/3 31/9 33/6 35/5 30/10 32/8 21/17 17/21 −
Wilcoxon test 99.50 99.50 99.50 99.50 99.50 99.50 94.05 54.35 −
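The Wilcoxon Signed Ranks test used in Tables 2 and 4 is based on ranking the absolute paired differences. The sketch below computes only the two rank sums W+ and W−. How the percentages in the tables are derived from them is not stated here, so that step is left out. Rounding the differences to two decimals (to match the reported precision and make ties exact), dropping zero differences and giving tied values their average rank are assumptions of this sketch, and the function name is hypothetical.

```python
def wilcoxon_rank_sums(a, b, decimals=2):
    """Return (W_plus, W_minus) for paired samples a and b."""
    diffs = [round(x - y, decimals) for x, y in zip(a, b)]
    # Drop zero differences, then order by absolute value.
    diffs = sorted((d for d in diffs if d != 0), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        # Extend j over a run of equal absolute differences.
        while j + 1 < len(diffs) and abs(diffs[j + 1]) == abs(diffs[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        for t in range(i, j + 1):
            ranks[t] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


print(wilcoxon_rank_sums([5, 7, 3, 9], [4, 9, 3, 6]))  # prints (4.0, 2.0)
```

A large gap between the two sums indicates that the differences are not distributed symmetrically around zero, which is the null hypothesis the test examines.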

other instances in the k-level grey relational structure. In other words, this instance is rarely used in determining the class labels of other instances, implying that it is probably a good choice for instance pruning in a learning system.

5 Experimental results

In this section, experiments performed on forty data sets (from [5]) are reported to demonstrate the performance of the proposed learning approach. In the experiments, ten-fold cross validation [35] was used and applied ten times for each application domain. That is, the entire data set of each application domain was equally divided into ten parts in each trial; each part was used once for testing and the remaining parts were used for training. Accordingly, the average accuracy of classification was obtained.
Table 1 gives the performances (average classification accuracy) of the proposed approach (i.e., k-nn with GRG), the HOEM (with k-nn), HVDM (with k-nn) and IVDM (with k-nn) [39]. These distance functions used for comparison are mentioned in Section 2. As shown in Table 2, the statistical analysis, including the better or worse test (B/W test; for example, a better or worse test result of 29/7 under the HOEM column means that the proposed learning approach performs better than the HOEM in 29 cases and worse in 7) and the Wilcoxon Signed Ranks test [37] (i.e., the proposed approach is compared with others), was done for the above methods with various distance functions. Here, the Wilcoxon Signed Ranks test was used to test the null hypothesis that the differences (classification accuracy) between two methods are distributed symmetrically around zero (i.e., to determine if one method is significantly better than another, regarding the classification accuracy). Of these forty application domains, the proposed approach reveals its superiority over the HOEM (with k-nn) and the HVDM (with k-nn). Meanwhile, the classification accuracy of the proposed approach is comparable to that of the IVDM (with k-nn).
Furthermore, various classification algorithms were also used for performance comparison, including DecisionStump [41], DecisionTable [27], HyperPipes [41], C4.5 [31], NaiveBayes [25], 1R [18] and VFI [41]. These methods are all available in [41]. Table 3 gives the performances (average accuracy of classification) of the above classification methods and the proposed approach. Here, “Baseline” means that the majority class is simply chosen for classification. Similarly, Table 4 gives the statistical analysis, including the better or worse test (B/W test; for example, a better or worse test result of 32/8 under the NaiveBayes column indicates that the proposed learning approach performs better than NaiveBayes in 32 cases) and the Wilcoxon Signed Ranks test [37] (i.e., the proposed approach is compared with others), for comparing the above learning methods. As a result, the proposal presented here can yield higher performance than some other classification algorithms.

6 Conclusions

In this paper, an instance-based learning approach based on grey relational structure is proposed. Grey relational analysis is used to precisely describe the entire relational structure of all instances in a specific application domain. By using the above-mentioned grey relational structure, new instances can be identified with high accuracy. Experiments performed on forty application domains are reported to demonstrate the performance of the proposed approach. The results show that the proposed approach yields higher performance than other methods that adopt the Euclidean metric, the Value Difference Metric (VDM), or both. Moreover, the proposal presented here can yield higher performance than some other classification algorithms. For some domains with pure symbolic attributes in the experiments, instance-based learning approaches using the VDM perform better than the proposed learning approach, which is based on the grey relational structure. As pointed out earlier, the VDM is mainly applicable to domains with symbolic attributes. For further work, the VDM will be incorporated with the proposed metric to increase the performance of the corresponding instance-based learning approach.

Acknowledgments This work was supported in part by the National Digital Archive Program-Research & Development of Technology Division (NDAP-R & DTD), the National Science Council of Taiwan under grant NSC 94-2422-H-001-006, and by the Taiwan Information Security Center (TWISC), the National Science Council under grant NSC 94-3114-P-001-001-Y. In addition, the authors would like to thank the National Science Council of Taiwan for financially supporting this research under grant NSC 94-2213-E-022-006.


References

1. Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6:37–66
2. Aha DW (1992) Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms. Int J Man-Mach Stud 36(2):267–287
3. An A (2003) Learning classification rules from data. Comp Math Appl 45:737–748
4. Bay SD (1999) Nearest neighbor classification from multiple feature subsets. Intell Data Anal 3:191–209
5. Blake CL, Merz CJ (1998) UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: Department of Information and Computer Science, University of California
6. Brouwer RK (1997) Automatic growing of a hopfield style network during training for classification. Neur Netw 10:529–537
7. Cover TM, Hart PE (1967) Nearest neighbor pattern classification. IEEE Trans Inform Theory 13(1):21–27
8. Deng J (1984) The theory and method of socioeconomic grey systems. Soc Sci China 6:47–60 (in Chinese)
9. Deng J (1989) Introduction to grey system theory. J Grey Syst 1:1–24
10. Deng J (1989) Grey information space. J Grey Syst 1:103–117
11. Elouedi Z, Mellouli K, Smets P (2001) Belief decision trees: Theoretical foundations. Int J Appr Reason 28:91–124
12. Fix E, Hodges JL (1951) Discriminatory analysis: nonparametric discrimination: consistency properties. Technical Report Project 21-49-004, Report Number 4, USAF School of Aviation Medicine, Randolph Field, Texas
13. Freund Y, Mason L (1999) The alternating decision tree learning algorithm. In: Proc. of the 16th international conference on machine learning, Bled, Slovenia, pp 124–133
14. Friedman JH (1977) A recursive partitioning decision rule for nonparametric classification. IEEE Trans Comp, pp 404–408
15. Hattori K, Takahashi M (2000) A new edited k-nearest neighbor rule in the pattern classification problem. Pattern Recog 33:521–528
16. Hewett R, Leuchner J (2003) Restructuring decision tables for elucidation of knowledge. Data Knowl Engin 46:271–290
17. Hickey RJ, Martin RG (2001) An instance-based approach to pattern association learning with application to the English past tense verb domain. Knowl-Based Syst 14:131–136
18. Holte RC (1993) Very simple classification rules perform well on most commonly used datasets. Mach Learn 11:63–91
19. Hu YC, Chen RS, Hsu YT, Tzeng GW (2002) Grey self-organizing feature maps. Neurocomputing 48:863–877
20. Huang CC, Lee HM (2001) A grey-based nearest neighbor approach for predicting missing attribute values. In: Proc. of 2001 national computer symposium, Taiwan, pp B153–159
21. Huang CC, Lee HM (2003) A partial-memory learning system based on grey relational structure. In: Berthold MR et al (eds) Advanced in intelligent data analysis V, Lecture Notes in Computer Science 2810, Springer-Verlag, pp 68–75
22. Huang YP, Huang CH (1997) Real-valued genetic algorithms for fuzzy grey prediction system. Fuzzy Sets Syst 87:265–276
23. Hullermeier E (2003) Possibilistic instance-based learning. Art Intell 148:335–383
24. Ignizio JP, Soltys JR (1996) Simultaneous design and training of ontogenic neural network classifiers. Comp Oper Res 23:535–546
25. John GH, Langley P (1995) Estimating continuous distributions in bayesian classifiers. In: Proc. of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp 338–345
26. Kibler D, Aha DW (1987) Learning representative exemplars of concepts: An initial case study. In: Proc. of the fourth international workshop on machine learning. Morgan Kaufmann, CA, Irvine, pp 24–30
27. Kohavi R (1995) The power of decision tables. In: European conference on machine learning, pp 174–189
28. Langley P, Simon HA (1995) Applications of machine learning and rule induction. Commun ACM 38(11):55–64
29. Lin CT, Yang SY (1999) Selection of home mortgage loans using grey relational analysis. J Grey Syst 4:359–368
30. Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
31. Quinlan JR (1993) C4.5: Programs for machine learning. Morgan Kaufmann Publishers, San Mateo, CA
32. Rachlin J, Kasif S, Salzberg S, Aha DW (1994) Towards a better understanding of memory-based and bayesian classifiers. In: Proc. of the eleventh international machine learning conference, NJ, Morgan Kaufmann, New Brunswick, pp 242–250
33. Salzberg S (1988) Exemplar-based learning: theory and implementation. Technical Report TR-10-88, Center for Research in Computing Technology, Harvard University
34. Stanfill C, Waltz D (1986) Towards memory-based reasoning. Commun ACM 29(12):1213–1228
35. Stone M (1974) Cross-validatory choice and assessment of statistical predictions. J Royal Stat Soc B 36:111–147
36. Tsumoto S (2003) Automated extraction of hierarchical decision rules from clinical databases using rough set model. Expert Syst Appl 24:189–197
37. Watson CJ, Billingsley P, Croft DJ, Huntsberger DV (1993) Statistics for management and economics, 5th edn. Allyn and Bacon, Boston
38. Watson I (1999) Case-based reasoning is a methodology not a technology. Knowl-Based Syst 12:303–308
39. Wilson DR, Martinez TR (1997) Improved heterogeneous distance functions. J Art Intell Res 6:1–34
40. Wilson DR, Martinez TR (2000) Reduction techniques for exemplar-based learning algorithms. Mach Learn 38(3):257–268
41. Witten I, Frank E (2000) Data mining: practical machine learning tools and techniques with java implementations. Morgan Kaufmann, San Francisco, CA


Chi-Chun Huang is currently Assistant Professor in the Department of Information Management at National Kaohsiung Marine University, Kaohsiung, Taiwan. He received the Ph.D. degree from the Department of Electronic Engineering at National Taiwan University of Science and Technology in 2003. His research includes intelligent Internet systems, grey theory, machine learning, neural networks and pattern recognition.

Hahn-Ming Lee is currently Professor in the Department of Computer Science and Information Engineering at National Taiwan University of Science and Technology, Taipei, Taiwan. He received the B.S. degree and Ph.D. degree from the Department of Computer Science and Information Engineering at National Taiwan University in 1984 and 1991, respectively. His research interests include intelligent Internet systems, fuzzy computing, neural networks and machine learning. He is a member of IEEE, TAAI, CFSA and IICM.
