
Attribute Incremental Induction of Decision Trees

Ms. M. M. Parikh
Computer Engineering Department
Sarvajanik College of Engineering and Tech.
Dr. R. K. Desai Marg, Near Athwa Gate
Surat, Gujarat
mita_parikh@yahoo.com

Dr. K. Kotecha
Principal,
G. H. Patel College of Engineering and Tech.
Vallabh Vidyanagar, Anand, Gujarat
drketankotecha@yahoo.com

Abstract

In this paper an algorithm for attribute-incremental learning of decision trees is proposed. The new algorithm, called AIIDT (Attribute Incremental Induction of Decision Trees), imposes the effect of a newly discovered attribute onto a previously learned decision tree. Moreover, in doing so, we do not dispose of any parts of the original decision tree. The proposed algorithm AIIDT simply lets us learn whatever is possible without discarding any subtrees from the older decision tree. This method works whether the new attribute is discrete or continuous, and it lets us study the effect of the new attribute on our problem. For decision trees, algorithms exist that can incorporate the effect of new training instances on an existing decision tree, but AIIDT takes this a step further and allows us to incorporate the effect of a new attribute on a previously learned decision tree that did not account for this new attribute.

1 Introduction

The ability to learn from experience is a fundamental property of intelligent behavior. Human beings are intelligent because they learn from experience during their lifetime, and this enables them to make intelligent decisions in situations similar to the ones they have experienced earlier. Machine Learning is a field that strives to incorporate this learning ability into machines. Several machine learning methods are currently being researched, such as simple concept learning, decision tree learning, Bayesian network learning, neural network learning and learning by genetic algorithms. Machine Learning has been successfully applied to problems such as data mining, handwriting recognition, speech recognition, driving autonomous vehicles and playing intelligent games such as chess to a world-class level.

Decision Tree Learning is a popular, widely practiced machine learning method. Decision trees have several desirable properties: they are applicable to a broad range of problems involving classification and inference; they can be used to acquire concepts even when data is noisy (i.e. not 100% accurate) or some values are missing; they can be used whether the data is continuous or discrete (with the use of suitable methods of discretization [Fayyad and Irani, 1992]); they are capable of learning conjunctive and disjunctive expressions [Mitchell, 1997]; and they lend themselves to easy combination with other machine learning methods. The major attraction of decision trees is that the learned concepts are easily interpretable by humans. Decision trees can be re-represented as a set of if-then rules for easy interpretation. Several algorithms are available for building (ID3 [Quinlan, 1986], C4.5 [Kohavi and Quinlan, 1999]), pruning (Reduced Error Pruning, Cost Complexity Pruning, Pessimistic Error Pruning [Kohavi and Quinlan, 1999], Virtual Pruning [Utgoff, 1989]; see [Esposito et al., 1992] for comparisons) and validating decision trees.

Incremental methods for decision trees proposed so far deal only with instance-incremental learning. For example, algorithms like ID4 [Utgoff, 1989], ID5 [Utgoff, 1989] and ID5R [Utgoff, 1997] incorporate the effect of new training instances on an existing decision tree. Zhou and Chen [Zhou and Chen, 2002] have pointed out other types of incremental learning, such as attribute-incremental learning and class-incremental learning. Attribute-Incremental Learning aims to incorporate the effect of a newly discovered attribute on an existing learned system, whereas Class-Incremental Learning aims to incorporate the effect of a newly introduced (or discovered) class on an existing learned system. However, to the best of our knowledge, no methods for building attribute-incremental or class-incremental decision trees have been proposed so far.
2. Attribute Incremental Induction of Decision Trees

We have attempted to find a solution for building attribute-incremental decision trees. Suppose we have a decision tree for a problem with which we are reasonably satisfied. Now, suppose a new attribute comes to light, and we would like to know the effect of this attribute on our problem. It may be that we do not have sufficient training instances with this new attribute to build a new decision tree, and the training instances are very slow in coming, so that we do not want to wait a long time until we have collected sufficient training data. The new algorithm proposed here, called AIIDT (Attribute Incremental Induction of Decision Trees), imposes the effect of a newly discovered attribute onto a previously learned decision tree. Moreover, in doing so, we do not dispose of any parts of the previously learned decision tree. The proposed algorithm AIIDT simply lets us learn whatever is possible without discarding any subtrees from the older decision tree. This method works whether the new attribute is discrete or continuous, and it lets us study the effect of the new attribute on our problem.

2.1 Algorithm AIIDT

Input:
1. A decision tree T1 built using an instance set S1 that does not have values for the new attribute An+1.
2. A new instance set S2 that also has values for the attribute An+1.

Output:
1. A new decision tree T3 that incorporates the effect of S2 onto the already existing T1.

Steps:
a) Let T1 be the older decision tree learned with attribute set A = {A1, A2, ..., An} using instance set S1.
b) Let An+1 be the new attribute whose effect we want to learn on our decision tree, i.e. we want to build a decision tree with attribute set A' = {A1, A2, ..., An, An+1}.
c) Let S2 be the new instance set that has instances with values for attribute An+1.
d) Construct decision tree T2 using only instances from S2.
e) Construct tree T3 from T1 and T2, which retains all the learned concepts from T1 and adds whatever is possible from T2, as follows (a code sketch of these steps is given after the list):
   1. If An+1 is the root node R2 of T2, then examine the decision nodes at each test value in R2. If any of these decision nodes N tests the same attribute as the root node R1 of T1, then set the root node of T3 to R2 and replace the subtree rooted at N by the subtree rooted at R1 in T3. Else, go to step e.2.
   2. Set T3 initially to be the same as T1. For each branch in T2:
   3. If An+1 appears anywhere along the branch, note the parent node P, the test value t at P for which An+1 was tested, and the child node C of P along the branch for test value t.
   4. For each occurrence of P in T3:
      I. If t is a new test value for P, then add a new branch from P with test value t and copy the subtree rooted at An+1 (of T2) below it.
      II. Else, if P (of T3) already has a branch with test value t leading to C, then introduce decision node An+1 as an additional test between P and C. In doing this, the subtree at C is to be retained, i.e. it is not to be replaced by the subtree rooted at An+1 (of T2).
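To make step e) concrete, the sketch below gives a minimal Python rendering of the merge, assuming a simple nested-dictionary tree representation (a decision node maps test values to subtrees, a leaf carries a class label). The representation and the helper names are illustrative choices of ours, and the handling of step e.4.II follows one possible reading of the prose, so this is a sketch rather than a reference implementation.

```python
from copy import deepcopy

# A decision node is {"attr": name, "branches": {test_value: subtree}};
# a leaf is {"label": class_name}.


def is_leaf(node):
    return "label" in node


def nodes_testing(tree, attr):
    """Yield every decision node in the tree that tests the given attribute."""
    if is_leaf(tree):
        return
    if tree["attr"] == attr:
        yield tree
    for child in tree["branches"].values():
        yield from nodes_testing(child, attr)


def new_attr_edges(tree, new_attr, parent_attr=None, test_value=None):
    """Step e.3: yield (P, t, An+1 subtree) for every edge of T2 that leads
    into a node testing the new attribute An+1."""
    if is_leaf(tree):
        return
    if tree["attr"] == new_attr and parent_attr is not None:
        yield parent_attr, test_value, tree
    for value, child in tree["branches"].items():
        yield from new_attr_edges(child, new_attr, tree["attr"], value)


def aiidt(t1, t2, new_attr):
    """Step e): build T3 from T1 and T2, keeping all concepts of T1."""
    # Step e.1: An+1 is the root of T2 and some child of it tests the same
    # attribute as T1's root -> keep T2's root and hang T1 below that branch.
    if not is_leaf(t1) and not is_leaf(t2) and t2["attr"] == new_attr:
        for value, child in t2["branches"].items():
            if not is_leaf(child) and child["attr"] == t1["attr"]:
                t3 = deepcopy(t2)
                t3["branches"][value] = deepcopy(t1)  # subtree at N replaced by T1
                return t3

    # Step e.2: otherwise start from a copy of T1 ...
    t3 = deepcopy(t1)
    # Step e.3: ... and visit every place where T2 introduces the new attribute.
    for p_attr, t, an1_subtree in new_attr_edges(t2, new_attr):
        # Step e.4: for each occurrence of the parent attribute P in T3.
        for node in list(nodes_testing(t3, p_attr)):
            if t not in node["branches"]:
                # e.4.I: t is a new test value for P -> add the branch and
                # copy T2's An+1 subtree below it.
                node["branches"][t] = deepcopy(an1_subtree)
            else:
                # e.4.II: P already has a branch t leading to C -> insert An+1
                # between P and C, retaining the subtree at C. (One reading:
                # every outcome of the new test keeps the retained subtree C.)
                c = node["branches"][t]
                if not is_leaf(c) and c["attr"] == new_attr:
                    continue  # An+1 already inserted here
                node["branches"][t] = {
                    "attr": new_attr,
                    "branches": {v: deepcopy(c) for v in an1_subtree["branches"]},
                }
    return t3
```

Building T2 itself (step d) is left to a standard induction algorithm such as ID3 or C4.5, as noted in Remark 2 below.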

Remark 1: The algorithm naturally favors the concepts in T1, but this is simply because we too have more confidence in the concepts of tree T1, which have been in existence and use for a long while, whereas the concepts in T2 are new and subject to change as more and more instances are collected for building tree T2.

Remark 2: The tree T2 can be built non-incrementally using algorithms like ID3 or C4.5, or it can be built incrementally using an algorithm like ID5R, as per convenience. For the tree T1 it is more logical to use a non-incremental algorithm.

Figures 1 to 3 illustrate the working of the proposed algorithm AIIDT.

Figure 1 Illustrating step e.1 of algorithm AIIDT
Figure 2 Illustrating step e.4.I of algorithm AIIDT
Figure 3 Illustrating step e.4.II of algorithm AIIDT

The algorithm AIIDT was tried out on the "Coronary Heart Disease" data (donated by Desai Laboratories, Surat, Gujarat, India). Four out of the seven attributes in the CHD training data set are continuous. These were also discretized using the method proposed by Fayyad and Irani [Fayyad and Irani, 1992]. Fayyad and Irani have also suggested an extension of their original method to obtain multiple splits [Fayyad and Irani, 1993], but this extension has no theoretical proof. An alternate solution to obtain multiple splits has also been given in [Elomaa and Rousu, 1996].
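As an illustration of this kind of binary discretization, the sketch below chooses a single cut point for a continuous attribute by minimizing the class entropy of the resulting two-way split, in the spirit of [Fayyad and Irani, 1992]. It is a simplification: it considers every midpoint between distinct sorted values as a candidate cut and omits the MDL-based stopping criterion of the original method, and the function names are our own.

```python
import math
from collections import Counter


def entropy(labels):
    """Class entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_binary_cut(values, labels):
    """Choose the cut point T for a continuous attribute that minimizes the
    weighted class entropy of the split  value < T  versus  value >= T."""
    pairs = sorted(zip(values, labels))
    best_cut, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # only cut between distinct values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v < cut]
        right = [l for v, l in pairs if v >= cut]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut


# Example: a cut for a small, invented "serum cholesterol" sample.
# best_binary_cut([180, 200, 220, 260], ["low", "low", "high", "high"]) -> 210.0
```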
2.2 The CHD Problem

Doctors routinely conducted six tests (listed in Table 4) to assess the risk of coronary heart disease (CHD) for a patient in the near future (an approximate time span of 2 years). So, the initial CHD database contained the 6 attributes summarized in Table 4.

TABLE 4
ATTRIBUTES USED IN THE CHD PROBLEM

   TEST                       TYPE OF ATTRIBUTE
1. serum cholesterol          continuous
2. cholesterol/HDL ratio      continuous
3. cholesterol/LDL ratio      continuous
4. Diabetic                   discrete (boolean)
5. Male/Female                discrete (boolean)
6. Age                        continuous
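For concreteness, a CHD training instance can be pictured as a plain record keyed by the attribute names of Table 4; the encoding below is only illustrative (the field values are invented), but it matches the nested-dictionary trees used in the earlier sketch.

```python
# One illustrative CHD training instance over the six original attributes of
# Table 4; the values are invented and the class label is the CHD risk level.
instance = {
    "serum cholesterol": 231.0,
    "cholesterol/HDL ratio": 6.4,
    "cholesterol/LDL ratio": 3.1,
    "Diabetic": False,
    "Male/Female": "male",
    "Age": 47,
}
label = "High CHD risk"

# After binary discretization (e.g. with best_binary_cut above), each continuous
# attribute is handled through test values such as "<213" / ">=213" at a decision node.
```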

The analysis from these tests was considered satisfactory most of the time. Afterwards, a new factor called Homocystine was also introduced. Its effect was also felt to be significant. The doctors wanted to know how exactly the Homocystine factor combined with the previous six well-known and well-accepted factors in the determination of coronary heart disease.

The decision tree T1, obtained by top-down induction from the first CHD data file containing 6 attributes and 300 training instances, is shown in Figure 5. (No data values were collected for the 7th attribute, simply because its significance was not known at that time.)

Figure 5 Decision Tree T1

The second decision tree T2 was built from CHD data that contained 7 attributes and 73 training instances. The seventh attribute is the Homocystine attribute. The decision tree thus obtained is shown in Figure 6.

Figure 6 Decision Tree T2

Finally, the third decision tree T3, obtained by applying algorithm AIIDT to T1 and T2, is shown in Figure 7.
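Tying the pieces together, an end-to-end use of the Section 2.1 sketch on drastically simplified versions of T1 and T2 would look as follows. The toy trees below keep only the root tests and split points visible in Figures 5 and 6, and the aiidt() function (and its tree format) from the earlier sketch is assumed to be in scope, so this shows only the shape of the workflow.

```python
# Toy versions of T1 and T2, drastically simplified from Figures 5 and 6,
# in the nested-dict format of the Section 2.1 sketch.
t1 = {"attr": "serum cholesterol", "branches": {
    "<213":  {"label": "Low CHD risk"},
    ">=213": {"label": "High CHD risk"},
}}

t2 = {"attr": "Homocystine", "branches": {
    ">=16.07": {"label": "High CHD risk"},
    "<16.07":  {"attr": "serum cholesterol", "branches": {
        "<194.5":  {"label": "Low CHD risk"},
        ">=194.5": {"label": "High CHD risk"},
    }},
}}

# Because Homocystine is the root of T2 and its "<16.07" child tests the same
# attribute as the root of T1, step e.1 applies: T3 keeps T2's root and hangs
# all of T1 below the "<16.07" branch, as in Figure 7.
t3 = aiidt(t1, t2, new_attr="Homocystine")
```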

Figure 7 Decision Tree T3 obtained by AIIDT

Continuous attributes were discretized using the method proposed by Fayyad and Irani for obtaining an optimal binary split [Fayyad and Irani, 1992]. The labels on the two sides of the arrows in Figures 5-7 show these values. The new decision tree thus obtained using AIIDT has shown 97% accuracy over the CHD data set.

3 Conclusion

Decision Trees are a quick and effective means of learning target concepts from given data. Algorithms like ID3 and C4.5, as well as instance-incremental algorithms like ID5R, can be used to solve a variety of problems in concept learning and classification. The method proposed for binary discretization [Fayyad and Irani, 1992] has worked out very well for the CHD problem. The proposed new algorithm AIIDT was successful in obtaining a good decision tree in the attribute-incremental setting. To the best of our knowledge, this is the first algorithm proposed for attribute-incremental learning of decision trees.

References

[Mitchell, 1997] T. Mitchell. Machine Learning. McGraw-Hill, 1997.

[Fayyad and Irani, 1992] U. M. Fayyad and K. B. Irani. On the handling of continuous-valued attributes in decision tree generation. Machine Learning 8, 87-102, 1992.

[Utgoff, 1989] P. E. Utgoff. Incremental induction of decision trees, 1989.

[Utgoff, 1997] P. E. Utgoff. Decision tree induction based on efficient tree restructuring, 1997.

[Zhou and Chen, 2002] Z. H. Zhou and Z. Q. Chen. Hybrid decision tree, 2002.

[Fayyad and Irani, 1993] U. M. Fayyad and K. B. Irani. Multi-interval discretization of continuous-valued attributes for classification learning. Proc. of 13th International Joint Conference on Artificial Intelligence, pp. 1022-1027, 1993.

[Elomaa and Rousu, 1996] T. Elomaa and J. Rousu. Finding optimal multi-splits for numerical attributes in decision tree learning, 1996.

[Kohavi and Quinlan, 1999] R. Kohavi and R. Quinlan. Decision tree discovery, 1999.

[Quinlan, 1986] J. R. Quinlan. Simplifying decision trees, 1986.

[Webb, 2002] G. I. Webb. Further experimental evidence against the utility of Occam's Razor. Journal of Artificial Intelligence Research 4, 397-417, 2002.

[Amor et al., 2004] N. Amor, S. Benferhat and Z. Elouedi. Naïve Bayes vs decision trees in intrusion detection systems. ACM Symposium on Applied Computing, 2004.

[Esposito et al., 1992] F. Esposito, D. Malerba and G. Semeraro. A comparative analysis of methods for decision tree generation, 1992.
