Discovery of Significant Classification Rules
Minghao Piao, Jong Bum Lee, Khalid E.K. Saeed, and Keun Ho Ryu
1 Introduction
Decision trees are commonly used for gaining information for the purpose of decision
making. For inductive learning, decision trees are attractive for three reasons: (1) A decision
tree is a good generalization for unobserved instances, provided the instances are
described in terms of features that are correlated with the target concept. (2) The
methods are computationally efficient, with cost proportional to the number of observed
training instances. (3) The resulting decision tree provides a representation of the
concept that is explainable to humans.
Many existing decision tree algorithms, such as ID3 [1], C4.5 [2], and C5.0 [3], are
based on Hunt's algorithm. Tree induction in these non-incremental approaches is
based on a consistent set of labeled examples, and the process of inducing a decision
tree is quite inexpensive because exactly one tree is generated, mapping a single batch
of examples to a particular tree. ID4 [4], ID5R [5], and ITI [6] are incremental
approaches, and their results are as correct as those of non-incremental approaches.
To improve classification accuracy, ensemble methods construct a set of base
classifiers from the training data set and perform the classification by voting on the
predictions made by each classifier. An ensemble of classifiers can be constructed in
many ways [7]; the most widely used is to manipulate the training set, as in bagging
and boosting. Three interesting observations are described in [8]
based on a study of many ensemble methods: (1) Many ensembles constructed by the
Boosting method were singletons. Due to this constraint, the derived classification
rules have a limitation: a single decision tree is not encouraged to derive many
significant rules, and its rules are mutually exclusive, covering the training samples
exactly once. (2) Many top-ranked features possess similar discriminating merit, with
little difference for classification. This indicates that it is worthwhile to employ
different top-ranked features as the root nodes for building multiple decision trees.
(3) These ensemble methods also suffer from the fragmentation problem: less and less
training data is used to search for the root nodes of sub-trees.

R. Huang et al. (Eds.): ADMA 2009, LNAI 5678, pp. 587–594, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Based on those observations, we need a method that can break the singleton coverage
constraint and solve the fragmentation problem. The method should also be able to
deal with incrementally collected large data sets, handle the data with growing
models, and guarantee accuracy.
2 Related Works
In this section, we describe some non-incremental and incremental decision tree
induction methods. We also illustrate some widely used ensemble methods and their
use in incremental induction tasks.
One approach to the induction task is ID3, a simple decision tree learning algorithm
developed by Ross Quinlan. ID3 is based on information theory and constructs the
tree by a top-down, greedy search through the given sets, testing each attribute at
every tree node. To select the attribute that is most useful for classifying an instance,
it uses information gain [1], [10]. Given an example data set S, the entropy of S is
derived from Equation 1:

Entropy(S) = − P(p) log2 P(p) − P(n) log2 P(n)    (1)

where P(p) is the proportion of positive examples in S and P(n) is the proportion of
negative examples in S. The information gain is then as shown in Equation 2:
InformationGain(S, A) = Entropy(S) − ∑_{v=1}^{V} (|S_v| / |S|) × Entropy(S_v)    (2)

where A is an attribute whose domain contains the V values v = 1, ..., V, and S_v is
the subset of records of S in which A takes the value v.
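As a concrete illustration of Equations 1 and 2, the two quantities can be computed as follows (a minimal sketch; the toy data set and function names are ours, not from the paper):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels (Equation 1)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Information gain of splitting on one attribute (Equation 2)."""
    total = len(labels)
    # Partition the labels by the attribute's value v.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    remainder = sum((len(part) / total) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder

# Four records and one binary attribute that perfectly separates the classes
rows = [("sunny",), ("sunny",), ("rain",), ("rain",)]
labels = ["p", "p", "n", "n"]
print(entropy(labels))                    # 1.0
print(information_gain(rows, labels, 0))  # 1.0
```

A perfectly discriminating attribute recovers all of the one bit of class entropy, which is why both values come out as 1.0 here.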
The decision tree algorithm C4.5 [2] extends ID3 in the following ways: handling
missing data, handling continuous data, pruning, generating rules, and splitting. For
splitting, C4.5 uses the gain ratio instead of the information gain, choosing the largest
gain ratio among splits with at least average information gain.
Given a data set D that is split into s new subsets S = {D1, D2, ..., Ds}:

GainRatio(D, S) = Gain(D, S) / SplitINFO    (3)

SplitINFO = − ∑_{i=1}^{s} (|Di| / |D|) log2 (|Di| / |D|)    (4)
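Equations 3 and 4 can be sketched in the same way (the helper names and the toy partition sizes are ours):

```python
from math import log2

def split_info(part_sizes):
    """SplitINFO over a partition D1..Ds of D (Equation 4)."""
    total = sum(part_sizes)
    return -sum((d / total) * log2(d / total) for d in part_sizes)

def gain_ratio(gain, part_sizes):
    """GainRatio = Gain(D, S) / SplitINFO (Equation 3)."""
    return gain / split_info(part_sizes)

# Splitting 8 records into two subsets of 4 gives SplitINFO = 1 bit,
# so the gain ratio equals the raw gain.
print(split_info([4, 4]))       # 1.0
print(gain_ratio(0.5, [4, 4]))  # 0.5
```

SplitINFO grows with the number and evenness of the subsets, so the ratio penalizes attributes that split the data into many small branches.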
C5.0 (called See5 on Windows) is a commercial version of C4.5, sold by RuleQuest
and incorporated into data mining packages such as Clementine. It is targeted toward
use with large datasets. The decision tree induction is close to that of C4.5, but the
rule generation differs; the precise algorithms used in C5.0 have not been published.
One major improvement to the accuracy of C5.0 is boosting, and it does improve the
accuracy. Results show that C5.0 improves on memory usage by about 90 percent,
runs between 5.7 and 240 times faster than C4.5, produces more accurate rules, and
its error rate has been shown to be less than half of that of C4.5 on some data
sets [3].
The basic algorithm of ITI follows ID5R, but adds the ability to handle numeric
variables, instances with missing values, and inconsistent training instances; it also
handles multiple classes, not just two. To update the tree, ITI uses two steps:
incorporating an instance into the tree, and restructuring the tree as necessary so that
each decision node contains the best test. When picking the best attribute it uses the
gain ratio described for C4.5. A table of frequency counts for each class and value
combination is kept at each decision node and is used to ensure the best test at that
node and for tree revision.
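The frequency-count table kept at a decision node can be sketched as follows (a simplified illustration of our own that scores tests by plain information gain rather than the gain ratio, and ignores numeric attributes, missing values, and tree restructuring):

```python
from collections import defaultdict
from math import log2

class NodeCounts:
    """Per-node frequency table counts[attr][value][class], of the kind ITI
    keeps at a decision node to re-evaluate candidate tests incrementally."""

    def __init__(self, n_attrs):
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(n_attrs)]
        self.class_totals = defaultdict(int)

    def add(self, instance, label):
        """Incorporate one training instance without revisiting old data."""
        self.class_totals[label] += 1
        for a, value in enumerate(instance):
            self.counts[a][value][label] += 1

    def best_test(self):
        """Attribute with the highest information gain over the counts."""
        def H(dist):
            n = sum(dist)
            return -sum(c / n * log2(c / n) for c in dist if c)
        total = sum(self.class_totals.values())
        base = H(list(self.class_totals.values()))
        best, best_gain = None, -1.0
        for a, table in enumerate(self.counts):
            rem = sum(sum(d.values()) / total * H(list(d.values()))
                      for d in table.values())
            if base - rem > best_gain:
                best, best_gain = a, base - rem
        return best

node = NodeCounts(n_attrs=2)
for inst, y in [(("sunny", "hot"), "n"), (("rain", "hot"), "p"),
                (("sunny", "mild"), "n"), (("rain", "mild"), "p")]:
    node.add(inst, y)
print(node.best_test())  # 0 (attribute 0 separates the classes perfectly)
```

Because only counts are stored, incorporating a new instance is O(attributes), and the best test can be re-checked at any time without rescanning the training data.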
Bagging and boosting were the first approaches to construct multiple base trees, each
time using a bootstrapped replicate of the original training data. Bagging [11] is a
method for generating multiple decision trees and using these trees to obtain an
aggregated predictor. The multiple decision trees are formed by bootstrap
aggregating, which repeatedly samples from the data set with replacement, so some
instances may appear several times in the same training set while others are omitted
from it. Unlike bagging, boosting [12] assigns a weight to each training example and
may adaptively change the weights at the end of each boosting round.
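The two ways of manipulating the training data can be sketched side by side (a minimal illustration of our own; the weight update follows the standard AdaBoost form, with a hypothetical error rate and set of misclassified instances):

```python
import math
import random

def bootstrap_sample(data, rng):
    """One bagging replicate: len(data) draws with replacement."""
    return [rng.choice(data) for _ in data]

rng = random.Random(0)
data = list(range(10))
replicate = bootstrap_sample(data, rng)
# With replacement, some instances repeat and some are omitted ("out-of-bag"):
print(sorted(replicate))
print(sorted(set(data) - set(replicate)))

# Boosting instead keeps all instances but reweights them each round:
weights = [1 / len(data)] * len(data)
error, misclassified = 0.3, {2, 5, 7}        # hypothetical round results
alpha = 0.5 * math.log((1 - error) / error)  # importance of this round's tree
weights = [w * math.exp(alpha if i in misclassified else -alpha)
           for i, w in enumerate(weights)]
total = sum(weights)
weights = [w / total for w in weights]       # renormalize to sum to 1
```

After the update, the misclassified examples carry more weight, so the next round's tree concentrates on them.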
However, bagging and boosting are difficult to use in an incremental tree induction
process because of their expensive cost and because they have to manipulate the
training data. The CS4 algorithm [8], [13], instead of manipulating the training data,
keeps the original data unchanged but changes the tree learning phase. It forces the
top-ranked features to be the roots of trees, and the remaining nodes are constructed
as in C4.5. This is called tree cascading; for classification, the algorithm combines
those tree committees and shares the rules in the committee in a weighted manner.
Together, the cascading idea guides the construction of tree committees, while the
sharing idea aggregates the discriminating power of each individual decision tree.
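The cascading idea can be sketched with a toy ID3-style builder of our own (CS4 itself grows the sub-trees as C4.5 does; only the forced root is the point here):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, labels, a):
    n = len(labels)
    parts = {}
    for r, y in zip(rows, labels):
        parts.setdefault(r[a], []).append(y)
    return entropy(labels) - sum(len(p) / n * entropy(p)
                                 for p in parts.values())

def grow(rows, labels, attrs, root=None):
    """Greedy tree; `root` forces the split attribute at this node."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    a = root if root is not None else max(attrs,
                                          key=lambda x: gain(rows, labels, x))
    branches = {}
    for v in set(r[a] for r in rows):
        sub = [(r, y) for r, y in zip(rows, labels) if r[a] == v]
        srows, slabels = zip(*sub)
        branches[v] = grow(list(srows), list(slabels), attrs - {a})
    return (a, branches)

rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "hot"), ("rain", "mild")]
labels = ["n", "n", "p", "p"]
ranked = sorted(range(2), key=lambda a: gain(rows, labels, a), reverse=True)
# One tree per top-ranked feature, each forced to be the root:
committee = [grow(rows, labels, set(range(2)), root=a) for a in ranked]
print([t[0] for t in committee])  # [0, 1]: each tree has a different root
```

Each committee member starts from a different top-ranked feature, so the trees are not singletons and together they derive many distinct rules.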
3 The ICS4 Algorithm

We apply the cascading and sharing ensemble method of CS4 to break the constraint
of singleton classification rules by producing many significant rules from the
committees of decision trees, and we combine the discriminating power of those rules
to accomplish the prediction. We call this algorithm ICS4; the main steps of the
process are shown in the functions below:
incremental_update(node, training_example)
{
    add_training_example_to_tree(node, training_example)
        { Add the example to the tree using tree revision; }
    ensure_best_test(node)
        { Ensure each decision node has the desired best test; }
}

assign_class_label_for_test_example(test_example)
{
    if there are test examples
        for each kth top-ranked test  // except the first, best test
            force the test to be installed at the root node;
            for the remaining nodes
                ensure_best_test(node);
    from each constructed decision tree
        derive significant classification rules;
}
Here, K_C denotes the number of rules for class C and rule_Ci denotes the i-th rule of
class C; if the score for one class C is larger than that of the other classes, then the
class label for the instance T is assigned as C.
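Since the exact score formula is not reproduced above, the following sketch assumes a CS4-style weighted sum over the rules an instance satisfies; the rule predicates, weights, and thresholds are hypothetical (the feature names follow the Wisconsin dataset fields mentioned below):

```python
def classify(instance, rules_by_class):
    """Assign the class whose satisfied rules have the largest total weight.
    Each rule is a (predicate, weight) pair; the weighting scheme is an
    assumption in the spirit of CS4, not quoted from the paper."""
    scores = {}
    for cls, rules in rules_by_class.items():
        scores[cls] = sum(w for pred, w in rules if pred(instance))
    return max(scores, key=scores.get)

# Hypothetical rules derived from two committee trees:
rules_by_class = {
    "malignant": [(lambda x: x["worst_radius"] > 16.0, 0.8),
                  (lambda x: x["mean_texture"] > 20.0, 0.4)],
    "benign":    [(lambda x: x["worst_radius"] <= 16.0, 0.9)],
}
print(classify({"worst_radius": 18.2, "mean_texture": 25.1}, rules_by_class))
# malignant (score 1.2 vs. 0.0)
```

Sharing lets every rule a tree derived contribute to the vote, rather than only the single path the instance falls into.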
4 Experimental Results
Breast cancer is a cancer that starts in the cells of the breast in women and men.
Worldwide, breast cancer is the second most common type of cancer after lung cancer
and the fifth most common cause of cancer death. Breast cancer is about 100 times as
frequent among women as among men, but survival rates are equal in both sexes. In
this section, we report the empirical behavior of the algorithm discussed above using
the Wisconsin Breast Cancer Dataset, taken from the University of California at
Irvine (UCI) machine learning repository [16]. This dataset consists of 569 instances
with 32 features. Here, we use only the 20 top-ranked features to construct tree
committees.
The mean, standard error, and "worst" (i.e., the mean of the three largest values) of
these features were computed, resulting in 30 features. For instance, field 3 is Mean
Radius, field 13 is Radius SE, and field 23 is Worst Radius.
In the following tables, "50 vs. 50", for example, means that the example data set is
divided into 50% training and 50% test data.
Fig. 1. Comparison of accuracy (y-axis: accuracy, roughly 88–102%) of ITI, ICS4,
C5.0 rule, and C5.0 for different sizes of training and test sets (TR50/TE50,
TR70/TE30, TR80/TE20, TR100/TE100)
In Figure 1, ITI and ICS4 are tested in incremental mode, and C5.0 is tested in batch
(non-incremental) mode as both a rule-based and a tree-based classifier. As the results
show, the ICS4 algorithm achieves high performance on different partitions of the
example data and against different types of decision tree learning algorithms. This
means that using the cascading and sharing method in incremental tree induction to
derive significant rules can provide accuracy competitive with incremental induction
algorithms and even with non-incremental approaches when tested on the same sizes
of example data. Considering execution time and storage, because ICS4 constructs
the tree committees only at the start of testing or when new unknown instances
arrive, it can finish the work in acceptable time, like ITI.
5 Conclusion
In this paper, based on well-accepted design goals, we introduced an approach for
discovering many significant classification rules from an incrementally induced
decision tree by using the Cascading and Sharing ensemble method. Tree induction
offers a highly practical method for generalizing from examples whose class labels
are known, and the average cost of incrementally maintaining the tree committee is
much lower than the cost of building a new decision tree committee from scratch. To
test the performance of ensembles of incremental tree induction, we used the example
data of the Wisconsin Breast Cancer Dataset. All 31 features were used without
variable selection, and the threshold of the feature choice was set to 20 so as to
construct 20 trees by iteratively forcing the 20 top-ranked features to be the root node
of a new tree. The first tree is constructed in an incremental induction manner, and
the others are constructed when there are instances that need to be assigned a class
label. The results show that this new approach is suitable for bio-medical research.
References
1. Quinlan, J.R.: Induction of Decision Trees. Machine Learning, 81–106 (1986)
2. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco
(1993)
3. RuleQuest Research Data Mining Tools, http://www.rulequest.com/
4. Schlimmer, J.C., Fisher, D.: A case study of incremental concept induction. In: Proceed-
ings of the Fifth National Conference on Artificial Intelligence, pp. 496–501 (1986)
5. Utgoff, P.E.: Incremental Induction of decision trees. Machine Learning, 161–186 (1989)
6. Utgoff, P.E., Berkman, N.C., Clouse, J.A.: Decision Tree Induction Based on Efficient
Tree Restructuring. Machine Learning, 5–44 (1997)
7. Tan, P.N., Steinbach, M., Kumar, V.: Ensemble methods. In: Introduction to data mining,
pp. 278–280. Addison Wesley, Reading (2006)
8. Li, J., Liu, H., Ng, S.-K., Wong, L.: Discovery of significant rules for classifying
cancer diagnosis data. Bioinformatics 19, 93–102 (2003)
9. Utgoff, P.E.: Decision Tree Induction Based on Efficient Tree Restructuring. Technical re-
port, University of Massachusetts (1994)
10. Tan, P.N., Steinbach, M., Kumar, V.: Decision tree induction. In: Introduction to data min-
ing, pp. 150–172. Addison Wesley, Reading (2006)
11. Breiman, L.: Bagging predictors. Machine Learning 24, 123–140 (1996)
12. Freund, Y., Schapire, R.E.: Experiments with a New Boosting Algorithm. In: The Thir-
teenth International Conference on Machine Learning, pp. 148–156 (1996)
13. Li, J., Liu, H.: Ensembles of cascading trees. In: Third IEEE international conference on
data mining, pp. 585–588 (2003)
14. Lee, H.G., Noh, K.Y., Ryu, K.H.: Mining Biosignal Data: Coronary Artery Disease Diag-
nosis using Linear and Nonlinear Features of HRV. In: PAKDD 2007 Workshop, BioDM
2007. LNCS. Springer, Heidelberg (2007)
15. Ryu, K.H., Kim, W.S., Lee, H.G.: A Data Mining Approach and Framework of Intelligent
Diagnosis System for Coronary Artery Disease Prediction. In: The 4th Korea-Japan Int’l
Database Workshop 2008 (2008)
16. UCI Machine Learning Repository,
http://archive.ics.uci.edu/ml/datasets.html