NBTree: A Naive Bayes/Decision-Tree Hybrid
Darin Morrison
Outline
1. Motivation
   - Problem and Solutions
2. Consideration of Existing Solutions
   - Naive-Bayes Classifiers
   - Decision-Trees
   - Learning Curves
3. New Solution: NBTree
   - Definition of NBTree
   - Performance
4. Summary
Problem

How can we generate a classifier from an arbitrarily sized database of labeled instances, where the attributes are not necessarily independent?

Solutions
1. Naive-Bayes classifiers
2. Decision trees
3. A hybrid of the two: NBTree
Naive-Bayes Classifiers

Pros
1. Fast
2. Induced classifiers are easy to interpret
3. Robust to irrelevant attributes
4. Uses evidence from many attributes

Cons
1. Assumes independence of attributes
2. Low performance ceiling on large databases
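Background note: the independence assumption in the cons above is the defining property of Naive-Bayes. Given attributes x_1, ..., x_n, the classifier scores each class c as if the attributes were conditionally independent given the class:

\[ \hat{y} \;=\; \arg\max_{c}\; P(c)\prod_{i=1}^{n} P(x_i \mid c) \]

This factorization is what keeps induction fast and the model easy to interpret, but it also caps accuracy when attributes are strongly correlated.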
Decision-Trees

Pros
1. Fast
2. Segmentation of the data

Cons
1. Fragmentation as the number of splits becomes large
2. Interpretability goes down as the number of splits increases
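A rough way to quantify the fragmentation concern (illustrative numbers, not from the slides): after d multi-way splits with an average branching factor of b, a leaf is trained on roughly

\[ \frac{N}{b^{d}} \]

instances. With N = 100,000 instances and three 10-way splits, each leaf sees on the order of 100 instances, so per-leaf estimates can become unreliable even when the database itself is large.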
Algorithm

makeNBTree
- dom(makeNBTree): a set T of labeled instances
- cod(makeNBTree): a decision tree with Naive-Bayes classifiers at the leaves

1. For each attribute X_i, evaluate the utility, u(X_i), of a split on attribute X_i. For continuous attributes, a threshold is also found at this stage.
2. Let j = argmax_i(u_i), i.e., the attribute with the highest utility.
3. If u_j is not significantly better than the utility of the current node, create a Naive-Bayes classifier for the current node and return.
4. Partition the set of instances T according to the test on X_j. If X_j is continuous, a threshold split is used; if X_j is discrete, a multi-way split is made for all possible values.
5. For each child, call the algorithm recursively on the portion of T that matches the test leading to the child.
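The steps above can be made concrete with a minimal Python sketch. It covers discrete attributes only (no threshold splits, no discretization), uses a crude cross-validation routine, and all of the names in it (NaiveBayes, utility, split_utility, make_nbtree) are mine rather than the paper's; treat it as an illustration of the control flow, not as the reference implementation.

# Illustrative sketch only: discrete attributes, simplified utility, no
# handling of continuous thresholds or data discretization.
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Minimal categorical Naive-Bayes classifier with Laplace smoothing."""

    def fit(self, rows, labels):
        self.n = len(labels)
        self.class_counts = Counter(labels)
        self.values = [set(col) for col in zip(*rows)]    # distinct values per attribute
        self.counts = defaultdict(Counter)                # (attribute, class) -> value counts
        for row, y in zip(rows, labels):
            for i, v in enumerate(row):
                self.counts[(i, y)][v] += 1
        return self

    def predict(self, row):
        def log_score(y):
            s = math.log(self.class_counts[y] / self.n)
            for i, v in enumerate(row):
                num = self.counts[(i, y)][v] + 1          # Laplace smoothing
                den = self.class_counts[y] + len(self.values[i])
                s += math.log(num / den)
            return s
        return max(self.class_counts, key=log_score)


def utility(rows, labels, k=5):
    """Utility of a node: k-fold cross-validated accuracy of Naive-Bayes on it."""
    if len(set(labels)) == 1 or len(rows) < k:
        return 1.0 if len(set(labels)) == 1 else 0.0
    correct = 0
    for fold in range(k):
        train = [(r, y) for j, (r, y) in enumerate(zip(rows, labels)) if j % k != fold]
        test = [(r, y) for j, (r, y) in enumerate(zip(rows, labels)) if j % k == fold]
        nb = NaiveBayes().fit([r for r, _ in train], [y for _, y in train])
        correct += sum(nb.predict(r) == y for r, y in test)
    return correct / len(rows)


def split_utility(rows, labels, attr):
    """Utility of a split: child utilities weighted by the fraction of instances reaching each child."""
    parts = defaultdict(list)
    for row, y in zip(rows, labels):
        parts[row[attr]].append((row, y))
    return sum(len(p) / len(rows) *
               utility([r for r, _ in p], [y for _, y in p])
               for p in parts.values())


def make_nbtree(rows, labels, min_instances=30, min_rel_reduction=0.05):
    """Steps 1-5 of the NBTree induction procedure, for discrete attributes."""
    node_u = utility(rows, labels)
    utilities = [split_utility(rows, labels, i) for i in range(len(rows[0]))]  # step 1
    j = max(range(len(utilities)), key=utilities.__getitem__)                  # step 2
    # Step 3: significance test -- relative error reduction > 5% and >= 30 instances.
    rel_reduction = (utilities[j] - node_u) / (1.0 - node_u) if node_u < 1.0 else 0.0
    if len(rows) < min_instances or rel_reduction <= min_rel_reduction:
        return NaiveBayes().fit(rows, labels)             # leaf: Naive-Bayes classifier
    children = defaultdict(lambda: ([], []))              # step 4: multi-way split on X_j
    for row, y in zip(rows, labels):
        children[row[j]][0].append(row)
        children[row[j]][1].append(y)
    return {"split_on": j,                                # step 5: recurse on each child
            "children": {v: make_nbtree(rs, ys, min_instances, min_rel_reduction)
                         for v, (rs, ys) in children.items()}}

A real implementation would also handle continuous attributes (finding thresholds in step 1), discretize the data before cross-validating, and avoid recomputing the utilities; those details are omitted to keep the sketch short.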
Utility

Utility of a node: computed by discretizing the data and computing a 5-fold cross-validation accuracy estimate of using a Naive-Bayes classifier at the node.

Utility of a split: computed as the weighted sum of the utilities of the child nodes, where the weight given to a node is proportional to the number of instances that reach that node.

Significance: a split is significant iff the relative reduction in error is greater than 5% and there are at least 30 instances in the node.

Intuition: attempt to approximate whether the generalization accuracy of a Naive-Bayes classifier at each leaf is higher than that of a single Naive-Bayes classifier at the current node.
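To make the significance test concrete (this formula is my reading of "relative reduction in error"; the numbers are illustrative):

\[ \text{relative error reduction} \;=\; \frac{(1 - u_{\text{node}}) - (1 - u_{\text{split}})}{1 - u_{\text{node}}} \]

For example, if the Naive-Bayes classifier at the current node scores u_node = 0.90 (10% error) and the best candidate split scores u_split = 0.91 (9% error), the relative reduction is (0.10 - 0.09) / 0.10 = 10%, which exceeds the 5% threshold, so the split is taken provided the node contains at least 30 instances.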
Graphs

(The performance plots from the original slides are not reproduced in this text.)
Statistics

Average accuracy: C4.5 81.91%, Naive-Bayes 81.69%, NBTree 84.47%

Number of nodes per tree:
          letter   adult   DNA   led24
  C4.5      2109    2213   131      49
  NBTree     251     137     3       1
Summary

In practice, NBTrees have been shown to scale to large databases and, in general, to outperform decision trees and Naive-Bayes classifiers used alone.
Appendix
References
R. Kohavi. Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 1996.