
Lecture 03

Decision Tree

Xizhao WANG
Big Data Institute
College of Computer Science
Shenzhen University

March 2021

Machine Learning Lecture – Xizhao Wang Lecture 02: Decision Tree



Outline

1. Decision Tree Learning


2. Decision Tree Generation – An Illustration
3. Uncertainty
4. Inductive Bias and Partition
5. Summary
6. Advanced Topics on Decision Trees



Section 1: Decision Tree Learning
1. A General Framework of Supervised Learning
2. Decision Tree Learning

A General Framework of Supervised Learning





Section 2: Decision Tree Generation – An Illustration
1. Difference Between a Random-Partition Tree and an Attribute-Induced Tree
2. Difference Between a Decision Tree and Other Partitions of the Sample Space
3. A Training Set to Generate a Decision Tree
4. Animation of the Generation
5. Real-Valued Attributes




How to use this training set to generate a decision tree?




The difference between a random-partition tree and an attribute-induced tree

{D1, D3, D5, D7, D9}   {D2, D4, D6, D8, D10, D13}   {D11, D12, D14}

A decision tree is a type of attribute-induced tree (in which the samples in each leaf belong to one class).









Animation of The Generation

 The training data set contains 14 samples with 5 attributes.
 There is a special attribute: PlayTennis is the class label.
 Based on the training data set, we want to find a set of rules that predict whether a new sample will play tennis or not.

Animation of The Generation - Continued


{D1,D2,D3,D4,D5,D6,D7,D8,D9,D10,D11,D12,D13,D14}
|- Outlook = Sunny -> {D1,D2,D8,D9,D11}
|   |- Humidity = High -> {D1,D2,D8} : No
|   |- Humidity = Normal -> {D9,D11} : Yes
|- Outlook = Overcast -> {D3,D7,D12,D13} : Yes
|- Outlook = Rain -> {D4,D5,D6,D10,D14}
    |- Wind = Strong -> {D6,D14} : No
    |- Wind = Weak -> {D4,D5,D10} : Yes
 D1, D2, …, D14 represent samples.
 Red and blue indicate whether the class label of a sample is “No” or “Yes”.
 Samples are split by the most suitable attribute, chosen by a criterion such as information gain, gain ratio, or the Gini index.
 Each leaf node is assigned the class label that most of its samples belong to.
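The attribute selection above can be sketched in Python: a minimal information-gain computation. The Outlook column and PlayTennis labels below are the classic 14-day PlayTennis data, consistent with the partitions shown in the tree; the function names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a multiset of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction achieved by splitting the samples on an attribute."""
    n = len(labels)
    split_entropy = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        split_entropy += len(subset) / n * entropy(subset)
    return entropy(labels) - split_entropy

# PlayTennis labels and the Outlook column for D1..D14, as in the tree above.
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
           "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"]

print(round(information_gain(outlook, play), 3))  # → 0.247
```

The attribute with the largest gain (Outlook, here) is chosen as the split at the root; the same computation is repeated recursively on each branch.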

For Real-Valued Attributes

If an attribute A is real-valued, then a partition cannot be induced by an equality test such as “Attribute A = value #1”.

We often use a constraint like “Attribute A > value #1” (or >=, <, <=, etc.) to induce a partition.

For instance:
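For instance, candidate thresholds are usually taken midway between consecutive sorted values of the attribute, and the cut point that minimizes the weighted entropy of the two halves is kept. A minimal sketch; the temperature values and labels below are made up for illustration:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a multiset of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Scan cut points midway between consecutive sorted values and return
    the threshold t whose split 'Attribute A > t' minimizes the weighted
    entropy of the two induced subsets."""
    pairs = sorted(zip(values, labels))
    best = (float("inf"), None)  # (score, threshold)
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            continue
        t = (a + b) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        best = min(best, (score, t))
    return best[1]

# Made-up numeric Temperature values with their PlayTennis labels.
temperature = [60, 65, 72, 80]
play = ["Yes", "Yes", "No", "No"]
print(best_threshold(temperature, play))  # → 68.5
```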


For Real-Valued Attributes – Continued (1)

 The training data set contains 14 samples with 5 attributes.
 There is a special attribute: PlayTennis is the class label.
 The attributes Temperature and Humidity are numerical.
 The other attributes are categorical, that is, their values cannot be ordered.
 Based on the training data set, we want to find a set of rules that predict whether a new sample will play tennis or not.


For Real-Valued Attributes – Continued (2)


{D1,D2,D3,D4,D5,D6,D7,D8,D9,D10}
|- Outlook = Sunny -> {D1,D2,D3,D4}, then split on Humidity (by threshold):
|   |- {D2,D3} : Yes
|   |- {D1,D4} : No
|- Outlook = Overcast -> {D5,D6,D7} : Yes
|- Outlook = Rain -> {D8,D9,D10}, then split on Temperature (by threshold):
    |- {D9,D10} : No
    |- {D8} : Yes
 D1, D2, …, D10 represent samples.
 Red and blue indicate whether the class label of a sample is “No” or “Yes”.
 Samples are split by the most suitable attribute and, for numerical attributes, a corresponding threshold value.
 Each leaf node is assigned the class label that most of its samples belong to, using a criterion such as information gain, gain ratio, or the Gini index for the splits.
Section 3: Uncertainty
1. Summary of Uncertainty Definitions
2. Shannon Entropy
3. Classification Entropy
4. Fuzziness
5. Non-Specificity
6. Rough-Degree
7. Relation Between Two Uncertainties




Summary of Uncertainty Definition


Uncertainty of an object with a description:

Uncertainty            | Object                   | Meaning
-----------------------|--------------------------|---------------------------------------------
Shannon entropy        | Probability distribution | Uncertainty caused by randomness
Classification entropy | Crisp set                | Impurity of the class distribution in a set
Fuzziness              | Fuzzy set                | Uncertainty of a linguistic term
Non-specificity        | Fuzzy set                | Non-specificity when choosing one from many available choices
Rough-degree           | Rough set                | Upper / lower approximation
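The first two rows of the table can be made concrete with a short sketch; the example distributions below are illustrative:

```python
from math import log2

def shannon_entropy(p):
    """Shannon entropy of a probability distribution: uncertainty from randomness."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def classification_entropy(class_counts):
    """Impurity of the class distribution in a (crisp) set of samples,
    given the count of samples per class."""
    n = sum(class_counts)
    return shannon_entropy([c / n for c in class_counts])

print(shannon_entropy([0.5, 0.5]))     # → 1.0 : a fair coin is maximally uncertain
print(classification_entropy([9, 5]))  # ≈ 0.940 : a 9-Yes / 5-No training set
print(classification_entropy([14, 0])) # → 0.0 : a pure set has no impurity
```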

Relation between two uncertainties: fuzziness and ambiguity


Fuzziness and ambiguity of a fuzzy set A = {μ1, μ2}

Fig. a. Fuzziness, with the formula
Ev(A) = (1/2) Σ_{i=1}^{2} [ −μi log2 μi − (1 − μi) log2(1 − μi) ]

Fig. b. Ambiguity, with the formula
Ea(A) = μ1 / μ2



Section 4: Inductive Bias and Partition
1. A Decision Tree Is a Partition
2. The Types of Attributes
3. Example
4. Inductive Bias






The types of attributes


An example (1)


An example (2)


An example (3)



Section 5: Summary
1. A Summary
2. Uncertainty Based on Frequency in Symbolic Learning
3. Decision Trees in Comparison with NNs






Decision trees in comparison with NNs: for image classification, the accuracy of decision trees is considerably lower.



Section 6: Advanced Topics on Decision Trees
1. Splitting Criteria – Expanded Attribute Selection Criteria
2. Pruning Trees
3. Evaluation of Classification Trees
4. Fuzzy Decision Trees




Further Discussions: Some Advanced Topics on Decision Trees

1. Splitting Criteria – Expanded Attribute Selection Criteria


2. Pruning Trees
3. Evaluation of Classification Trees
4. Fuzzy Decision Trees


Advanced topics on decision trees

Splitting Criteria – Expanded Attribute Selection Criteria


Splitting Criteria – Expanded Attribute Selection Criteria (1)


Splitting Criteria – Expanded Attribute Selection Criteria (2)


Splitting Criteria – Expanded Attribute Selection Criteria (3)


Splitting Criteria – Expanded Attribute Selection Criteria (4)


Splitting Criteria – Expanded Attribute Selection Criteria (5)


Splitting Criteria – Expanded Attribute Selection Criteria (6)


Advanced topics on decision trees

Pruning Trees


Pruning Trees (1)

Tree generation is a recursive algorithm. When do we stop?

Pruning Trees (2)


Stopping criteria:
 All instances under each branch belong to the same category.
 Each leaf node contains one example.

Overfitting problem:
 The tree can always classify the training examples perfectly.
 It doesn't work well on new data.

(Figure: training vs. test performance as a function of the number of nodes.)

Pruning Trees (3)

Stopping Criteria

Pre-pruning


Pruning Trees (4)


Overview - Post-pruning

Employing tight stopping criteria tends to create small, underfitted decision trees. On the other hand, using loose stopping criteria tends to generate large decision trees that are overfitted to the training set.

Post-pruning divides the generation of the decision tree into two phases. The first phase is the tree-building process, with the termination condition that the proportion of a certain class in a node reaches 100%; the second phase prunes the tree structure acquired from the first phase. In this way, post-pruning approaches avoid the problem of a limited visual field. Accordingly, the accuracy of post-pruning methods is typically superior to that of pre-pruning methods, and post-pruning is more commonly used.

There are various post-pruning techniques for decision trees. Most perform a top-down or bottom-up traversal of the nodes; a node is pruned if this operation improves a certain criterion. The following subsections describe the most popular technique.
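Such a bottom-up traversal can be sketched as follows. This is a hypothetical error-based pruning pass, not a specific published algorithm; the `Node` structure and the class counts are made up for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # class counts of the samples reaching this node, e.g. {"Yes": 3, "No": 1}
    counts: dict
    children: list = field(default_factory=list)

    @property
    def is_leaf(self):
        return not self.children

def leaf_errors(node):
    """Errors made if this node were a leaf predicting its majority class."""
    return sum(node.counts.values()) - max(node.counts.values())

def subtree_errors(node):
    """Errors made by the subtree rooted at this node."""
    if node.is_leaf:
        return leaf_errors(node)
    return sum(subtree_errors(c) for c in node.children)

def prune(node):
    """Bottom-up traversal: collapse a node into a leaf whenever doing so
    does not increase the error count (the 'certain criterion')."""
    for c in node.children:
        prune(c)
    if not node.is_leaf and leaf_errors(node) <= subtree_errors(node):
        node.children = []  # collapse to a leaf

# A small made-up tree whose split saves no errors, so it gets pruned away.
root = Node({"Yes": 6, "No": 2},
            [Node({"Yes": 3, "No": 1}), Node({"Yes": 3, "No": 1})])
prune(root)
print(root.is_leaf)  # → True
```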

Pruning Trees (5)

Cost-Complexity Pruning

Breiman et al. (1984) developed a pruning methodology based on a loose stopping criterion, allowing the decision tree to overfit the training set. The overfitted tree is then cut back into a smaller tree by removing sub-branches that do not contribute to the generalization accuracy.

Cost complexity pruning proceeds in two stages.
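The key quantity in the first stage is the effective cost-complexity parameter of each internal node, alpha = (R(leaf) - R(subtree)) / (number of leaves - 1), where R denotes the resubstitution error; nodes with the smallest alpha are collapsed first, yielding a nested sequence of subtrees. A minimal sketch, with made-up error values:

```python
def effective_alpha(node_error, subtree_error, n_leaves):
    """Effective cost-complexity parameter of an internal node: the error
    increase per leaf removed when the node is collapsed into a leaf.
    node_error: resubstitution error R(t) if the node became a leaf
    subtree_error: total error R(T_t) of the subtree rooted at the node
    n_leaves: number of leaves of that subtree"""
    return (node_error - subtree_error) / (n_leaves - 1)

# Stage 1: compute alpha for every internal node and repeatedly collapse
# the weakest link (smallest alpha), producing a sequence of subtrees.
# Stage 2: select the subtree in that sequence with the best accuracy on
# a validation set (or via cross-validation).
alphas = {
    "node_a": effective_alpha(0.10, 0.04, 4),  # ≈ 0.02
    "node_b": effective_alpha(0.05, 0.04, 2),  # ≈ 0.01
}
weakest = min(alphas, key=alphas.get)
print(weakest)  # → node_b : collapsed first
```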


Pruning Trees (6)

Cost-Complexity Pruning: Key Points


Pruning Trees (7)

Cost-Complexity Pruning: Key Points

Study [1] has shown that the relationship between the time required to generate the sub-tree sequence and the number of non-leaf nodes in the original decision tree is quadratic. This means that if the number of non-leaf nodes increases linearly with the number of training examples, then the time complexity of the CCP method is quadratic in the number of training examples.

[1] Nobel A (2002) Analysis of a complexity-based pruning scheme for classification trees. IEEE Trans Inf Theory 48(8):2362–2368


Advanced topics on decision trees

Evaluation of Classification Trees


Evaluation of classification trees (1)

Generalization Error

 Classification accuracy is the primary evaluation criterion.
 Its actual value is known only in rare cases (mainly synthetic cases).
 One can take the training error as an estimate of the generalization error.


Evaluation of classification trees (2)

1. Theoretical Estimation of Generalization Error
2. Empirical Estimation of Generalization Error
3. Alternatives to the Accuracy Measure


Evaluation of classification trees (3)

4. Confusion Matrix


Evaluation of classifications tree (4)


5. ROC Curves

Another measure is the ROC curve, which illustrates the tradeoff between the true positive rate and the false positive rate [Provost and Fawcett (1998)]. Figure 4.3 illustrates a ROC curve in which the X-axis represents the false positive rate and the Y-axis represents the true positive rate. The ideal point on the ROC curve would be (0, 100), that is, all positive examples are classified correctly and no negative examples are misclassified as positive.
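The points of a ROC curve can be traced by sweeping a decision threshold over the classifier's scores. A minimal sketch (with rates in [0, 1] rather than percent):

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by sweeping a threshold over the scores;
    labels are 1 for positive, 0 for negative."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points
```

A scorer that ranks every positive above every negative produces a curve that passes through the ideal corner, here (0.0, 1.0).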

Evaluation of Classification Trees (5)


6. Computational Complexity
• Computational complexity for generating a new classifier
• Computational complexity for updating a classifier
• Computational complexity for classifying a new instance

7. Comprehensibility
The comprehensibility criterion (also known as interpretability) refers to how well humans grasp the induced classifier. While the generalization error measures how well the classifier fits the data, comprehensibility measures the “mental fit” of that classifier.

8. Scalability to Large Datasets
Scalability refers to the ability of the method to construct the classification model efficiently given large amounts of data. Classical induction algorithms have been applied with practical success to many relatively simple and small-scale problems. However, trying to discover knowledge in real-life, large databases introduces time and memory problems.

Evaluation of Classification Trees (6)

9. Robustness
The ability of the model to handle noise or data with missing values and make correct predictions is called robustness. Different decision tree algorithms have different robustness levels. To estimate the robustness of a classification tree, it is common to train one tree on a clean training set and another on a noisy training set, where the noisy set is usually the clean set with some artificial noisy instances added. The robustness level is measured as the difference in accuracy between these two situations.

10. Stability
Formally, the stability of a classification algorithm is defined as the degree to which it generates repeatable results, given different batches of data from the same process. In mathematical terms, stability is the expected agreement between two models on a random sample of the original data, where agreement on a specific example means that both models assign it to the same class.
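The robustness estimate described above can be sketched directly. The depth-1 decision-stump learner below is a hypothetical stand-in for a full decision tree inducer, and the noise model (random flips of binary labels) is an assumption:

```python
import random

def learn_stump(data):
    """A depth-1 'tree' on a single numeric feature: pick the threshold and
    orientation with the best training accuracy."""
    best = None
    for t in sorted({x for x, _ in data}):
        for left in (0, 1):
            acc = sum((left if x <= t else 1 - left) == y for x, y in data)
            if best is None or acc > best[0]:
                best = (acc, t, left)
    _, t, left = best
    return lambda x: left if x <= t else 1 - left

def robustness_drop(train, test, learn, noise_rate, seed=0):
    """Accuracy difference between a model trained on the clean training set
    and one trained on a copy with randomly flipped (binary) labels."""
    rng = random.Random(seed)
    acc = lambda model: sum(model(x) == y for x, y in test) / len(test)
    noisy = [(x, 1 - y if rng.random() < noise_rate else y) for x, y in train]
    return acc(learn(train)) - acc(learn(noisy))
```

A smaller drop means the learner is more robust to the injected noise.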

Evaluation of Classification Trees (7)

11. Over-fitting and Under-fitting

In decision trees, overfitting usually occurs when the tree has too many nodes relative to the amount of training data available. As the number of nodes increases, the training error usually decreases, while beyond some point the generalization error becomes worse.

Overfitting is generally recognized to be a violation of the principle of Occam's razor.
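The node-count effect can be seen even in a toy setting: a model that memorizes every training point (one leaf per example, the extreme of a fully grown tree) reaches zero training error, yet a single-split stump generalizes better once the training labels contain noise. The data and the two noisy points are synthetic, chosen only for illustration:

```python
def true_label(x):
    return 1 if x >= 10 else 0

# synthetic training set with label noise injected at two points
noise = {3: 1, 15: 0}
train = [(x, noise.get(x, true_label(x))) for x in range(20)]
test = [(x, true_label(x)) for x in range(20)]   # clean ground truth

memorize = dict(train).__getitem__               # "full tree": one leaf per example
stump = lambda x: 1 if x >= 10 else 0            # "small tree": a single split

def error(predict, data):
    return sum(predict(x) != y for x, y in data) / len(data)

# memorize: train error 0.0, test error 0.1
# stump:    train error 0.1, test error 0.0
```

The memorizer reproduces the two noisy labels and so errs on them at test time; the stump ignores them and recovers the true concept.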

Evaluation of Classification Trees (8)


11. Over-fitting and Under-fitting (continued)
In decision trees there are two mechanisms that help to avoid overfitting. The first is to avoid splitting a node if the split is not useful (for instance, by approving only statistically significant splits). The second is pruning: after growing the tree, we prune away unnecessary nodes.
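The second mechanism can be sketched as reduced-error post-pruning over a tiny hand-rolled tree. The tuple encoding of nodes is an assumption made for this example, not a standard representation:

```python
# A node is ('leaf', label) or ('split', feature, threshold, left, right).

def predict(node, x):
    while node[0] == 'split':
        _, f, t, left, right = node
        node = left if x[f] <= t else right
    return node[1]

def accuracy(node, data):
    return sum(predict(node, x) == y for x, y in data) / len(data)

def majority(labels):
    return max(set(labels), key=labels.count)

def prune(node, val):
    """Bottom-up reduced-error pruning: replace a subtree with a majority
    leaf whenever that does not hurt validation accuracy."""
    if node[0] == 'leaf' or not val:
        return node
    _, f, t, left, right = node
    left_val = [(x, y) for x, y in val if x[f] <= t]
    right_val = [(x, y) for x, y in val if x[f] > t]
    node = ('split', f, t, prune(left, left_val), prune(right, right_val))
    leaf = ('leaf', majority([y for _, y in val]))
    return leaf if accuracy(leaf, val) >= accuracy(node, val) else node
```

A spurious inner split is collapsed whenever the validation set shows that a single leaf does at least as well, which is exactly the "prune unnecessary nodes" step above.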

Evaluation of Classification Trees (9)


12. “No Free Lunch” Theorem

The “conservation law” [Schaffer (1994)] or “no free lunch theorem” [Wolpert (1996)] states that if one inducer is better than another in some domains, then there are necessarily other domains in which this relationship is reversed. The theorem implies that, for a given problem, a certain approach can yield more information from the same data than other approaches.

The “no free lunch” concept presents a dilemma to the analyst approaching a new task: which inducer should be used?

If the analyst is looking for accuracy only, one solution is to try each inducer in turn and, by estimating the generalization error, to choose the one that appears to perform best [Schaffer (1994)]. Another approach, known as multistrategy learning [Michalski and Tecuci (1994)], attempts to combine two or more different paradigms in a single algorithm.

Advanced Topics on Decision Trees

Fuzzy Decision Trees

Fuzzy Representation of Data (Training Set – An Illustration)
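Fuzzy representation typically replaces each numeric attribute with a few overlapping linguistic terms, a common choice being triangular membership functions. The attribute name and breakpoints below are purely illustrative assumptions:

```python
def triangular(a, b, c):
    """Membership function with support (a, c) and peak 1.0 at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

# hypothetical fuzzification of a numeric attribute "temperature" (deg C)
cool = triangular(-10, 0, 15)
mild = triangular(5, 15, 25)
hot = triangular(20, 30, 45)

# one crisp reading now belongs to several terms to different degrees
memberships = {name: f(18.0)
               for name, f in [('cool', cool), ('mild', mild), ('hot', hot)]}
```

Unlike a crisp discretization, a value near a boundary keeps partial membership in two neighbouring terms, which is what lets a fuzzy tree route an example down several branches at once.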


Fuzzy Decision Tree Induction
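One common induction heuristic replaces the crisp class counts in the entropy formula with sums of membership degrees, so that each example contributes fractionally to a node. This membership-weighted entropy is a sketch of one such splitting criterion, not the only one used in fuzzy decision trees:

```python
import math

def fuzzy_entropy(memberships, labels, classes):
    """Entropy of a fuzzy node: class proportions are sums of the examples'
    membership degrees in the node rather than crisp counts."""
    total = sum(memberships)
    h = 0.0
    for c in classes:
        p = sum(m for m, y in zip(memberships, labels) if y == c) / total
        if p > 0.0:
            h -= p * math.log2(p)
    return h
```

With all memberships equal to 1.0 this reduces to the ordinary entropy used when selecting the expanded attribute in a crisp tree.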

Questions for Thought
Problems (Exercises)

Problems (Assignments) - Decision Tree

1. Given a training set with 5 binary attributes (4 conditional attributes and 1 decision attribute) and 20 samples, give a rough estimate of the numbers of random-partition trees and attribute-induced trees.
2. Can the minimum-entropy-based approach generate a decision tree with the smallest scale?
3. What do you think the tree size would be if you used a random strategy to select the expanded attribute during sub-node generation?
4. Give a brief summary of the heuristics for choosing the expanded attribute when splitting a node during decision tree generation.
