5.2) Decision Tree Induction
Decision Trees
Classification Learning: Definition
Given a training set of labeled records, learn a model that predicts the class attribute from the remaining attributes; the model is then applied to records whose class is unknown.
[Figure: a training set (Tid, Attrib1, Attrib2, Attrib3, Class; e.g. Tid 3: No, Small, 70K, No) feeds an induction step that produces a model; the model is applied to a test set whose Class values are unknown (e.g. Tid 11: No, Small, 55K, ?; Tid 15: No, Large, 67K, ?).]
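A minimal sketch of this induce-then-apply loop in Python, assuming scikit-learn (the slides do not prescribe a library); the records, labels, and attribute encodings below are hypothetical stand-ins for the figure's data.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training records in the figure's schema; the 0/1 flags,
# size codes, and income values are illustrative encodings.
train = pd.DataFrame({
    "Attrib1": [0, 0, 1],
    "Attrib2": [0, 1, 2],                 # Small=0, Medium=1, Large=2
    "Attrib3": [70_000, 60_000, 95_000],
    "Class":   ["No", "No", "Yes"],
})

# Induction: learn a model from the labeled training set.
model = DecisionTreeClassifier(criterion="entropy")
model.fit(train[["Attrib1", "Attrib2", "Attrib3"]], train["Class"])

# Deduction: apply the model to test records with unknown Class.
test = pd.DataFrame({"Attrib1": [0, 0], "Attrib2": [0, 2],
                     "Attrib3": [55_000, 67_000]})   # Tids 11 and 15
print(model.predict(test))
```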
Examples of Classification Task
Apply Model to Test Data
Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?

Decision tree:
  Refund?
    Yes -> NO
    No  -> MarSt?
      Single, Divorced -> TaxInc?
        < 80K -> NO
        > 80K -> YES
      Married -> NO

Walkthrough: start at the root of the tree. Refund = No, so take the No branch to MarSt. Marital Status = Married, so take the Married branch, which reaches a leaf. Assign Cheat to "No".
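A minimal code sketch of this walkthrough, assuming the record is a plain Python dict; the attribute keys and the function name are illustrative, not from the slides.

```python
def classify(record):
    """Apply the slide's decision tree to one record (a dict).
    Mirrors the walkthrough: Refund, then MarSt, then TaxInc."""
    if record["Refund"] == "Yes":
        return "No"                      # Yes branch: leaf NO
    if record["MarSt"] == "Married":
        return "No"                      # Married branch: leaf NO
    # Single or Divorced: test taxable income
    return "Yes" if record["TaxInc"] > 80_000 else "No"

# The test record from the slides reaches the Married leaf.
print(classify({"Refund": "No", "MarSt": "Married", "TaxInc": 80_000}))  # -> No
```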
Decision Tree Learning: ID3
3) Choose Debt Level as the root of the subtree. 3a) All examples in the branch are in the same class, MODERATE.
Return Leaf node.
[Figure: subtree rooted at Debt Level with branches Low and High. The Low branch covers example {3} and becomes a MODERATE leaf (3a); the High branch covers examples {2, 12, 14} and must be expanded further (3b).]
ID3 Example (II)
4) Choose Credit History as the root of the subtree. 4a-4c) In each branch, all examples are in the same class.
Return Leaf nodes.
[Figure: subtree rooted at Credit History with branches Bad, Unknown, and Good, covering example sets {5, 6}, {8}, and {9, 10, 13}; the corresponding leaves (4a-4c) are labeled LOW, MODERATE, and LOW.]
ID3 Example (III)
[Figure: the final ID3 tree. Income is the root with branches Low, Medium, and High: Low leads to a HIGH leaf, Medium to the Debt Level subtree (branches Low, High), and High to the Credit History subtree (branches Bad, Unknown, Good).]
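A compact sketch of the ID3 recursion these steps walk through; it assumes records are plain dicts with the class stored under a "label" key, and all function names are illustrative rather than taken from the slides.

```python
import math
from collections import Counter

def entropy(records):
    """Entropy of the class label over a set of records."""
    counts = Counter(r["label"] for r in records)
    n = len(records)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def info_gain(records, attr):
    """Reduction in entropy from splitting records on attr."""
    n, remainder = len(records), 0.0
    for value in {r[attr] for r in records}:
        subset = [r for r in records if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(records) - remainder

def id3(records, attrs):
    """Return a leaf label or a nested {attr: {value: subtree}} dict."""
    labels = {r["label"] for r in records}
    if len(labels) == 1:           # pure node (steps 3a, 4a-4c): leaf
        return labels.pop()
    if not attrs:                  # no attributes left: majority-class leaf
        return Counter(r["label"] for r in records).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(records, a))
    return {best: {v: id3([r for r in records if r[best] == v],
                          [a for a in attrs if a != best])
                   for v in {r[best] for r in records}}}
```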
Decision surfaces are axis-aligned hyper-rectangles.
[Figure: a tree over attributes A1 (values M, F) and A2 partitions the instance space into axis-aligned rectangles, each labeled with one of the classes R, G, B.]
Non-Uniqueness
The same training data can be consistent with more than one decision tree: the ten records below are fit equally well by the earlier tree rooted at Refund and by the alternative tree rooted at MarSt.

Tid  Refund  Marital Status  Taxable Income  Cheat
 1   Yes     Single          125K            No
 2   No      Married         100K            No
 3   No      Single           70K            No
 4   Yes     Married         120K            No
 5   No      Divorced         95K            Yes
 6   No      Married          60K            No
 7   Yes     Divorced        220K            No
 8   No      Single           85K            Yes
 9   No      Married          75K            No
10   No      Single           90K            Yes

Alternative tree:
  MarSt?
    Married -> NO
    Single, Divorced -> Refund?
      Yes -> NO
      No  -> TaxInc?
        < 80K -> NO
        > 80K -> YES
ID3’s Question
More precisely: given a training set, which of all the decision trees consistent with that training set has the greatest likelihood of correctly classifying unseen instances of the population?
ID3’s (Approximate) Bias
Practically: given a training set, how do we select attributes so that the resulting tree is as small as possible, i.e., contains as few attributes as possible?
Not All Attributes Are Created Equal
• Think chemistry/physics
  – Entropy is a measure of disorder or homogeneity
  – Minimum (0.0) when homogeneous / perfect order
  – Maximum (1.0 for two classes; log C in general) when most heterogeneous / complete chaos
• In ID3
  – Minimum (0.0) when all records belong to one class, implying most information
  – Maximum (log C) when records are equally distributed among all classes, implying least information
  – Intuitively, the smaller the entropy, the purer the partition
Examples of Computing Entropy
$\mathrm{Entropy}(t) = -\sum_{j} p(j \mid t)\,\log_2 p(j \mid t)$
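Worked two-class values for a node with six records (the counts are illustrative, not from the slides; by convention $0 \log_2 0 = 0$):

$(C_1, C_2) = (0, 6)$: $\mathrm{Entropy} = -0\log_2 0 - 1\log_2 1 = 0$
$(C_1, C_2) = (1, 5)$: $\mathrm{Entropy} = -\tfrac{1}{6}\log_2\tfrac{1}{6} - \tfrac{5}{6}\log_2\tfrac{5}{6} \approx 0.65$
$(C_1, C_2) = (3, 3)$: $\mathrm{Entropy} = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1.0$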
• Information Gain:
$\mathrm{GAIN}_{split} = \mathrm{Entropy}(p) - \sum_{i=1}^{k} \frac{n_i}{n}\,\mathrm{Entropy}(i)$
where parent node $p$ is split into $k$ partitions, $n_i$ is the number of records in partition $i$, and $n$ is the total number of records.
[Figure: two candidate splits A? and B?, each with Yes/No branches. Split A produces children with entropies E1 and E2 (weighted sum E12); split B produces E3 and E4 (weighted sum E34). With parent entropy E0, compare Gain = E0 − E12 vs. E0 − E34 and prefer the larger.]
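A minimal Python sketch of this comparison, assuming per-class record counts for the parent and for each child of a candidate split; function names and the counts are illustrative.

```python
import math

def entropy(counts):
    """Entropy of a node given its per-class record counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def gain(parent_counts, child_counts_list):
    """GAIN_split = Entropy(parent) - weighted sum of child entropies."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child)
                   for child in child_counts_list)
    return entropy(parent_counts) - weighted

# Hypothetical counts: parent has 10 records, 5 per class (E0 = 1.0).
# Split A yields purer children than split B, so A has the larger gain.
print(gain([5, 5], [[4, 1], [1, 4]]))  # split A
print(gain([5, 5], [[3, 2], [2, 3]]))  # split B
```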
Gain Ratio
• Gain Ratio:
$\mathrm{GainRATIO}_{split} = \frac{\mathrm{GAIN}_{split}}{\mathrm{SplitINFO}}, \qquad \mathrm{SplitINFO} = -\sum_{i=1}^{k} \frac{n_i}{n} \log_2 \frac{n_i}{n}$
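A worked contrast with illustrative numbers: splitting n = 10 records into two balanced partitions gives $\mathrm{SplitINFO} = -2 \cdot \tfrac{1}{2}\log_2\tfrac{1}{2} = 1.0$, while a 10-way split into singletons (e.g. on a unique ID attribute) gives $\mathrm{SplitINFO} = \log_2 10 \approx 3.32$; dividing GAIN by SplitINFO therefore penalizes splits that shatter the data into many small partitions.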
GINI Index
$\mathrm{GINI}(t) = 1 - \sum_{j} \left[p(j \mid t)\right]^2$
$\mathrm{GINI}_{split} = \sum_{i=1}^{k} \frac{n_i}{n}\,\mathrm{GINI}(i)$
where $n_i$ is the number of records in partition $i$ and $n$ the number of records at the parent node.
[Figure: splitting the nominal attribute CarType. Multi-way split: one branch per value (Family, Sports, Luxury). Binary splits: partition the values into two subsets, e.g. {Sports, Luxury} vs. {Family}, or {Family, Luxury} vs. {Sports}.]
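A minimal sketch computing GINI_split for the figure's two candidate binary groupings of CarType; the per-value class counts are hypothetical, none of the numbers come from the slides.

```python
def gini(counts):
    """GINI(t) = 1 - sum_j p(j|t)^2, from per-class record counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(child_counts_list):
    """Weighted GINI over the partitions of a split."""
    n = sum(sum(child) for child in child_counts_list)
    return sum(sum(child) / n * gini(child) for child in child_counts_list)

# Hypothetical (class0, class1) counts per CarType value.
family, sports, luxury = [1, 3], [8, 0], [1, 7]

# The two candidate binary groupings from the figure.
a = gini_split([[s + l for s, l in zip(sports, luxury)], family])  # {Sports,Luxury} vs {Family}
b = gini_split([[f + l for f, l in zip(family, luxury)], sports])  # {Family,Luxury} vs {Sports}
print(a, b)  # the grouping with the smaller weighted GINI is preferred
```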
Overfitting:
– Given a model space H, a specific model h ∈ H is said to overfit the training data if there exists some alternative model h′ ∈ H, such that h has smaller error than h′ over the training examples, but h′ has smaller error than h over the entire distribution of instances
Underfitting:
– The model is too simple, so that both training
and test errors are large
Detecting Overfitting
[Figure: training and test error as the tree grows; training error keeps decreasing while test error eventually increases, the signature of overfitting.]
Overfitting in Decision Tree Learning
• Post-pruning (sketched in code below)
  – Split dataset into training and validation sets
  – Grow full decision tree on training set
  – While the accuracy on the validation set increases:
    ◆ Evaluate the impact of pruning each subtree, replacing its root by a leaf labeled with the majority class for that subtree
    ◆ Replace the subtree that most increases validation set accuracy (greedy approach)
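A sketch of this greedy loop (reduced-error post-pruning) under stated assumptions: internal nodes are dicts of the form {"attr": name, "children": {value: subtree}, "majority": label}, leaves are plain labels, and validation records are dicts carrying a "label" key; all names are illustrative.

```python
def classify(node, record):
    """Walk the tree until a leaf (a plain label) is reached."""
    while isinstance(node, dict):
        node = node["children"][record[node["attr"]]]
    return node

def accuracy(tree, records):
    return sum(classify(tree, r) == r["label"] for r in records) / len(records)

def internal_nodes(node):
    """Yield (parent, child_key, child) for every internal child node."""
    if isinstance(node, dict):
        for key, child in node["children"].items():
            if isinstance(child, dict):
                yield node, key, child
                yield from internal_nodes(child)

def prune(root, validation):
    """While some pruning improves validation accuracy, apply the single
    pruning that improves it most (greedy); the root itself is kept."""
    while True:
        base, best_gain, best = accuracy(root, validation), 0.0, None
        for parent, key, node in list(internal_nodes(root)):
            parent["children"][key] = node["majority"]   # try pruning
            pruned_gain = accuracy(root, validation) - base
            parent["children"][key] = node               # undo
            if pruned_gain > best_gain:
                best_gain, best = pruned_gain, (parent, key, node)
        if best is None:
            return root
        parent, key, node = best
        parent["children"][key] = node["majority"]       # commit best prune
```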
Decision Tree Based Classification
• Advantages:
  – Inexpensive to construct
  – Extremely fast at classifying unknown records
  – Easy to interpret for small-sized trees
  – Good accuracy (careful: NFL, i.e. the No Free Lunch theorem)
• Disadvantages:
  – Axis-parallel decision boundaries
  – Redundancy
  – Need data to fit in memory
  – Need to retrain with new data
Oblique Decision Trees
• Test conditions may involve more than one attribute at a time, e.g. x + y < 1
[Figure: the oblique boundary x + y < 1 separates the region labeled Class = + from the other class.]
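A tiny illustration of the contrast with the axis-parallel tests used elsewhere in this section; the function names and the thresholds in the second function are hypothetical.

```python
def oblique_classify(x, y):
    """One oblique test: a linear combination of two attributes."""
    return "+" if x + y < 1 else "-"

def axis_parallel_classify(x, y):
    """A coarse two-split, axis-parallel approximation of the same
    boundary; an exact fit would need an ever-finer staircase of
    single-attribute thresholds (values here are hypothetical)."""
    if x < 0.5:
        return "+" if y < 0.5 else "-"
    return "-"
```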