Impurity
Entropy: a common way to measure impurity
• Entropy = ∑ᵢ −pᵢ log₂ pᵢ , where pᵢ is the fraction of examples at the node belonging to class i.
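As a quick illustration (not part of the original slides), here is a minimal Python sketch of this formula; the function name entropy and the count-based input are just illustrative choices:

```python
import math

def entropy(class_counts):
    """Entropy of a node, given the number of examples in each class."""
    total = sum(class_counts)
    h = 0.0
    for count in class_counts:
        if count == 0:
            continue           # treat 0 * log2(0) as 0
        p = count / total      # p_i: fraction of examples in class i
        h -= p * math.log2(p)
    return h

print(entropy([8, 8]))    # maximally impure two-class node -> 1.0
print(entropy([16, 0]))   # pure node -> 0.0
```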
Calculating Information Gain
Information Gain = entropy(parent) – [average entropy(children)]
child₁ entropy = −(13/17)·log₂(13/17) − (4/17)·log₂(4/17) = 0.787
child₂ entropy = −(1/13)·log₂(1/13) − (12/13)·log₂(12/13) = 0.391
(Weighted) Average Entropy of Children = (17/30)·0.787 + (13/30)·0.391 = 0.615
Information Gain = 0.996 − 0.615 = 0.38 for this split
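The fractions above imply a parent node of 30 examples split into children of 17 and 13 examples with class counts (13, 4) and (1, 12); under that assumption, a short Python check of this calculation (illustrative sketch only):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

parent = [14, 16]              # class counts implied above; entropy ~ 0.996
children = [[13, 4], [1, 12]]  # child class counts; entropies ~ 0.787 and 0.391

n = sum(parent)
avg_children = sum(sum(c) / n * entropy(c) for c in children)   # ~ 0.615
gain = entropy(parent) - avg_children                           # ~ 0.38
print(round(avg_children, 3), round(gain, 3))
```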
Entropy-Based Automatic Decision Tree Construction
Using Information Gain to Construct a Decision Tree
1. Choose the attribute A with the highest information gain for the full training set S and place it at the root of the tree.
2. Construct a child node for each value v1, v2, …, vk of A. Each child has an associated subset of the training vectors in which A takes that particular value, e.g. S′ = {s ∈ S | value(A) = v1} for the first child.
3. Repeat recursively on each child's subset... till when?
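A minimal recursive sketch of this construction in Python (an illustrative implementation, not the original author's code; examples are assumed to be (feature-dict, class) pairs with categorical attributes):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute):
    labels = [c for _, c in examples]
    groups = {}
    for features, c in examples:
        groups.setdefault(features[attribute], []).append(c)
    avg_children = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - avg_children

def build_tree(examples, attributes):
    labels = [c for _, c in examples]
    # Stop when the node is pure or no attributes are left (one answer to "till when?").
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]          # leaf: majority class
    best = max(attributes, key=lambda a: information_gain(examples, a))
    children = {}
    for value in {f[best] for f, _ in examples}:
        subset = [(f, c) for f, c in examples if f[best] == value]
        children[value] = build_tree(subset, [a for a in attributes if a != best])
    return {"attribute": best, "children": children}
```

On the Simple Example training set below (with features given as dicts keyed by "X", "Y", "Z"), this sketch would place Y at the root and stop, since both Y-children are pure.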
Simple Example
Training Set: 3 features and 2 classes
X Y Z C
1 1 1 I
1 1 0 I
0 0 1 II
1 0 0 II
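Encoded as Python data (illustrative only), with a check that the parent entropy of this 2-vs-2 class split is 1, matching the Eparent used in the splits below:

```python
import math
from collections import Counter

# The four training examples from the table: features (X, Y, Z) and class C.
training_set = [
    ((1, 1, 1), "I"),
    ((1, 1, 0), "I"),
    ((0, 0, 1), "II"),
    ((1, 0, 0), "II"),
]

labels = [c for _, c in training_set]
counts = Counter(labels)                      # {'I': 2, 'II': 2}
parent_entropy = -sum(v / len(labels) * math.log2(v / len(labels))
                      for v in counts.values())
print(parent_entropy)                         # 1.0
```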
Split on attribute X
X=1 child: {I, I, II};  E_child1 = −(1/3)·log₂(1/3) − (2/3)·log₂(2/3) = 0.5284 + 0.39 = 0.9184
X=0 child: {II};  E_child2 = 0
E_parent = 1
GAIN = 1 − (3/4)(0.9184) − (1/4)(0) = 0.3112
(If X is the best attribute, the still-impure X=1 node would be split further.)
Split on attribute Y
Y=1 child: {I, I};  E_child1 = 0
Y=0 child: {II, II};  E_child2 = 0
E_parent = 1
GAIN = 1 − (1/2)(0) − (1/2)(0) = 1; BEST ONE
Split on attribute Z
Z=1 child: {I, II};  E_child1 = 1
Z=0 child: {I, II};  E_child2 = 1
E_parent = 1
GAIN = 1 − (1/2)(1) − (1/2)(1) = 0, i.e., NO GAIN; WORST
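As a check on the three splits above, a short sketch (again illustrative) that computes the gain of splitting this four-example set on each attribute and confirms that Y gives the largest gain:

```python
import math
from collections import Counter

training_set = [((1, 1, 1), "I"), ((1, 1, 0), "I"),
                ((0, 0, 1), "II"), ((1, 0, 0), "II")]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(attr_index):
    labels = [c for _, c in training_set]
    groups = {}
    for features, c in training_set:
        groups.setdefault(features[attr_index], []).append(c)
    avg = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - avg

for name, i in [("X", 0), ("Y", 1), ("Z", 2)]:
    print(name, round(gain(i), 4))   # X ~ 0.311, Y = 1.0, Z = 0.0 -> split on Y
```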