DECISION TREES
IDENTIFICATION TREES
Definition
An identification tree for the sunburn example:

Hair?
  Blonde → Lotion Used?
     No  → Sunburned (Sarah, Annie)
     Yes → No sunburn (Dana, Katie)
  Red   → Sunburned
  Brown → No sunburn
Example:
If hair is blonde and the person uses lotion, then no sunburn.
If we eliminate the first antecedent and check the rule over the whole database, we find that there are no misclassifications. Hence we can drop this antecedent as unnecessary.
Checking the shortened rule "if the person uses lotion then no sunburn" over all samples:

                 R1 = Sunburn    R2 = No Sunburn
Lotion = Yes     l = 0           m = 3

Since l = 0, no lotion user is sunburned, so the shortened rule makes no errors.
In contrast, if we eliminate the second antecedent and check "hair is blonde" over the whole database:

                 R1 = Sunburn    R2 = No Sunburn
Blonde = Yes     l = 2           m = 2

Here l = 2: two blonde people are sunburned, so dropping the lotion antecedent would cause misclassifications, and that antecedent must be kept.
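This check is easy to automate. Below is a minimal Python sketch, assuming the eight-sample sunburn table used later in the ID3 example; the helper name `check_rule` is ours, not from the slides:

```python
# Antecedent elimination: drop one condition from a rule, then count how
# the shortened rule classifies every sample in the database.
DATA = [  # (hair, lotion, result) for the eight sunburn samples
    ("blonde", "no", "sunburned"), ("blonde", "yes", "none"),
    ("brown", "yes", "none"), ("blonde", "no", "sunburned"),
    ("red", "no", "sunburned"), ("brown", "no", "none"),
    ("brown", "no", "none"), ("blonde", "yes", "none"),
]

def check_rule(antecedents, conclusion):
    """Return (l, m): matching samples that contradict / satisfy
    the rule's conclusion."""
    l = m = 0
    for hair, lotion, result in DATA:
        sample = {"hair": hair, "lotion": lotion}
        if all(sample[a] == v for a, v in antecedents):
            if result == conclusion:
                m += 1          # covered and correctly classified
            else:
                l += 1          # covered but misclassified
    return l, m

# Drop "hair = blonde": l = 0, so the shortened rule is still exact.
print(check_rule([("lotion", "yes")], "none"))    # (0, 3)
# Drop "lotion = yes": l = 2, so this antecedent must be kept.
print(check_rule([("hair", "blonde")], "none"))   # (2, 2)
```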
A larger example: checking the rule "if color is yellow then canary" over a database of birds:

                  R1 = Canary    R2 = Crow
Color = Yellow    l = 1000       m = 0

The rule is consistent with the data: every yellow bird is a canary.
However, if we have

                  R1 = Canary    R2 = Crow
Color = Yellow    l = 999        m = 1

we observe that the rule is not consistent with the data, and some more antecedents should be incorporated (by decision-tree expansion) to make the rule valid for the whole data set.
                  R1 = Canary    R2 = Crow
Color = Yellow    l = 1000       m = 0
Color = Black     n = 0          o = 1000
               R1 = Canary    R2 = Crow
P1 = Yellow    l = 999        m = 0
P2 = Black     n = 0          o = 1
ID3 Example
Name    Hair     Height    Weight    Lotion   Result
Sarah   blonde   average   light     no       sunburned
Dana    blonde   tall      average   yes      none
Alex    brown    short     average   yes      none
Annie   blonde   short     average   no       sunburned
Emily   red      average   heavy     no       sunburned
Pete    brown    tall      heavy     no       none
John    brown    average   heavy     no       none
Katie   blonde   short     light     yes      none
Average Disorder
■ Average Disorder = Σ_b (N_b / N_t) ( Σ_c -(N_bc / N_b) log2(N_bc / N_b) )
◻ where N_b is the number of samples in branch b,
◻ N_t is the total number of samples in all branches,
◻ N_bc is the number of samples in branch b belonging to class c.
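As a sanity check, here is a minimal Python sketch of this formula; the function names are ours, not from the slides:

```python
from math import log2

def disorder(class_counts):
    """Disorder (entropy) of one branch from its per-class sample counts."""
    n_b = sum(class_counts)
    return -sum(c / n_b * log2(c / n_b) for c in class_counts if c > 0)

def average_disorder(branches):
    """Weighted average disorder over all branches of a test;
    `branches` is a list of per-branch class-count lists."""
    n_t = sum(sum(b) for b in branches)
    return sum(sum(b) / n_t * disorder(b) for b in branches)
```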
ID3 Example (cont.)
Hair occurrences: Blonde 4, Brown 3, Red 1
ID3 Example (cont.)
■ Blonde = 4/8 (-2/4 log2(2/4) - 2/4 log2(2/4)) = 4/8 (0.5 + 0.5) = 0.5
■ Brown = 3/8 (-3/3 log2(3/3)) = 3/8 (-1 log2 1) = 0
■ Red = 1/8 (-1 log2 1) = 0
ID3 Example (cont.)
■ Average Disorder (Hair) = Blonde + Brown + Red = 0.5 + 0 + 0 = 0.5
ID3 Example (cont.)
■ Similarly, the average disorder for the other attributes can be calculated; the results are:
■ Average Disorder (Hair) = 0.5
■ Average Disorder (Height) = 0.6886
■ Average Disorder (Weight) = 0.9386
■ Average Disorder (Lotion) = 0.6067
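These values can be reproduced with the `average_disorder` sketch above; each inner list below is one branch's [sunburned, none] count, read off the data table:

```python
print(average_disorder([[2, 2], [0, 3], [1, 0]]))  # Hair:   0.5
print(average_disorder([[0, 2], [2, 1], [1, 2]]))  # Height: ~0.6887
print(average_disorder([[1, 1], [1, 2], [1, 2]]))  # Weight: ~0.9387
print(average_disorder([[3, 2], [0, 3]]))          # Lotion: ~0.6068
```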
ID3 Example (cont.)
■ The most homogeneous attribute is Hair, so put Hair as the first test. The tree will be:

Hair?
  Blonde → ?
  Red → ?
  Brown → ?
ID3 Example (cont.)
■ With red and brown hair color, all of the training set is completely classified. So the only problem left is the blonde hair color. Among the four blonde samples:

Attribute Name   Attribute Values   Attribute Occurrences
Height           Tall               1
                 Average            1
                 Short              2
ID3 Example (cont.)
■ Tall = 1/4 (-1 log2 1) = 0
■ Average = 1/4 (-1 log2 1) = 0
■ Short = 2/4 (-1/2 log2(1/2) - 1/2 log2(1/2)) = 2/4 (0.5 + 0.5) = 0.5
■ Average Disorder (Height with “hair = blonde”) = 0 + 0 + 0.5 = 0.5
ID3 Example (cont.)
■ Similarly, for the other attributes with hair = blonde, the average disorder is:
■ Average Disorder (Height with “hair = blonde”) = 0.5
■ Average Disorder (Weight with “hair = blonde”) = 1
■ Average Disorder (Lotion with “hair = blonde”) = 0
ID3 Example (cont.)
■ Here Lotion has the minimum average disorder, so it will be the next test. Now the tree becomes:

Hair?
  Blonde → Lotion Used?
     No → ?
     Yes → ?
  Red → ?
  Brown → ?
ID3 Example (cont.)
The final tree:

Hair?
  Blonde → Lotion Used?
     No  → Sunburned (Sarah, Annie)
     Yes → No sunburn (Dana, Katie)
  Red   → Sunburned
  Brown → No sunburn
ID3 Example (cont.)
■ Rules Extraction
■ IF the person’s hair color is blonde
■ AND the person uses lotion
■ THEN no sunburn
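Rule extraction simply enumerates the tree's root-to-leaf paths, one rule per leaf. A short sketch, assuming a nested-tuple tree encoding of our own choosing:

```python
# Each internal node: (attribute, {value: subtree}); each leaf: a class label.
TREE = ("hair", {
    "blonde": ("lotion", {"no": "sunburned", "yes": "no sunburn"}),
    "red": "sunburned",
    "brown": "no sunburn",
})

def extract_rules(node, path=()):
    """Yield one IF-THEN rule per root-to-leaf path."""
    if isinstance(node, str):                     # leaf: emit the rule
        cond = " AND ".join(f"{a} = {v}" for a, v in path)
        yield f"IF {cond} THEN {node}"
        return
    attr, children = node
    for value, child in children.items():
        yield from extract_rules(child, path + ((attr, value),))

for rule in extract_rules(TREE):
    print(rule)
# e.g. IF hair = blonde AND lotion = yes THEN no sunburn
```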
Evaluating ID3
Bad Data
Data is bad if two samples with the same attribute values have different outcomes.
Missing Data
Data is missing if an attribute value is not present, perhaps because it was too expensive to obtain.
Entropy
■ Attributes of instances
◻ Outlook = {rainy (r), overcast (o), sunny (s)}
◻ Temperature = {cool (c), medium (m), hot (h)}
◻ Humidity = {normal (n), high (h)}
◻ Wind = {weak (w), strong (s)}
■ Class value
◻ Play Tennis? = {don’t play (n), play (y)}
■ Feature = attribute with one value
◻ E.g., outlook = sunny
Decision Tree Representation
Good day for tennis?
Leaves = classification; arcs = choice of value for parent attribute.

Outlook?
  Sunny → Humidity? (High / Normal)
  Overcast → Play
  Rain → Wind? (Strong / Weak)
DT Learning as Search
■ Nodes: Decision Trees
■ Operators: Tree Refinement (sprouting the tree)
■ Initial node: smallest tree possible (a single leaf)
■ Heuristic? Information Gain
■ Goal? Best tree possible (???)
What is the simplest tree?

Day   Outlook   Temp   Humid   Wind   Play?
d1    s         h      h       w      n
d2    s         h      h       s      n
d3    o         h      h       w      y
d4    r         m      h       w      y
d5    r         c      n       w      y
d6    r         c      n       s      n
d7    o         c      n       s      y
d8    s         m      h       w      n
d9    s         c      n       w      y
d10   r         m      n       w      y
d11   s         m      n       s      y
d12   o         m      h       s      y
d13   o         h      n       w      y
d14   r         m      h       s      n

How good? Majority class [9+, 5-]: correct on 9 examples, incorrect on 5 examples.
[Figure: successors of the single-leaf tree “Yes”, obtained by sprouting a split on Humid, Wind, Outlook, or Temp. Which successor is good?]
Entropy (disorder) is bad
Homogeneity is good
■ Let S be a set of examples.
■ Entropy(S) = -P log2(P) - N log2(N)
◻ P is the proportion of positive examples
◻ N is the proportion of negative examples
◻ 0 log 0 == 0
■ Example: S has 9 positive and 5 negative examples:
Entropy([9+, 5-]) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
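A quick numerical check of this example (the helper name is ours):

```python
from math import log2

def entropy(p, n):
    """Two-class entropy; 0*log2(0) is taken as 0 by convention."""
    return -sum(x * log2(x) for x in (p, n) if x > 0)

print(round(entropy(9/14, 5/14), 3))  # 0.94
```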
Information Gain
■ Measure of the expected reduction in entropy resulting from splitting along an attribute A:

Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)

where Entropy(S) = -P log2(P) - N log2(N) and S_v is the subset of S for which A takes value v.
Tree Induction Example
■ Entropy of the data S:
Info(S) = -9/14 log2(9/14) - 5/14 log2(5/14) = 0.94
■ Split the data by attribute Outlook (when a parent node p is split into k partitions, n_i is the number of records in partition i):

S [9+, 5-] → Sunny [2+, 3-], Overcast [4+, 0-], Rain [3+, 2-]

Gain(Outlook) = 0.94 - 5/14 [-2/5 log2(2/5) - 3/5 log2(3/5)]
                     - 4/14 [-4/4 log2(4/4) - 0/4 log2(0/4)]
                     - 5/14 [-3/5 log2(3/5) - 2/5 log2(2/5)]
             = 0.94 - 0.69 = 0.25
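A compact sketch that reproduces these gains on the 14-day table above; the encoding and function names are ours:

```python
from math import log2

# The 14 examples: (outlook, temp, humid, wind, play), coded as in the table.
DAYS = [
    ("s","h","h","w","n"), ("s","h","h","s","n"), ("o","h","h","w","y"),
    ("r","m","h","w","y"), ("r","c","n","w","y"), ("r","c","n","s","n"),
    ("o","c","n","s","y"), ("s","m","h","w","n"), ("s","c","n","w","y"),
    ("r","m","n","w","y"), ("s","m","n","s","y"), ("o","m","h","s","y"),
    ("o","h","n","w","y"), ("r","m","h","s","n"),
]
ATTRS = {"Outlook": 0, "Temp": 1, "Humid": 2, "Wind": 3}

def entropy(rows):
    p = sum(r[4] == "y" for r in rows) / len(rows)
    return -sum(x * log2(x) for x in (p, 1 - p) if x > 0)

def gain(rows, attr):
    """Information gain of splitting `rows` on column index `attr`."""
    g = entropy(rows)
    for v in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == v]
        g -= len(subset) / len(rows) * entropy(subset)
    return g

for name, idx in ATTRS.items():
    print(name, round(gain(DAYS, idx), 2))
# Outlook 0.25, Temp 0.03, Humid 0.15, Wind 0.05
```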
Tree Induction Example
The same 14 examples, with Temperature as a numeric range:

Outlook    Temperature   Humidity   Wind     Play Tennis
Sunny      >25           High       Weak     No
Sunny      >25           High       Strong   No
Overcast   >25           High       Weak     Yes
Rain       15-25         High       Weak     Yes
Rain       <15           Normal     Weak     Yes
Rain       <15           Normal     Strong   No
Overcast   <15           Normal     Strong   Yes
Sunny      15-25         High       Weak     No
Sunny      <15           Normal     Weak     Yes
Rain       15-25         Normal     Weak     Yes
Sunny      15-25         Normal     Strong   Yes
Overcast   15-25         High       Strong   Yes
Overcast   >25           Normal     Weak     Yes
Rain       15-25         High       Strong   No

Gain(Outlook) = 0.25
Gain(Temperature) = 0.03
Gain(Humidity) = 0.15
Gain(Wind) = 0.05

Outlook has the highest gain, so it becomes the root test:

Outlook?
  Sunny → ??
  Overcast → Yes
  Rain → ??
■ Entropy of branch Sunny:
Info(Sunny) = -2/5 log2(2/5) - 3/5 log2(3/5) = 0.97
Outlook?
  Sunny → Humidity?
     High → No
     Normal → Yes
  Overcast → Yes
  Rain → ??
■ Entropy of branch Rain:
Info(Rain) = -3/5 log2(3/5) - 2/5 log2(2/5) = 0.97
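From here the procedure recurses on each impure branch until every leaf is pure. A minimal recursive ID3 sketch, reusing `DAYS`, `ATTRS`, and `gain` from the previous snippet; the encoding is ours:

```python
def id3(rows, attrs):
    """Return a leaf label, or (attribute, {value: subtree})."""
    labels = {r[4] for r in rows}
    if len(labels) == 1:          # pure branch: stop with a leaf
        return labels.pop()
    if not attrs:                 # no tests left: take the majority class
        return max(labels, key=lambda c: sum(r[4] == c for r in rows))
    best = max(attrs, key=lambda a: gain(rows, attrs[a]))
    idx = attrs[best]
    rest = {a: i for a, i in attrs.items() if a != best}
    return (best, {v: id3([r for r in rows if r[idx] == v], rest)
                   for v in {r[idx] for r in rows}})

print(id3(DAYS, ATTRS))
# ('Outlook', {'o': 'y', 's': ('Humid', ...), 'r': ('Wind', ...)})
# (branch order may vary)
```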
Issues
■ Missing data
■ Real-valued attributes
■ Many-valued features
■ Evaluation
■ Overfitting
Strengths