Learning With Identification Trees
Artificial Intelligence
CMSC 25000
February 7, 2002
Agenda
• Midterm results
• Learning from examples
– Nearest Neighbor reminder
– Identification Trees:
• Basic characteristics
• Sunburn example
• From trees to rules
• Learning by minimizing heterogeneity
• Analysis: Pros & Cons
Midterm Results
Sunburn example: the final identification tree
Hair Color
– Blonde → Lotion Used
  – No: Sarah: Burn, Annie: Burn
  – Yes: Katie: None, Dana: None
– Red: Emily: Burn
– Brown: Alex: None, John: None, Pete: None
Simplicity
• Occam’s Razor:
– Simplest explanation that covers the data is best
• Occam’s Razor for ID trees:
– Smallest tree consistent with samples will be best predictor for new data
• Problem:
– Finding all trees & finding smallest: Expensive!
• Solution:
– Greedily build a small tree
Building ID Trees
• Goal: Build a small tree such that all samples at leaves have the same class
• Greedy solution:
– At each node, pick the test whose branches are closest to having the same class
• Split into subsets with least “disorder”
– (Disorder ~ Entropy)
– Find the test that minimizes disorder (sketched in code below)
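A minimal sketch of this greedy procedure in Python (assumptions: each example is a dict with a "class" key; entropy, avg_disorder, and build_tree are my own helper names, implementing the disorder formulas defined later in these notes):

import math
from collections import Counter, defaultdict

def entropy(labels):
    """Disorder of a set of class labels: -sum_i p_i log2 p_i."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def avg_disorder(examples, attr):
    """Branch disorders, weighted by the fraction of samples down each branch."""
    branches = defaultdict(list)
    for ex in examples:
        branches[ex[attr]].append(ex["class"])
    n = len(examples)
    return sum(len(ls) / n * entropy(ls) for ls in branches.values())

def build_tree(examples, attrs):
    """Greedy ID tree: at each node pick the test with the least disorder."""
    classes = {ex["class"] for ex in examples}
    if len(classes) == 1:                 # homogeneous samples: leaf
        return classes.pop()
    if not attrs:                         # no tests left: majority-class leaf
        return Counter(ex["class"] for ex in examples).most_common(1)[0][0]
    best = min(attrs, key=lambda a: avg_disorder(examples, a))
    subsets = defaultdict(list)
    for ex in examples:
        subsets[ex[best]].append(ex)
    rest = [a for a in attrs if a != best]
    return (best, {v: build_tree(s, rest) for v, s in subsets.items()})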
Minimizing Disorder
Candidate tests on the full sample set:
Hair Color
– Blonde: Sarah: B, Dana: N, Annie: B, Katie: N
– Red: Emily: B
– Brown: Alex: N, Pete: N, John: N
Height
– Short: Alex: N, Annie: B, Katie: N
– Average: Sarah: B, Emily: B, John: N
– Tall: Dana: N, Pete: N
Weight
– Light: Sarah: B, Katie: N
– Average: Dana: N, Alex: N, Annie: B
– Heavy: Emily: B, Pete: N, John: N
Lotion
– No: Sarah: B, Emily: B, Annie: B, Pete: N, John: N
– Yes: Dana: N, Alex: N, Katie: N
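Read off these splits, the eight samples can be written down as data (a sketch; the key names and value spellings are my own choices):

SUNBURN = [
    {"name": "Sarah", "hair": "blonde", "height": "average", "weight": "light",   "lotion": "no",  "class": "B"},
    {"name": "Dana",  "hair": "blonde", "height": "tall",    "weight": "average", "lotion": "yes", "class": "N"},
    {"name": "Alex",  "hair": "brown",  "height": "short",   "weight": "average", "lotion": "yes", "class": "N"},
    {"name": "Annie", "hair": "blonde", "height": "short",   "weight": "average", "lotion": "no",  "class": "B"},
    {"name": "Emily", "hair": "red",    "height": "average", "weight": "heavy",   "lotion": "no",  "class": "B"},
    {"name": "Pete",  "hair": "brown",  "height": "tall",    "weight": "heavy",   "lotion": "no",  "class": "N"},
    {"name": "John",  "hair": "brown",  "height": "average", "weight": "heavy",   "lotion": "no",  "class": "N"},
    {"name": "Katie", "hair": "blonde", "height": "short",   "weight": "light",   "lotion": "yes", "class": "N"},
]  # class B = burn, N = none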
Minimizing Disorder
After the Hair Color split, candidate tests on the Blonde branch (Sarah, Dana, Annie, Katie):
Height
– Short: Annie: B, Katie: N
– Average: Sarah: B
– Tall: Dana: N
Weight
– Light: Sarah: B, Katie: N
– Average: Dana: N, Annie: B
– Heavy: (none)
Lotion
– No: Sarah: B, Annie: B
– Yes: Dana: N, Katie: N
Measuring Disorder
• Problem:
– In general, tests on large databases don’t yield homogeneous subsets
• Solution:
– General information theoretic measure of disorder
– Desired features:
• Homogeneous set: least disorder = 0
• Even split: most disorder = 1
Measuring Entropy
• If we split m objects into 2 bins of sizes m_1 and m_2, what is the entropy?

Disorder = \sum_i -(m_i/m) \log_2(m_i/m) = -(m_1/m)\log_2(m_1/m) - (m_2/m)\log_2(m_2/m)

[Plot: disorder as a function of m_1/m; zero at m_1/m = 0 and 1, maximum of 1 at m_1/m = 1/2]
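A quick check of the two-bin curve (a sketch; disorder2 is my own helper name):

import math

def disorder2(m1, m):
    """Two-bin disorder -(m1/m)log2(m1/m) - (m2/m)log2(m2/m); 0 log2 0 = 0."""
    total = 0.0
    for mi in (m1, m - m1):
        if mi:
            total -= (mi / m) * math.log2(mi / m)
    return total

for m1 in range(9):                           # m = 8, bin sizes 0..8
    print(m1 / 8, round(disorder2(m1, 8), 3))
# zero at m1/m = 0 and 1, peak of 1.0 at m1/m = 0.5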
Measuring Disorder
Entropy
p_i = m_i / m : the probability of being in bin i
0 \le p_i \le 1, \sum_i p_i = 1
Entropy (disorder) of a split: -\sum_i p_i \log_2 p_i
Assume 0 \log_2 0 = 0

p_1    p_2    Entropy
1      0      -1 \log_2 1 - 0 \log_2 0 = 0 - 0 = 0
1/2    1/2    -(1/2)\log_2(1/2) - (1/2)\log_2(1/2) = 1/2 + 1/2 = 1
1/4    3/4    -(1/4)\log_2(1/4) - (3/4)\log_2(3/4) = 0.5 + 0.311 = 0.811
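These rows can be verified directly (a sketch; entropy here takes a list of bin probabilities):

import math

def entropy(ps):
    """-sum_i p_i log2 p_i over bin probabilities, with 0 log2 0 = 0."""
    return sum(-p * math.log2(p) for p in ps if p > 0)

print(entropy([1, 0]))        # 0.0: homogeneous set, least disorder
print(entropy([0.5, 0.5]))    # 1.0: even split, most disorder
print(entropy([0.25, 0.75]))  # 0.811...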
Computing Disorder
Example: N instances split by a test into Branch 1 (N_1a of class a, N_1b of class b) and Branch 2 (N_2a of class a, N_2b of class b).

AvgDisorder = \sum_{i=1}^{k} (n_i / n_t) \sum_{c \in classes} -(n_{i,c}/n_i) \log_2(n_{i,c}/n_i)

where n_i/n_t is the fraction of samples down branch i, and the inner sum is the disorder of the class distribution on branch i.
Entropy in Sunburn Example
AvgDisorder = \sum_{i=1}^{k} (n_i / n_t) \sum_{c \in classes} -(n_{i,c}/n_i) \log_2(n_{i,c}/n_i)
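As a check (a sketch reusing SUNBURN, avg_disorder, and build_tree from the snippets above), we can rank the four candidate root tests and build the tree; hair color should come out with the least average disorder, matching the tree shown earlier:

for attr in ("hair", "height", "weight", "lotion"):
    print(attr, round(avg_disorder(SUNBURN, attr), 3))
# "hair" scores lowest, so the greedy builder makes it the root test:
print(build_tree(SUNBURN, ["hair", "height", "weight", "lotion"]))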