Illustrating the Supervised Classification Task
Training Set

Tid  Attrib1  Attrib2  Attrib3  Class
 1   Yes      Large    125K     No
 2   No       Medium   100K     No
 3   No       Small     70K     No
 4   Yes      Medium   120K     No
 5   No       Large     95K     Yes
 6   No       Medium    60K     No
 7   Yes      Large    220K     No
 8   No       Small     85K     Yes
 9   No       Medium    75K     No
10   No       Small     90K     Yes

Test Set

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small     55K     ?

[Figure: the training set is fed to a learning algorithm (induction), which learns a model; the model is then applied to the test set to predict the unknown class label.]
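The induce-then-apply workflow above can be sketched in a few lines of Python. The snippet below uses scikit-learn's DecisionTreeClassifier and a simple numeric encoding of the attributes; both are assumptions made purely for illustration, since the figure does not commit to any particular learning algorithm.

```python
from sklearn.tree import DecisionTreeClassifier

# Simple numeric encodings for the categorical attributes.
yes_no = {"No": 0, "Yes": 1}
size = {"Small": 0, "Medium": 1, "Large": 2}

# Training set: (Attrib1, Attrib2, Attrib3 in thousands, Class).
records = [
    ("Yes", "Large", 125, "No"),
    ("No", "Medium", 100, "No"),
    ("No", "Small", 70, "No"),
    ("Yes", "Medium", 120, "No"),
    ("No", "Large", 95, "Yes"),
    ("No", "Medium", 60, "No"),
    ("Yes", "Large", 220, "No"),
    ("No", "Small", 85, "Yes"),
    ("No", "Medium", 75, "No"),
    ("No", "Small", 90, "Yes"),
]
X_train = [[yes_no[r[0]], size[r[1]], r[2]] for r in records]
y_train = [r[3] for r in records]

# Induction: learn a model from the training set.
model = DecisionTreeClassifier().fit(X_train, y_train)

# Apply the model to the test record (Tid 11) to predict its class.
print(model.predict([[yes_no["No"], size["Small"], 55]]))
```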
ID3: Entropy

Entropy(S) = -p+ log2 p+ - p- log2 p-

where p+ and p- are the proportions of positive and negative examples in S.
[Figure: two small collections of + and - examples, one with the labels thoroughly mixed and one with them largely separated, illustrating what entropy measures.]
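As a quick check of the definition, here is a small Python helper (an illustrative sketch, not code from the lecture) that computes the entropy of a two-class collection from its positive and negative counts:

```python
import math

def entropy(pos, neg):
    """Entropy of a collection with `pos` positive and `neg` negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:  # treat 0 * log2(0) as 0
            p = count / total
            h -= p * math.log2(p)
    return h

print(entropy(9, 5))  # ~0.940, the value used for S = [9+, 5-] below
```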
Top-Down Induction of Decision Trees
• ID3 – a greedy method: it grows the tree top-down, choosing at each node the attribute that looks best locally, without backtracking
Converting a Tree to Rules
Each path from the root to a leaf becomes one IF-THEN rule: the antecedent is the conjunction of the attribute tests along the path, and the consequent is the class label at the leaf.

[Figure: a decision tree with Outlook at the root.]
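A minimal sketch of the conversion, assuming the tree is stored as a nested dict mapping an attribute to {value: subtree or leaf label} (a hypothetical representation chosen for illustration):

```python
def tree_to_rules(tree, conditions=()):
    """Enumerate (conditions, label) pairs, one rule per root-to-leaf path."""
    if not isinstance(tree, dict):  # leaf: a class label
        return [(list(conditions), tree)]
    (attribute, branches), = tree.items()  # internal node tests one attribute
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, conditions + ((attribute, value),))
    return rules

# Illustrative tree rooted at Outlook (structure assumed for the example).
tree = {"Outlook": {"Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
                    "Overcast": "Yes",
                    "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}}}}
for conds, label in tree_to_rules(tree):
    print("IF " + " AND ".join(f"{a} = {v}" for a, v in conds) + f" THEN {label}")
```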
Entropy(S = [29+, 35-]) = -(29/64) log2 (29/64) - (35/64) log2 (35/64) ≈ 0.99
[Figure: the collection S = [29+, 35-] split on a candidate attribute A1 into two branches labelled G and H.]
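Plugging the same counts into the entropy helper sketched earlier reproduces this value:

```python
print(round(entropy(29, 35), 2))  # 0.99
```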
Temperature

[Figure: evaluating a split of S = [9+, 5-], Entropy(S) = 0.940, on the Temperature attribute.]
Information Gain
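Information gain is the expected reduction in entropy obtained by partitioning the collection S on an attribute A; in the standard form,

Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|Sv| / |S|) · Entropy(Sv)

where Sv is the subset of examples in S for which A takes the value v. ID3 selects, at each node, the attribute with the highest gain.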
Selecting the Next Attribute
[Figure: two candidate splits of S = [9+, 5-], Entropy(S) = 0.940, one on Humidity and one on Wind; a second tree fragment shows Outlook branching into Sunny, Overcast, and Rain.]
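To make the comparison concrete, the sketch below computes the gain of two candidate splits. The per-branch counts are illustrative assumptions (they are not given above), so only the shape of the computation matters here, not the particular numbers.

```python
import math

def entropy(pos, neg):  # same helper as in the entropy section
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c)

def information_gain(parent, branches):
    """Gain(S, A) = Entropy(S) - sum over branches of |Sv|/|S| * Entropy(Sv)."""
    total = sum(parent)
    return entropy(*parent) - sum(
        (pos + neg) / total * entropy(pos, neg) for pos, neg in branches)

S = (9, 5)  # Entropy(S) = 0.940
# Branch counts below are assumed for illustration only.
print(information_gain(S, [(3, 4), (6, 1)]))  # a split such as Humidity
print(information_gain(S, [(6, 2), (3, 3)]))  # a split such as Wind
```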
ID3 Algorithm
[Figure: ID3 run on the 14 training examples D1–D14 ([9+, 5-]); Outlook is placed at the root, and the leaves grown so far are labelled No and Yes.]
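The algorithm itself can be sketched as a short recursive procedure. The version below is a simplified illustration under assumptions of its own (nominal attributes, examples represented as dicts, majority vote when attributes run out); it is not the lecture's exact pseudocode.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    by_value = {}
    for ex, lab in zip(examples, labels):
        by_value.setdefault(ex[attribute], []).append(lab)
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in by_value.values())
    return entropy(labels) - remainder

def id3(examples, labels, attributes):
    # All examples share one label: return a leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to test: return the majority label.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: choose the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    node = {best: {}}
    for value in {ex[best] for ex in examples}:
        idx = [i for i, ex in enumerate(examples) if ex[best] == value]
        node[best][value] = id3([examples[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attributes if a != best])
    return node
```

A call such as id3(examples, labels, ["Outlook", "Temperature", "Humidity", "Wind"]) returns a nested dict in the same shape assumed by the tree_to_rules sketch earlier.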
Occam's razor – prefer the shortest hypothesis that fits the data. Arguments in favor:
• Fewer short hypotheses than long hypotheses
• A short hypothesis that fits the data is unlikely to be a coincidence
• A long hypothesis that fits the data might be a coincidence
Arguments opposed:
• There are many ways to define small sets of hypotheses, so the preference for "short" ones rests on an arbitrary choice of representation