Professional Documents
Culture Documents
Chapter 7
Chapter 7
M L
D M
Machine Learning
& Data Mining
Chapter 7
Decision Trees
#1: C4.5
#2: K-Means
#3: SVM
#4: Apriori
#5: EM
#6: PageRank
#7: AdaBoost
#7: kNN
#7: Naive Bayes
#10: CART
Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University 2
Content MIMA
Introduction
CLS
ID3
C4.5
CART
Model Prediction
Examples
Generalize
Instantiated for
another case
Learning Algorithms:
CLS (Concept Learning System)
ID3 C4 C4.5 C5
blond black
Entropy ( S ) 0.5
0
0 0.5 1
p1
Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University 17
ID3 MIMA
Entropy ( S ) 0.5
0
0 0.5 1
| p1 p2 |
Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University 18
ID3 MIMA
+ + + + # + = 14
+ + +
+ + + +
+ + + #–=1
p 14 /15
p 1/15
Entropy 0.353359
Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University 19
Information Gain MIMA
Attribute A {v1 ,K , vn }
S
v1 v2 vn
S1 S2 Sn
| Sv |
Gain( S , A) Entropy ( S ) Entropy ( Sv )
vA | S |
| Sv |
Gain( S , A) Entropy ( S ) Entropy ( Sv )
vA | S |
Gain( S , Height ) 0
Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University 21
Information Gain MIMA
| Sv |
Gain( S , A) Entropy ( S ) Entropy ( Sv )
vA | S |
| Sv |
Gain( S , A) Entropy ( S ) Entropy ( Sv )
vA | S |
+
+ +
+ ++ + ++ + + –
+ – – ++
++ +– –++ + +
– + + +– + – – +
+
+ + –++ + –+– – –
+ + + –+ + –+ – + – –
+
– +
– –+ + – ++ + – –
+ +
–+ – + –
– + + – – + – + +
–
+ – –
+
+ – + – – +
+ –
++ –+ +–
–+– + – +
+ – +– +
+
– + –+ – +
– –
Rule post-pruning
Used by C4.5
Example:
Temprature<54 Temprature54-85 Temprature>85
Gain 2 ( S , A)
Cost ( S , A)
Nunez (1988)
(L , x, L )
(L , x, L ) Attribute A={x, y, z}
(L , x, L )
(L , y ,L )
(L , y ,L ) Gain( S , A) ?
(L , y ,L )
S= (L , y ,L )
(L , z ,L )
(L , z ,L ) Possible Approaches:
(L , z ,L )
(L , ?,L ) Assign most common value of A to the
(L , ?,L ) unknown one.
Assign most common value of A with the
same target value to the unknown one.
Assign probability to each possible value.
Any Question?