Professional Documents
Culture Documents
Sargur Srihari
srihari@cedar.buffalo.edu
1
Machine Learning Srihari
Topics in CART
• CART as an adaptive basis function model
• Classification and Regression Tree Basics
• Growing a Tree
2
Machine Learning Srihari
A Classification Tree
PlayTennis Attributes
3
Machine Learning Srihari
4
Machine Learning Srihari
5
Machine Learning Srihari
Play Need 76
Tennis
Probabilities
(2)
Can we do better?
6
Machine Learning Srihari
7
Machine Learning Srihari
8
Machine Learning Srihari
9
Machine Learning Srihari
10
Machine Learning Srihari
Entropy
• Illustration:
• S is a collection of 14 examples with 9 positive and 5
negative examples
c
Entropy (S) ≡ ∑ −pi log 2 pi
i=1
12
Machine Learning Srihari
Information Gain
• Entropy measures the impurity of a collection
• Information Gain is defined in terms of Entropy
– expected reduction in entropy caused by
partitioning the examples according to this attribute
13
Machine Learning Srihari
Info[2,3]=entropy(2/5, 3/5)
= -2/5 log 2/5 - 3/5 log 3/5
= 0.97 bits
15
Machine Learning Srihari
Info(2,3)=0.97
Info(0,2)=info(1,0)=0 Info(3,3)=1 Info (3,2)= 0.97
Info(1,1)=0.5 Info(2,2)=Info(1,1)=1 Info (0,3)=Info(2,0)=0
Ave Info=0+0+(1/5) Gain(windy)=1-1 Gain(humidity)=0.97 bits
Gain(temp)=0.97- 0.2 =0.77 bits =0 bits
17
Machine Learning Srihari
18
Machine Learning Srihari
19
Machine Learning Srihari
20
Machine Learning Srihari
values v1, v2
and v3 of
{Attributes} - A
Attribute A
Label = - Label = +
21
Machine Learning Srihari
22
Machine Learning Srihari
• The best attribute is the one with the highest information gain, as defined in Equation:
23
Machine Learning Srihari
Perfect feature
• If feature outlook has two values: sunny and
rainy
• If for sunny all 5 values of playtennis are yes
• If for rainy all 9 values of playtennis are no
Gain(S,outlook) = 0.94 - (5/9).0 - (9/9).0
= 0.94
24
Machine Learning Srihari
25
Machine Learning Srihari
26
Machine Learning Srihari
Figure 3.3
27
Machine Learning Srihari
Humidity is a perfect feature since there is no uncertainty left when we know its value
28
Machine Learning Srihari
29
Machine Learning Srihari
Figure 3.5
30
Machine Learning Srihari
31
Machine Learning Srihari
32
Machine Learning Srihari
33
Machine Learning Srihari
34
Machine Learning Srihari
35
Machine Learning Srihari