Decision Tree Learning
Tamás Horváth
University of Bonn and Fraunhofer IAIS
[Figure: a small decision tree with numeric tests x < 3,5, y < 1,5, y < 3,5 and yes/no leaves]
Error Measures
ID3 [Quinlan 79], C4.5, C5.0 [Quinlan 93, 99], J48 [Witten et al. 00]
ID3: no numeric attributes
C4.5: fully developed, the most popular and widely used variant
its successor C5.0 adds ensembles
CART [Breiman et al. 84]: regression trees
[Figure: a decision tree with root Outlook and No/Yes leaves]
© T. Horváth – ILAS Machine Learning, WS16/17
Decision Tree Learning
Example (Kirsten/Wrobel/Dahmen '98):
Predicting Suitability of Locations for a Certain Plant Species
Location: Anderer Berg, Nauheim
Coordinates: L4, Blatt 98
Humidity of Soil: hoch (high)
Acidity: neutral
Average Temperature: 9,2 °C
Species Found: Rotbuche (European beech), Gewöhnliche Esche (common ash), Waldgeißblatt (honeysuckle), ...
Several hundreds of thousands of such recordings have been made; ca. 14,000 were used in the study.
A Decision Tree
[Figure: a decision tree for the plant-suitability example. The root tests Humidity (= trocken / = feucht). The trocken branch leads to an Acidity test (= low / = neutral / = high) whose branches continue with Temp tests (thresholds 3,5 and 7,5); the feucht branch leads to a Temp test (threshold 9). Leaves are labeled G or N (suitable / not suitable).]
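As a hedged sketch of how a tree like the one in the figure is evaluated at prediction time, the following shows one simple nested-tuple representation and a recursive classifier. The concrete tree and leaf labels below only loosely follow the figure and are illustrative assumptions, not a transcription of the slide.

```python
# A node is ("attribute", {branch_value: subtree, ...}); a leaf is a label string.
def classify(tree, record):
    """Walk the decision tree until a leaf label is reached."""
    if isinstance(tree, str):          # leaf: return the class label
        return tree
    attribute, branches = tree
    return classify(branches[record[attribute]], record)

# Illustrative tree, loosely modeled on the figure (labels are assumptions).
tree = ("Humidity", {
    "trocken": ("Acidity", {
        "low": "N", "neutral": "G", "high": "N",
    }),
    "feucht": "G",
})

record = {"Humidity": "trocken", "Acidity": "neutral"}
print(classify(tree, record))   # prints "G"
```

Numeric tests such as Temp ≤ 3,5 would be handled analogously by comparing the record's value against the node's threshold instead of looking up a categorical branch.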
Outline
TDIDT algorithm: example (Quinlan, 1986)
information gain
overfitting
summary
Remark
[Figure: a small decision tree with root Outlook, T/F branches, and Yes leaves]
Algorithm: Example
Training set: (9+,5-). Candidate attributes for the root split:
Outlook
Temperature
Humidity: High (3+,4-), Normal (6+,1-)
Wind: Weak (6+,2-), Strong (3+,3-)
Outlook is chosen as the root; one of its branches is then split on Humidity: High (0+,3-), Normal (2+,0-)
[Figure: the resulting decision tree with root Outlook and leaves No, Yes, Yes, No]
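The class counts on these slides are enough to reproduce the attribute selection numerically. The sketch below computes entropy and information gain for the Humidity and Wind splits of the (9+,5-) training set; the function names are my own, not from the slides.

```python
from math import log2

def entropy(pos, neg):
    """Shannon entropy (in bits) of a (pos, neg) class distribution."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * log2(p)
    return h

def information_gain(parent, children):
    """Entropy reduction when `parent` = (pos, neg) is split into `children`."""
    n = sum(parent)
    remainder = sum((p + q) / n * entropy(p, q) for p, q in children)
    return entropy(*parent) - remainder

# Counts from the slides: S = (9+,5-)
print(round(entropy(9, 5), 3))                               # 0.94
print(round(information_gain((9, 5), [(3, 4), (6, 1)]), 3))  # Humidity: 0.152
print(round(information_gain((9, 5), [(6, 2), (3, 3)]), 3))  # Wind: 0.048
```

Humidity yields a clearly larger gain than Wind, which matches the (6+,1-) Normal branch being much purer than either Wind branch.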
Outline
TDIDT algorithm: example (Quinlan, 1986)
information gain
expressivity of decision trees
complexity of constructing optimal decision trees
overfitting
summary
Information Gain
Alfréd Rényi (1921–1970)
Claude E. Shannon (1916–2001)
Entropy: Example
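The worked numbers on the original slide are not recoverable here, but one natural instance of the computation this heading introduces, using the (9+,5-) training set from the preceding example, is:

```latex
H(S) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.940
```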