Chapter 09 CART - N
Introduction
• The basic principle:
1. Classify or predict an outcome based on a set of predictors
2. The output is a set of rules
3. Also called CART (Classification and Regression Trees), Decision Trees, Binary Trees, or just Trees
• Characteristics:
- Data-driven, not model-driven
- Makes no assumptions about the data; no normalization is needed
- Performs well across a wide range of situations without much effort from the analyst
- Handles both numeric and categorical predictors
- Rules are represented by tree diagrams
- Very popular because it produces easily understandable decision rules (at least if the trees are not too large)
Example 1: Bank Customer Classification
Using CART to classify bank customers who receive a loan offer as either acceptors or non-acceptors, based on information such as their income, education level, and average credit card expenditure.
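The output of such a tree amounts to nested if/else rules. The thresholds and variable scales below are hypothetical, chosen purely to illustrate the form of the rules; the actual tree would learn its own splits from the data.

```python
# Hypothetical decision rules of the kind a CART tree produces for the bank
# example. All thresholds (100, 1.5, 2) are made up for illustration only.
def classify_customer(income, education, cc_avg):
    """Return 'acceptor' or 'non-acceptor' for a loan offer."""
    if income < 100:            # split 1: income (hypothetical threshold)
        if cc_avg < 1.5:        # split 2: average credit card expenditure
            return "non-acceptor"
        return "acceptor"
    if education >= 2:          # split 3: education level (hypothetical scale)
        return "acceptor"
    return "non-acceptor"

print(classify_customer(income=80, education=3, cc_avg=1.0))  # non-acceptor
```

Classifying a new record is then just a walk down one path of the tree: each record triggers exactly one rule.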
Tree Structure
Example 1: Decision Rules and Classifying a New Record
Recursive Partitioning
Example 2: Riding Mowers
• Goal: find a way of classifying families in a city into those likely to purchase a riding mower
and those not likely to buy one.
• Data: 24 households classified as owning or not owning riding mowers (12 owners and 12
non-owners).
• Predictors: Income, Lot Size
Income Lot_Size Ownership
60.0 18.4 owner
85.5 16.8 owner
64.8 21.6 owner
61.5 20.8 owner
87.0 23.6 owner
110.1 19.2 owner
108.0 17.6 owner
82.8 22.4 owner
69.0 20.0 owner
93.0 20.8 owner
51.0 22.0 owner
81.0 20.0 owner
75.0 19.6 non-owner
52.8 20.8 non-owner
64.8 17.2 non-owner
43.2 20.4 non-owner
84.0 17.6 non-owner
49.2 17.6 non-owner
59.4 16.0 non-owner
66.0 18.4 non-owner
47.4 16.4 non-owner
33.0 18.8 non-owner
51.0 14.0 non-owner
63.0 14.8 non-owner
Example 2: cont.
(First split: Income = 59.7)
• How was this split selected? The algorithm examined each predictor variable (in
this case, Income and Lot Size) and all possible split values for each variable to
find the best split.
• What are the possible split values for a variable? They are simply the midpoints
between pairs of consecutive values for the predictor.
• The possible split points for Income are {38.1, 45.3, 50.1, …, 109.5} and those for
Lot Size are {14.4, 15.4, 16.2, …, 23}. These split points are ranked according to
how much they reduce impurity (heterogeneity) in the resulting rectangles.
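The midpoint rule is easy to state in code. A small sketch using the Lot_Size column from the table above:

```python
def candidate_splits(values):
    """Midpoints between consecutive distinct sorted values of a predictor."""
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]

lot_size = [18.4, 16.8, 21.6, 20.8, 23.6, 19.2, 17.6, 22.4, 20.0, 20.8,
            22.0, 20.0, 19.6, 20.8, 17.2, 20.4, 17.6, 17.6, 16.0, 18.4,
            16.4, 18.8, 14.0, 14.8]

print([round(s, 1) for s in candidate_splits(lot_size)[:3]])  # [14.4, 15.4, 16.2]
```

This reproduces the start of the Lot Size split-point list quoted above, {14.4, 15.4, 16.2, …, 23}.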
Measuring Impurity
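The impurity measures themselves are not shown in this extract; the two standard choices for classification trees are the Gini index (the measure used by CART) and entropy. A minimal sketch of both, assuming class counts in a node:

```python
from math import log2

def gini(counts):
    """Gini impurity: 1 - sum(p_k^2). 0 for a pure node, 0.5 max for two classes."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    """Entropy: -sum(p_k * log2 p_k). 0 for a pure node, 1 max for two classes."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

print(gini([12, 12]))     # 0.5  (the full riding-mower sample: 12 owners, 12 non-owners)
print(entropy([12, 12]))  # 1.0
print(gini([4, 0]))       # 0.0  (a pure node)
```

Both measures are maximized when the classes are evenly mixed and reach zero when a node is pure, which is exactly the property a split-selection criterion needs.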
Impurity and Recursive Partitioning
Example 2 (cont.)
Split on Income = 59.7:
Left node (Income < 59.7):  1 owner, 7 non-owners, total = 8
Right node (Income ≥ 59.7): 11 owners, 5 non-owners, total = 16
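These counts are enough to score the split. A small sketch recomputing the weighted Gini impurity of the two rectangles (the Gini function here is the standard definition, not code from the slides):

```python
def gini(counts):
    """Gini impurity of a node given its class counts."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

left, right = [1, 7], [11, 5]        # [owners, non-owners] in each node
n_l, n_r = sum(left), sum(right)
weighted = (n_l * gini(left) + n_r * gini(right)) / (n_l + n_r)

print(round(gini([12, 12]), 4))      # 0.5    impurity before the split
print(round(weighted, 4))            # 0.3594 impurity after splitting at Income = 59.7
```

The split is chosen because this weighted impurity (≈ 0.359) is the largest reduction from the parent node's 0.5 among all candidate splits.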
Example 2 (cont.)
Example 2 (cont.)
Example 2 (cont.)
Example 3: Build a tree
Categorical Variables:
Example: Playing Golf
Output = Yes, No
Root: 9 Yes / 5 No → split on Outlook

Outlook = Overcast (4 Yes / 0 No → pure subset):
Day  Humid   Wind
D3   High    Weak
D7   Normal  Strong
D12  High    Strong
D13  Normal  Weak

Outlook = Sunny (2 Yes / 3 No → split further):
Day  Humid   Wind
D1   High    Weak
D2   High    Strong
D8   High    Weak
D9   Normal  Weak
D11  Normal  Strong

Outlook = Rain (3 Yes / 2 No → split further):
Day  Humid   Wind
D4   High    Weak
D5   Normal  Weak
D6   Normal  Strong
D10  Normal  Weak
D14  High    Strong
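The branch counts can be checked numerically. A sketch computing the entropy reduction (information gain) achieved by splitting on Outlook, assuming the standard entropy definition:

```python
from math import log2

def entropy(counts):
    """Entropy of a node given its class counts."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

root = [9, 5]                                    # 9 Yes / 5 No at the root
branches = {"Sunny": [2, 3], "Overcast": [4, 0], "Rain": [3, 2]}

n = sum(root)
after = sum(sum(b) / n * entropy(b) for b in branches.values())
gain = entropy(root) - after

print(round(entropy(root), 3))  # 0.94
print(round(gain, 3))           # 0.247
```

The pure Overcast branch contributes zero entropy, which is why splitting on Outlook scores well; the Sunny and Rain branches remain mixed and are split further.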
Overfitting Problem
• Several criteria can be used to stop tree growth before it starts overfitting the data. Examples:
- Tree depth (i.e., number of splits); we can control the depth of the tree
- The minimum number of records in a terminal node
- The minimum reduction in impurity
- The minimum number of records in a node needed in order to split
• The problem is that it is not simple to determine a good stopping point using such rules.
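Stopping rules like these plug straight into the recursive-partitioning loop. A minimal single-predictor sketch (not the slides' code) showing a `max_depth` rule and a minimum-records-to-split rule, applied to the Example 2 incomes:

```python
def gini(labels):
    """Gini impurity of a node given its list of class labels."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Best midpoint split of one numeric predictor by weighted Gini."""
    best = None
    vs = sorted(set(xs))
    for a, b in zip(vs, vs[1:]):
        s = (a + b) / 2
        left = [y for x, y in zip(xs, ys) if x < s]
        right = [y for x, y in zip(xs, ys) if x >= s]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[1]:
            best = (s, score)
    return best

def grow(xs, ys, depth=0, max_depth=2, min_split=4):
    # Stopping rules: pure node, too few records to split, or depth limit hit.
    if len(set(ys)) == 1 or len(ys) < min_split or depth >= max_depth:
        return max(set(ys), key=ys.count)          # leaf: majority class
    found = best_split(xs, ys)
    if found is None:                              # no usable split point
        return max(set(ys), key=ys.count)
    s, _ = found
    l = [(x, y) for x, y in zip(xs, ys) if x < s]
    r = [(x, y) for x, y in zip(xs, ys) if x >= s]
    return {"split": s,
            "left": grow(*zip(*l), depth + 1, max_depth, min_split),
            "right": grow(*zip(*r), depth + 1, max_depth, min_split)}

incomes = [60.0, 85.5, 64.8, 61.5, 87.0, 110.1, 108.0, 82.8, 69.0, 93.0, 51.0, 81.0,
           75.0, 52.8, 64.8, 43.2, 84.0, 49.2, 59.4, 66.0, 47.4, 33.0, 51.0, 63.0]
labels = ["owner"] * 12 + ["non-owner"] * 12

tree = grow(incomes, labels, max_depth=1)
print(round(tree["split"], 1))  # 59.7 -- matches the first split in Example 2
```

With `max_depth=1` the tree stops after a single split; raising the depth limit or lowering `min_split` lets it keep partitioning, which is precisely the overfitting trade-off these parameters control.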
Pruning
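The slide body is not included in this extract. For reference, the classic CART answer to the stopping-rule problem (due to Breiman et al.) is cost-complexity pruning: grow the tree fully, then prune it back by minimizing a penalized error over a nested sequence of subtrees, choosing the penalty α with cross-validation or a validation set:

```latex
% Cost-complexity criterion: err(T) is the tree's misclassification cost on
% the training data, |leaves(T)| its number of terminal nodes, and alpha >= 0
% the complexity penalty. Larger alpha favors smaller subtrees.
R_\alpha(T) = \mathrm{err}(T) + \alpha \,\lvert \mathrm{leaves}(T) \rvert
```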
Advantages and Shortcomings
Advantages
• Easy to use and understand
• Produce rules that are easy to interpret & implement
• Variable selection & reduction is automatic
• Do not require the assumptions of statistical models
• Can work without extensive handling of missing data
Shortcomings
• May not perform well where there is structure in the data that is not well captured by
horizontal or vertical splits
• Since the process deals with one variable at a time, there is no way to capture interactions
between variables
Summary
Agenda