Classification (1): Decision Trees and Rule Induction
4. Classification
Assoc. Prof. Nguyen Manh Tuan
AGENDA
Models
Decision Tree
Rule Induction
Algorithms
Decision Trees
Rule Induction
kNN, Ensemble/Meta Models
Naive Bayesian
Neural Networks
Support Vector Machines
The label over each head represents the value of the target variable (write-off or not).
Colors and shapes represent different predictor attributes.
Attributes:
head-shape: square, circular; body-shape: rectangular, oval; body-color: gray, white
Target variable:
write-off: Yes, No
entropy(S)
= -0.7 × log2(0.7) - 0.3 × log2(0.3)
≈ -0.7 × (-0.51) - 0.3 × (-1.74)
≈ 0.88
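The two-class entropy computed above can be checked with a short helper (the function name `entropy` is our own, not from the slides):

```python
import math

def entropy(proportions):
    """Shannon entropy (base 2) of a list of class proportions."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

# The 70% / 30% class split from the example above
print(round(entropy([0.7, 0.3]), 2))  # 0.88
```

A pure node (all instances in one class) gives entropy 0, and a 50/50 two-class split gives the maximum value of 1.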
Notably, the entropy for each child (ci) is weighted by the proportion of
instances belonging to that child, p(ci).
entropy(parent)
= -p(•) × log2 p(•) - p(☆) × log2 p(☆)
≈ -0.53 × (-0.9) - 0.47 × (-1.1)
≈ 0.99 (very impure)
entropy(parent) ≈ 0.99
entropy(Residence=OWN) ≈ 0.54
entropy(Residence=RENT) ≈ 0.97
entropy(Residence=OTHER) ≈ 0.98
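Information gain is the parent's entropy minus the weighted sum of the children's entropies. The slide gives the three child entropies but not the proportions p(ci) of instances in each Residence group, so the weights below are illustrative assumptions, not values from the example:

```python
def information_gain(parent_entropy, children):
    """children: list of (weight p(ci), entropy(ci)) pairs."""
    return parent_entropy - sum(w * e for w, e in children)

# Child entropies from the slide; the weights 0.40/0.35/0.25 are assumed.
children = [(0.40, 0.54), (0.35, 0.97), (0.25, 0.98)]
print(round(information_gain(0.99, children), 2))
```

With any weights that sum to 1, the gain is positive here because every child is purer than, or about as impure as, the parent; the attribute with the largest gain is chosen for the split.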
Example. Attributes: head-shape (square, circular); body-shape (rectangular, oval); body-color (gray, white). Target variable: write-off (Yes, No).
First partitioning: splitting on body shape (rectangular versus oval).
Second partitioning: oval-body people subgrouped by head type.
Third partitioning: rectangular-body people subgrouped by body color.
Decision Trees
The resulting classification tree.
Decision Trees - Visualizing Segmentations
You classify a new unseen instance by starting at the root node and
following the attribute tests downward until you reach a leaf node,
which specifies the instance’s predicted class.
If we trace down a single path from the root node to a leaf, collecting
the conditions as we go, we generate a rule.
Each rule consists of the attribute tests along the path connected
with AND.
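The path-to-rule idea can be sketched with a tiny hardcoded tree that mirrors the partitioning example (the tree layout matches the slides' splits, but the leaf labels Yes/No are assumptions for illustration):

```python
# Each internal node: (attribute, {value: subtree}); each leaf: a class label.
tree = ("body-shape", {
    "rectangular": ("body-color", {"gray": "Yes", "white": "No"}),
    "oval": ("head-shape", {"square": "No", "circular": "Yes"}),
})

def paths_to_rules(node, conjuncts=()):
    """Collect every root-to-leaf path as an if-AND-then rule string."""
    if isinstance(node, str):                      # leaf: emit one rule
        antecedent = " AND ".join(f"({a} = {v})" for a, v in conjuncts)
        return [f"if {antecedent} then write-off = {node}"]
    attribute, branches = node
    rules = []
    for value, subtree in branches.items():
        rules += paths_to_rules(subtree, conjuncts + ((attribute, value),))
    return rules

for rule in paths_to_rules(tree):
    print(rule)
```

Each printed rule is one disjunct: its conjuncts are exactly the attribute tests on one root-to-leaf path.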
R = { r1 ∨ r2 ∨ r3 ∨ … ∨ rk }
where k is the number of disjuncts in the rule set R. Each rule is one disjunct, so the rules in R are combined with a logical OR: an instance is classified by whichever rule it satisfies.
An individual rule ri (called a disjunct or classification rule) can be represented as
ri = if (antecedent or condition) then (consequent)
An antecedent/condition can test many attributes and values, each separated by a logical AND operator.
Each attribute test in the antecedent is called a conjunct of the rule.
Each conjunct corresponds to a node in the equivalent decision tree.
Rule 2 as r2: if (Outlook = rain) and (Wind = false) then Play = yes
- (Outlook = rain) AND (Wind = false): antecedent/condition
- (Outlook = rain): conjunct
A(r0) = 7/19 ≈ 36.84%
A(r1) = 4/7 ≈ 57.14%
A(r2) = 3/3 = 100%
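A rule's accuracy A(r) is the fraction of the instances covered by its antecedent whose class matches its consequent. A minimal sketch, with made-up weather-style rows that match rule r2's attributes (the dataset values are illustrative, not the slides'):

```python
def covers(antecedent, instance):
    """True if every conjunct (attribute, value) matches the instance."""
    return all(instance.get(a) == v for a, v in antecedent)

def rule_accuracy(antecedent, consequent, data, target):
    covered = [row for row in data if covers(antecedent, row)]
    if not covered:
        return 0.0
    correct = sum(1 for row in covered if row[target] == consequent)
    return correct / len(covered)

# Hypothetical instances for illustration only
data = [
    {"Outlook": "rain",  "Wind": False, "Play": "yes"},
    {"Outlook": "rain",  "Wind": False, "Play": "yes"},
    {"Outlook": "rain",  "Wind": True,  "Play": "no"},
    {"Outlook": "sunny", "Wind": False, "Play": "no"},
]
r2 = ([("Outlook", "rain"), ("Wind", False)], "yes")
print(rule_accuracy(*r2, data, "Play"))  # covers 2 rows, both correct -> 1.0
```

Rule induction algorithms use this kind of accuracy (together with coverage) to decide which candidate conjuncts to add to a rule.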
Hands-on problem:
Apply rule induction to the IRIS dataset.
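One way to start this exercise (assuming scikit-learn is available; the slides do not prescribe a tool): fit a shallow decision tree on Iris and read each root-to-leaf path as a rule, using `export_text` to print the paths.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Each root-to-leaf path in the printed tree is one classification rule
print(export_text(clf, feature_names=list(iris.feature_names)))
```

A depth-2 tree already separates the three Iris species reasonably well; deepening the tree adds more (and more specific) rules.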