

Rule-Based Classification
Model: a set of IF-THEN rules, e.g.:
IF age = youth AND student = yes THEN buys_computer = yes
Rule antecedent (precondition) vs. rule consequent
Assessment of a rule: coverage and accuracy
ncovers = # of tuples covered by R
ncorrect = # of tuples correctly classified by R
coverage(R) = ncovers / |D|    /* D: training data set */
accuracy(R) = ncorrect / ncovers
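The two measures above can be computed directly. A minimal sketch, assuming rules are represented as attribute-value dicts and tuples as dicts (this representation and all names are illustrative, not from the source):

```python
def coverage_accuracy(rule_conds, rule_class, data, target):
    """Compute coverage(R) and accuracy(R) for one IF-THEN rule.

    rule_conds: dict attribute -> required value (the antecedent)
    rule_class: predicted class label (the consequent)
    data:       list of dicts, each a training tuple incl. the class
    target:     name of the class attribute
    """
    covered = [t for t in data
               if all(t.get(a) == v for a, v in rule_conds.items())]
    n_covers = len(covered)
    n_correct = sum(1 for t in covered if t[target] == rule_class)
    coverage = n_covers / len(data) if data else 0.0
    accuracy = n_correct / n_covers if n_covers else 0.0
    return coverage, accuracy

# toy data set D with 4 tuples; the rule covers 2 of them
D = [
    {"age": "youth",  "student": "yes", "buys_computer": "yes"},
    {"age": "youth",  "student": "no",  "buys_computer": "no"},
    {"age": "senior", "student": "yes", "buys_computer": "yes"},
    {"age": "youth",  "student": "yes", "buys_computer": "yes"},
]
cov, acc = coverage_accuracy({"age": "youth", "student": "yes"},
                             "yes", D, "buys_computer")
```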
Rule Triggering
A rule is triggered when input X satisfies its antecedent
If several rules are triggered, conflict resolution is needed:
Size ordering
Highest priority to the "toughest" rule, i.e., the one with the largest antecedent size
Rule ordering
Rules are prioritized beforehand
Class-based ordering: rules for the most prevalent class come first, or classes are ordered by misclassification cost
Rule-based ordering: rules are ordered by rule-quality measures; the resulting ordered list is a decision list and must be processed strictly in order
If no rule is triggered, a default rule applies
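A first-match decision list with a default rule can be sketched as follows (the dict-based rule representation and names are assumptions for illustration):

```python
def classify(decision_list, default_class, x):
    """Return the class of the first rule whose antecedent x satisfies.

    decision_list: ordered list of (conditions, class) pairs, where
    conditions is a dict attribute -> value; first match wins.
    """
    for conds, label in decision_list:
        if all(x.get(a) == v for a, v in conds.items()):
            return label
    return default_class  # no rule triggered -> default rule

rules = [
    ({"age": "youth", "student": "yes"}, "yes"),
    ({"age": "youth", "student": "no"},  "no"),
    ({"age": "middle_aged"},             "yes"),
]
first_match = classify(rules, "no", {"age": "youth", "student": "yes"})
fallback = classify(rules, "no", {"age": "senior"})  # handled by default rule
```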

Rule Extraction from a Decision Tree

Example: rule extraction from the buys_computer decision tree

IF age = young AND student = no THEN
buys_computer = no
IF age = young AND student = yes THEN
buys_computer = yes
IF age = mid-age THEN buys_computer = yes
IF age = old AND credit_rating = excellent THEN
buys_computer = yes
IF age = old AND credit_rating = fair THEN
buys_computer = no
The set of extracted rules can be very large
Pruning may be required
Rule generalization: for a given rule antecedent, any condition that does not improve the estimated accuracy can be dropped
Side-effect of pruning: the rules may no longer be mutually exclusive or exhaustive
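The generalization step above can be sketched as a greedy loop; plain accuracy on a data set stands in for the estimated accuracy here (C4.5 actually uses a pessimistic estimate), and all names are illustrative:

```python
def generalize(rule_conds, rule_class, data, target):
    """Greedily drop antecedent conditions whose presence does not
    improve the rule's accuracy on data (sketch: plain accuracy is
    used instead of a pessimistic estimate, for brevity)."""
    def accuracy(conds):
        covered = [t for t in data
                   if all(t.get(a) == v for a, v in conds.items())]
        if not covered:
            return 0.0
        return sum(t[target] == rule_class for t in covered) / len(covered)

    conds = dict(rule_conds)
    improved = True
    while improved:
        improved = False
        for attr in list(conds):
            trial = {a: v for a, v in conds.items() if a != attr}
            # keeping attr does not improve accuracy -> drop it
            if accuracy(trial) >= accuracy(conds):
                conds = trial
                improved = True
                break
    return conds

train = [
    {"age": "youth",  "student": "yes", "buys": "yes"},
    {"age": "youth",  "student": "no",  "buys": "yes"},
    {"age": "senior", "student": "yes", "buys": "no"},
]
# the student test is redundant on this data, so it gets dropped
general = generalize({"age": "youth", "student": "yes"}, "yes",
                     train, "buys")
```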

C4.5: class-based ordering for conflict resolution
All rules for a single class are grouped together
Class rule sets are ranked to minimize false-positive errors
Default class: the one that contains the most training tuples not covered by any rule
Rule Extraction from the Training Data

Sequential covering algorithm: extracts rules directly from the training data
Associative classification algorithms may also be used
Typical sequential covering algorithms: FOIL (First Order Inductive Learner), AQ, CN2, RIPPER
Rules are learned sequentially; each rule for a given class Ci should cover many tuples of Ci but none (or few) of the tuples of the other classes
Rules are learned one at a time
Each time a rule is learned, the tuples covered by the rule are removed
The process repeats on the remaining tuples until a terminating condition holds, e.g., no more training examples remain, or the quality of a returned rule falls below a user-specified threshold
Algorithm: Sequential Covering
Input: D, a training data set; Att_vals, the attributes and their values
Output: a set of IF-THEN rules
Rule_set = {}
For each class c do
    Repeat
        Rule = Learn_One_Rule(D, Att_vals, c)  // finds the best rule for class c
        Remove tuples covered by Rule from D
        Rule_set = Rule_set + Rule
    Until terminating condition
End for
Return Rule_set
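The loop above can be made concrete with a minimal runnable sketch. The Learn_One_Rule stand-in here is deliberately naive (it picks a single attribute test, rather than growing conjunctions as FOIL or RIPPER would), and all names and the dict representation are assumptions:

```python
def learn_one_rule(data, att_vals, target, c):
    """Naive stand-in for Learn_One_Rule: choose the single
    attribute=value condition most accurate for class c
    (ties broken by higher coverage). Returns (conds, accuracy)."""
    best, best_key = None, (-1.0, -1)
    for attr, values in att_vals.items():
        for v in values:
            covered = [t for t in data if t.get(attr) == v]
            if not covered:
                continue
            acc = sum(t[target] == c for t in covered) / len(covered)
            if (acc, len(covered)) > best_key:
                best_key = (acc, len(covered))
                best = {attr: v}
    return best, best_key[0]

def sequential_covering(data, att_vals, target, classes, min_acc=0.9):
    """Learn rules one class at a time, removing covered tuples
    after each rule; stop when rule quality drops below min_acc."""
    rule_set = []
    for c in classes:
        d = list(data)  # work on a copy per class
        while any(t[target] == c for t in d):
            conds, acc = learn_one_rule(d, att_vals, target, c)
            if conds is None or acc < min_acc:
                break  # terminating condition: quality below threshold
            rule_set.append((conds, c))
            # remove the tuples covered by the new rule
            d = [t for t in d
                 if not all(t.get(a) == v for a, v in conds.items())]
    return rule_set

data = [
    {"income": "high", "loan": "accept"},
    {"income": "high", "loan": "accept"},
    {"income": "low",  "loan": "reject"},
]
learned = sequential_covering(data, {"income": ["high", "low"]},
                              "loan", ["accept", "reject"], min_acc=1.0)
```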

Learn_One_Rule: start with the most general rule possible: an empty condition
Add new attribute tests by adopting a greedy depth-first strategy
Pick the attribute test that most improves the rule quality
Example:
Start with IF _ THEN loan_decision = accept
Consider IF loan_term = short THEN .. / IF loan_term = long THEN .. / IF income = high THEN .. / IF income = medium THEN ..
If the best one is IF income = high THEN loan_decision = accept, expand it further
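The general-to-specific search just described can be sketched as follows; accuracy stands in for the rule-quality measure (FOIL-gain or similar would normally be used), and the data and names are illustrative:

```python
def grow_rule(data, att_vals, target, c):
    """Start from the empty antecedent and greedily add the
    attribute test that most improves rule accuracy for class c
    (a sketch of Learn_One_Rule's general-to-specific search)."""
    def acc(conds):
        covered = [t for t in data
                   if all(t.get(a) == v for a, v in conds.items())]
        if not covered:
            return 0.0
        return sum(t[target] == c for t in covered) / len(covered)

    conds = {}
    while True:
        # every way of specializing the rule by one more test
        candidates = [dict(conds, **{a: v})
                      for a, vs in att_vals.items() if a not in conds
                      for v in vs]
        better = [t for t in candidates if acc(t) > acc(conds)]
        if not better:
            return conds  # no specialization improves quality
        conds = max(better, key=acc)

loans = [
    {"income": "high",   "loan_term": "short", "loan_decision": "accept"},
    {"income": "high",   "loan_term": "long",  "loan_decision": "accept"},
    {"income": "medium", "loan_term": "short", "loan_decision": "reject"},
    {"income": "medium", "loan_term": "long",  "loan_decision": "accept"},
]
A = {"income": ["high", "medium"], "loan_term": ["short", "long"]}
best = grow_rule(loans, A, "loan_decision", "accept")
```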

Rule Quality Measures

Coverage or accuracy alone is not sufficient
Rule-quality measures consider both coverage and accuracy
FOIL-gain (in FOIL & RIPPER): assesses the information gained by extending the condition
FOIL_Gain = pos' * ( log2( pos' / (pos' + neg') ) - log2( pos / (pos + neg) ) )
It favors rules that have high accuracy and cover many positive tuples
R: existing rule; R': extended rule; pos/neg (pos'/neg'): # of positive/negative tuples covered by R (R')
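As a quick check of the formula, a direct translation (the example counts are made up for illustration):

```python
from math import log2

def foil_gain(pos, neg, pos_new, neg_new):
    """FOIL_Gain = pos' * (log2(pos'/(pos'+neg')) - log2(pos/(pos+neg)))
    where pos/neg count tuples covered by the existing rule R and
    pos'/neg' those covered by the extended rule R'."""
    return pos_new * (log2(pos_new / (pos_new + neg_new))
                      - log2(pos / (pos + neg)))

# extending a rule covering 10+/10- to one covering 8+/1-:
# accuracy rises and many positives are kept, so the gain is positive
g = foil_gain(10, 10, 8, 1)
```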
Likelihood Ratio Statistic
Likelihood_Ratio = 2 * SUM_{i=1..m} f_i log( f_i / e_i )
f_i: observed frequency of class i among the tuples covered by the rule; e_i: expected frequency if the rule made random predictions
The greater this value, the higher the significance of the rule
Rule pruning based on an independent set of test tuples
FOIL_Prune(R) = (pos - neg) / (pos + neg)
pos/neg: # of positive/negative tuples covered by R
If FOIL_Prune is higher for the pruned version of R, prune R