Classification Techniques
Topics
The Problem of Classification
Decision Tree Approach and ID3 Algorithm
Nearest Neighbour Approach and PEBLS Algorithm
Bayesian Classifier Approach and Naïve Bayes
Rule-based Classification Approach
Principle of Artificial Neural Network
The Problem of Model Overfitting and Solutions
Evaluating Classification Models
Comparison of Classification Solutions
Classification in Practice
Problem Description
Two-Stage Description of Classification
◼ Given a data set of examples, use a classification technique, known as a classifier, to construct a classification model.
[Diagram: the two stages. Stage 1 (Model Construction): the example data set plus a classification method produce a model. Stage 2 (Model Evaluation): the model is applied to evaluation examples and its accuracy is measured, yielding the final model with accuracy rate A.]
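As a minimal sketch of these two stages, assuming Python with scikit-learn and a stand-in data set (the iris data and the decision-tree classifier below are illustrative choices, not part of the slides):

```python
# Stage 1: construct a model from training examples.
# Stage 2: measure its accuracy on held-out evaluation examples.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                  # example data set
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)    # model construction
accuracy = accuracy_score(y_eval, model.predict(X_eval))  # model evaluation
print(f"Final model with accuracy rate A = {accuracy:.2f}")
```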
Training Set (outlook = sunny)

Temperature  Humidity  Windy  Class
hot          high      FALSE  N
hot          high      TRUE   N
mild         high      FALSE  N
cool         normal    FALSE  P
mild         normal    TRUE   P

Gain(Temperature) = 0.571 bits
Gain(Humidity) = 0.971 bits
Gain(Windy) = 0.020 bits

Humidity is chosen as the root of this subtree.

[Tree diagram: Outlook splits into sunny, overcast and rain; the sunny branch leads to a Humidity node, where high → N and normal → P.]
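The gain figures above can be reproduced with a short calculation. This is a sketch assuming the five-row subset shown in the table; the entropy and gain helpers are illustrative names:

```python
from math import log2
from collections import Counter

# The outlook = sunny subset: (temperature, humidity, windy, class).
subset = [
    ("hot",  "high",   "FALSE", "N"),
    ("hot",  "high",   "TRUE",  "N"),
    ("mild", "high",   "FALSE", "N"),
    ("cool", "normal", "FALSE", "P"),
    ("mild", "normal", "TRUE",  "P"),
]

def entropy(rows):
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())

def gain(rows, i):
    groups = {}
    for r in rows:
        groups.setdefault(r[i], []).append(r)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder

for name, i in [("Temperature", 0), ("Humidity", 1), ("Windy", 2)]:
    print(f"Gain({name}) = {gain(subset, i):.3f} bits")
# Prints 0.571, 0.971 and 0.020 bits: Humidity wins the split.
```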
Questions:
◼ How to measure distance?
◼ How to deal with nominal attributes?
◼ The distance between two examples X and Y is defined as:

Δ(X, Y) = wX · wY · Σ(i = 1..M) d(xi, yi)^r

where wX and wY are weights for X and Y, and M is the number of attributes. xi and yi are respectively the values of the ith attribute for X and Y, and r is a constant set to 2 (Euclidean distance).
◼ The weight of a data object is the ratio of the total number of times it has been used for prediction to the number of times it has predicted correctly. Initially, all training examples have a weight of 1.
e.g.
Δ(row1, row2) = d(row1.outlook, row2.outlook)² + d(row1.temp, row2.temp)² + d(row1.humidity, row2.humidity)² + d(row1.windy, row2.windy)²
= d(sunny, sunny)² + d(hot, hot)² + d(high, high)² + d(false, true)²
= 0 + 0 + 0 + (1/2)² = 1/4
Δ(E1, E3) = 1.44 and Δ(E2, E3) = 1.615; E1 has the smaller distance and is therefore the nearest neighbour.
Nearest Neighbour Approach
The PEBLS Example
No.  Outlook   Temperature  Humidity  Windy  Class  Weight
1    sunny     hot          high      false  N      1.5
2    sunny     hot          high      true   N      1
3    overcast  hot          high      false  P      1
4    rain      hot          high      false  P      1

Calculate: Δ(E4, E1), Δ(E4, E2), Δ(E4, E3) ……
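Below is a sketch of the distance calculation over this table, assuming the standard PEBLS value difference metric for nominal attributes, d(v1, v2) = Σc |P(c|v1) − P(c|v2)|. Note that the value statistics here come from just these four rows, so the numbers will differ from the full-training-set figures quoted earlier:

```python
from collections import Counter, defaultdict

# (outlook, temperature, humidity, windy), class, weight -- the table above.
examples = [
    (("sunny",    "hot", "high", "false"), "N", 1.5),
    (("sunny",    "hot", "high", "true"),  "N", 1.0),
    (("overcast", "hot", "high", "false"), "P", 1.0),
    (("rain",     "hot", "high", "false"), "P", 1.0),
]
classes = ["N", "P"]

# Count how often each attribute value co-occurs with each class.
value_class = defaultdict(Counter)
for attrs, cls, _ in examples:
    for i, v in enumerate(attrs):
        value_class[(i, v)][cls] += 1

def d(i, v1, v2):
    """Value difference between two values of attribute i."""
    c1, c2 = value_class[(i, v1)], value_class[(i, v2)]
    n1, n2 = sum(c1.values()), sum(c2.values())
    return sum(abs(c1[c] / n1 - c2[c] / n2) for c in classes)

def delta(x, y, wx=1.0, wy=1.0, r=2):
    """Delta(X, Y) = wX * wY * sum_i d(xi, yi)^r, with r = 2."""
    return wx * wy * sum(d(i, a, b) ** r for i, (a, b) in enumerate(zip(x, y)))

e4, e1 = examples[3], examples[0]
print(delta(e4[0], e1[0], wx=e4[2], wy=e1[2]))  # Delta(E4, E1) with weights
```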
[Diagram: rules R1, R2, R3 and R4 and the regions of the example space each one covers.]
Rule-based Classification
Sequential Covering Algorithm
During the rule extraction process, all examples of one class are considered positive and the examples of every other class negative. A rule is desirable if it covers most of the positive examples and no, or very few, negative examples.
Once a rule is discovered, the examples it covers are removed, leaving behind the negative examples and the positive examples not yet covered.
The learn_one_rule operation finds the optimal rule using a greedy approach: an initial rule is generated and repeatedly refined until a certain evaluation criterion is satisfied. There are two strategies, sketched in code after the figure below:
◼ Rule Growing: start with {} → y, then select the best possible (Ai, vj) pair and add it to the antecedent. Repeat the process until the rule quality no longer improves.
◼ Rule Refining: start with a positive example, then remove one of its conjuncts so that the rule covers more positive examples. Repeat the process until the rule starts to cover negative examples.
[Figure: the two rule-growing strategies.
(a) General-to-specific: starting from {} → (Class = Yes), the candidate conjuncts and their coverage of positive and negative examples are:

Refund = No        Yes: 3  No: 4
Status = Single    Yes: 2  No: 1
Status = Divorced  Yes: 1  No: 0
Status = Married   Yes: 0  No: 3
...
Income > 80K       Yes: 3  No: 1

The grown rule then becomes Refund = No, Status = Single → (Class = Yes).
(b) Specific-to-general: the same search run in the opposite direction, starting from a fully specified positive example.]
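Below is a minimal general-to-specific learn_one_rule sketch. The helper names and the tiny data set are illustrative, and rule quality is measured here by simple coverage accuracy; the algorithm on the slides may use a different evaluation criterion:

```python
def covers(rule, example):
    """A rule covers an example if every (attribute, value) conjunct matches."""
    return all(example[a] == v for a, v in rule)

def quality(rule, examples, target):
    """Fraction of covered examples that belong to the target class."""
    covered = [e for e in examples if covers(rule, e)]
    return sum(e["class"] == target for e in covered) / len(covered) if covered else 0.0

def learn_one_rule(examples, attributes, target):
    rule = []  # start from the most general rule {} -> target
    while True:
        candidates = {(a, e[a]) for e in examples for a in attributes} - set(rule)
        best = max(candidates,
                   key=lambda av: quality(rule + [av], examples, target),
                   default=None)
        if best is None or quality(rule + [best], examples, target) <= quality(rule, examples, target):
            return rule  # rule quality no longer improves
        rule.append(best)

data = [
    {"Refund": "No",  "Status": "Single",  "class": "Yes"},
    {"Refund": "No",  "Status": "Married", "class": "No"},
    {"Refund": "Yes", "Status": "Single",  "class": "No"},
]
print(learn_one_rule(data, ["Refund", "Status"], target="Yes"))
# Grows towards Refund = No, Status = Single -> Yes on this toy data.
```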
◼ transformation function:
y = sigmoid(x) = 1 / (1 + e^(−x))
[Plot: the sigmoid curve, y against x.]
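A sketch of a single unit using this transformation function, with illustrative weights and bias (not values from the slides):

```python
import math

def sigmoid(x):
    """y = 1 / (1 + e^-x), squashing any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the sigmoid."""
    return sigmoid(sum(i * w for i, w in zip(inputs, weights)) + bias)

print(sigmoid(0))                                  # 0.5, the curve's midpoint
print(neuron([1.0, 0.5], [0.4, -0.6], bias=0.1))   # sigmoid(0.2) ~= 0.55
```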
[Tree diagram: a decision tree over attributes including Age (teen/adult/senior), Gender (male/female) and YearsOfLicence (1, 2, 3), with class counts at the leaves such as P3N1, P1N2 and P1N3.]
Only Gender has a "favorable" reduction, so we replace its subtree with the leaf P and count the errors again.
No more "favorable" reductions exist, so we stop here!
Problem of Overfitting
Cost Complexity Pruning
Definition of Cost Complexity of a tree:
Let T: a decision tree,
S: a subtree of T,
L(T): the number of leaf nodes in T,
N: the number of training examples,
E: the number of misclassifications.
The cost complexity of T is defined as: E/N + α · L(T), where α is the cost complexity factor.
About the Cost Complexity Factor α:
Suppose we replace subtree S of T with the best possible leaf. The new tree would contain L(S) − 1 fewer leaves and make M more errors on the training set.
Therefore, the cost complexity of the new tree is: (E + M)/N + α · (L(T) − (L(S) − 1)).
It is intended that the cost complexities of the tree before and after pruning remain the same, i.e. E/N + α · L(T) = (E + M)/N + α · (L(T) − (L(S) − 1)).
Hence, α = M / (N · (L(S) − 1)).
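The derivation above turns directly into a one-line calculation. Here is a sketch with illustrative numbers (N, M and L(S) below are made up, not from the slides):

```python
def cost_complexity(E, N, leaves, alpha):
    """E/N + alpha * L(T), the cost complexity of a tree."""
    return E / N + alpha * leaves

def alpha_for_prune(M, N, subtree_leaves):
    """alpha = M / (N * (L(S) - 1)), from equating pre- and post-pruning costs."""
    return M / (N * (subtree_leaves - 1))

# Pruning a 4-leaf subtree adds M = 2 errors over N = 100 training examples.
alpha = alpha_for_prune(M=2, N=100, subtree_leaves=4)
print(alpha)                                         # 2 / 300 ~= 0.0067
print(cost_complexity(10, 100, 8, alpha))            # before pruning
print(cost_complexity(12, 100, 8 - (4 - 1), alpha))  # after pruning: same cost
```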
Use each tree Ti to classify N′ examples from the test set. Calculate the number of errors Ei for each tree. Find the minimum number of errors E′. Select the tree Tj with the smallest number of nodes such that Ej ≤ E′ + se(E′), where se(E′) = √(E′ · (N′ − E′) / N′) is the standard error of E′.
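A sketch of this selection step, assuming the standard-error criterion as reconstructed above; the list of pruned trees is illustrative:

```python
import math

def select_tree(trees, n_test):
    """trees: (number of nodes, errors on the test set) for each pruned tree."""
    e_min = min(e for _, e in trees)                    # E'
    se = math.sqrt(e_min * (n_test - e_min) / n_test)   # se(E')
    eligible = [t for t in trees if t[1] <= e_min + se]
    return min(eligible)                                # fewest nodes wins

# Pruned sequence, errors measured on N' = 200 test examples.
print(select_tree([(41, 18), (25, 19), (13, 23), (5, 31)], n_test=200))
# (25, 19): within one standard error of E' = 18, with far fewer nodes.
```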
◼ The standard deviation of the difference d = |e1 − e2| between two error rates is estimated as:
σd = √( e1 · (1 − e1) / |D1| + e2 · (1 − e2) / |D2| )
◼ The confidence interval for the true difference dt at a confidence level c is calculated as dt = d ± z · σd, where z is the standard normal coefficient for confidence level c (e.g. z = 1.96 at c = 95%).
◼ The difference between e1 and e2 is significant if the interval does not span 0; otherwise, the difference is
not significant.
Ex: |D1| = 30, e1 = 0.15, |D2| = 5000, e2 = 0.25, giving d = 0.1 and σd = 0.0655.
dt = 0.1 ± 1.96 × 0.0655 = 0.1 ± 0.128
Since the interval spans 0, the difference between e1 and e2 is not significant.
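The worked example can be checked with a few lines; the function name below is illustrative:

```python
import math

def difference_interval(e1, n1, e2, n2, z=1.96):
    """Confidence interval for the true difference between two error rates."""
    sigma_d = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    d = abs(e1 - e2)
    return d - z * sigma_d, d + z * sigma_d

low, high = difference_interval(e1=0.15, n1=30, e2=0.25, n2=5000)
print(low, high)  # about -0.028 .. 0.228: spans 0, so not significant
```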