
Building a Classifier

• Classification recognizes patterns that describe
the group to which an item belongs by examining
existing items.

For example,
• Businesses such as credit card or telephone
companies worry about the loss of steady
customers.

• Classification helps discover the characteristics of
customers who are likely to leave.
Problem
• Classify the attribute ‘Type’ of the glass.arff dataset
Different types of classifiers in WEKA
Click Start to run the selected classifier
Number of instances and attributes
Number of leaves and size of the tree
Overall accuracy = 66.8%
Confusion matrix
• Shows seven different classes
• Diagonal elements show the correctly classified
instances of each class
• Sum of diagonal elements = 143 correctly classified
instances (the 66.8% accuracy shown above)
• Every off-diagonal element counts misclassified
instances (the remaining 33.2%)
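The link between the diagonal and the accuracy figure can be sketched in a few lines of Python. This is only an illustration: the 3-class matrix is made up (it is not WEKA output), and the total of 214 glass instances is an assumption that does not appear on the slides.

```python
def accuracy_from_confusion_matrix(matrix):
    """Return (correct, total, accuracy) for a square confusion matrix.

    Rows are actual classes, columns are predicted classes.
    """
    correct = sum(matrix[i][i] for i in range(len(matrix)))  # diagonal
    total = sum(sum(row) for row in matrix)                  # every cell
    return correct, total, correct / total


# Hypothetical 3-class matrix, for illustration only.
example = [
    [50, 3, 2],
    [4, 40, 6],
    [1, 5, 30],
]
correct, total, acc = accuracy_from_confusion_matrix(example)
print(correct, total, round(100 * acc, 1))

# Slide figures: 143 on the diagonal; 214 instances is an assumed total.
print(round(100 * 143 / 214, 1))
```

Under that assumption, 143 of 214 rounds to the 66.8% reported by WEKA.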
Open the configuration panel of J48
• Click the classifier name next to the Choose button
Click on the unpruned parameter and
select True to build an unpruned tree
• Many algorithms attempt to "prune", or simplify,
their results.

• Pruning produces fewer, more easily interpreted
results.

• More importantly, pruning can be used as a tool
to correct for potential overfitting.
After clicking OK, click Start to run
• Now accuracy = 67.2%
Slightly better result compared to the default pruned tree (66.8%)
Select another parameter
Change minNumObj to 15
Avoids small leaves
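Why a larger minNumObj leads to fewer, larger leaves can be sketched as follows. This is a simplified illustration, not WEKA's code: it assumes a candidate split is only accepted when at least two of its branches receive minNumObj or more instances, and the function name and branch sizes are hypothetical.

```python
def split_allowed(branch_sizes, min_num_obj=2):
    """Return True if at least two branches hold min_num_obj or more instances.

    Simplified sketch of the minimum-leaf-size check; the default of 2
    mirrors J48's default minNumObj.
    """
    big_enough = sum(1 for n in branch_sizes if n >= min_num_obj)
    return big_enough >= 2


# Hypothetical split sending 20 instances one way and 5 the other.
print(split_allowed([20, 5], min_num_obj=2))   # True
print(split_allowed([20, 5], min_num_obj=15))  # False
```

With the stricter setting the same split is rejected, so the node stays a leaf and the tree ends up smaller.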
In a leaf labelled (5.0/1.0), 5.0 is the number of instances that reach the leaf
and 1.0 is the number of those instances that are misclassified
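The numbers in brackets at the end of each leaf can be read mechanically. In J48's text output the first number is the count of instances reaching the leaf and the second, when present, is how many of them are misclassified. A minimal sketch, assuming that convention; the sample lines are made up, not copied from the glass output.

```python
import re

# Matches "(5.0/1.0)" or "(3.0)" at the end of a leaf line.
LEAF_COUNTS = re.compile(r"\((\d+(?:\.\d+)?)(?:/(\d+(?:\.\d+)?))?\)\s*$")


def leaf_counts(line):
    """Return (reached, misclassified, correct) for one leaf line."""
    match = LEAF_COUNTS.search(line)
    if match is None:
        raise ValueError("no leaf counts found in: " + line)
    reached = float(match.group(1))
    # A leaf with no errors omits the second number.
    errors = float(match.group(2)) if match.group(2) else 0.0
    return reached, errors, reached - errors


# Hypothetical leaf lines, for illustration only.
print(leaf_counts("Ba <= 0.27: headlamps (5.0/1.0)"))  # (5.0, 1.0, 4.0)
print(leaf_counts("Ba > 0.27: headlamps (3.0)"))       # (3.0, 0.0, 3.0)
```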
Correctly Classified Instances decreased
Number of leaves and size of tree decreased
Only 8 leaves
Right-click
• Select the Visualize tree option
Decision tree
Same Decision tree
Right-click
Select the More option
More information about classifier J48
• J48 implements C4.5 release 8, sometimes called C4.8
(the latest version of the classifier, written in Java)
• Hence the name J48
Activity
• Open the glass dataset, go to the Classify panel,
choose the J48 tree classifier, and run it (with
default parameters)
• 1. Use the confusion matrix to determine how
many headlamps instances were misclassified
as build wind float.
• 3
• 2. Open the labor dataset, go to the Classify
panel, and run the J48 classifier (with default
parameters). What is the percentage of correctly
classified instances?
• 73.6842
• 3. Now turn pruning off in the J48 configuration
panel by setting unpruned to True and run it
again. What is the percentage of correctly
classified instances now?
• 78.9474
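The two percentages can be sanity-checked with simple arithmetic. The sketch below assumes the labor dataset has 57 instances, which is not stated on the slides; the counts of 42 and 45 correct instances are inferred from the percentages under that assumption, not read from WEKA output.

```python
LABOR_INSTANCES = 57  # assumption: size of the labor dataset


def percent_correct(correct, total=LABOR_INSTANCES):
    """Percentage of correctly classified instances, to four decimals."""
    return round(100 * correct / total, 4)


print(percent_correct(42))  # pruned (default) tree: 73.6842
print(percent_correct(45))  # unpruned tree: 78.9474
```

On a dataset this small, the gap between the two runs corresponds to only three instances.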
