
SCHOOL OF INFORMATION SCIENCE

DATA MINING AND WAREHOUSING (INSY 3094)

Information System Regular Student Assignment


Experimenting Decision Tree using WEKA

Fasika Abera-------------------------NSR/6489/10

February, 2021

Submitted to: Dr. Million M.


Experimenting Decision Tree using WEKA

(Classification)

Description
There are two datasets in this report: Weather and Iris. The weather dataset has 14 instances and 4 nominal attributes: outlook (sunny, overcast, rainy), temperature (hot, mild, cool), humidity (high, normal) and windy (TRUE, FALSE). All attributes in this dataset are nominal, meaning the values denote distinct categories that describe the attribute. (An attribute could also be numeric, in which case the values are numbers that measure the attribute.) The data is complete: no values are missing. The Iris dataset has 4 numeric attributes (sepallength, sepalwidth, petallength, petalwidth) and 150 instances, and is also complete. Because Iris has many instances, I preprocessed it with the ClassBalancer filter before classification. I applied the RandomTree, J48 and REPTree algorithms, and for each one I used the percentage-split test option: 66% of the data for training and the remaining 34% for testing. The results are as follows:
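The experiments themselves were run in WEKA's Explorer. As an illustrative sketch only (an assumption, since the report uses WEKA rather than Python), the same setup on Iris, a decision tree evaluated with a 66% percentage split, can be reproduced with scikit-learn:

```python
# Sketch of the report's test setup in scikit-learn (an assumption;
# the actual experiments were run in WEKA, not Python).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

iris = load_iris()

# Percentage split: 66% of the instances for training, the rest for
# testing, mirroring WEKA's "Percentage split 66%" test option.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, train_size=0.66, random_state=1)

# A CART tree stands in here for WEKA's J48/REPTree/RandomTree.
clf = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)

pred = clf.predict(X_test)
print(confusion_matrix(y_test, pred))
print("accuracy:", accuracy_score(y_test, pred))
```

The exact numbers differ from WEKA's because the split, the tree learner, and the ClassBalancer filter differ; only the procedure is the same.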

Results
Weather
RandomTree algorithm

=== Confusion Matrix ===

 a b   <-- classified as
 2 1 | a = yes
 2 0 | b = no

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  MCC     ROC Area  PRC Area  Class
0.667    1.000    0.500      0.667   0.571      -0.408  0.333     0.533     yes
0.000    0.333    0.000      0.000   0.000      -0.408  0.333     0.400     no

Correctly Classified Instances      2      40      %
Incorrectly Classified Instances    3      60      %
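The per-class figures above follow directly from the confusion matrix. As a check, the yes-class row can be recomputed by hand in plain Python, with the counts taken from the matrix above:

```python
import math

# Confusion matrix for weather + RandomTree (rows = actual,
# columns = predicted, class order: yes, no):
TP, FN = 2, 1   # actual yes: 2 predicted yes, 1 predicted no
FP, TN = 2, 0   # actual no:  2 predicted yes, 0 predicted no

precision = TP / (TP + FP)                                 # 0.500
recall    = TP / (TP + FN)                                 # 0.667 (= TP rate)
f_measure = 2 * precision * recall / (precision + recall)  # 0.571
mcc = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))         # -0.408

accuracy = (TP + TN) / (TP + TN + FP + FN)                 # 0.40 -> "40 %"
```

All five values match WEKA's yes row and the 40% overall accuracy.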

J48 Algorithm

=== Confusion Matrix ===

 a b   <-- classified as
 2 1 | a = yes
 2 0 | b = no

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  MCC     ROC Area  PRC Area  Class
0.667    1.000    0.500      0.667   0.571      -0.408  0.333     0.533     yes
0.000    0.333    0.000      0.000   0.000      -0.408  0.333     0.400     no

Correctly Classified Instances      2      40      %
Incorrectly Classified Instances    3      60      %

REPTree
=== Confusion Matrix ===

 a b   <-- classified as
 3 0 | a = yes
 2 0 | b = no

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  MCC  ROC Area  PRC Area  Class
1.000    1.000    0.600      1.000   0.750      ?    0.500     0.600     yes
0.000    0.000    ?          0.000   ?          ?    0.500     0.400     no
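The "?" entries in the REPTree output are not errors: the tree predicted every test instance as yes, so some quantities reduce to 0/0 and WEKA prints "?" instead of a number. A quick check with the counts from the matrix above:

```python
# From the weather REPTree confusion matrix above (classes: yes, no):
TP, FN = 3, 0   # actual yes: all 3 predicted yes
FP, TN = 2, 0   # actual no: both misclassified as yes

# Precision for "no" would be TN / (TN + FN) -> 0 / 0: undefined,
# because nothing was predicted as "no" at all.
predicted_no = TN + FN
print(predicted_no)   # 0, so WEKA prints "?" for that precision

# The MCC denominator contains the factor (TN + FN), so it is
# zero as well and MCC is undefined for both classes.
mcc_denominator = (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)
print(mcc_denominator)   # 0, so WEKA prints "?" for MCC
```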

Iris
Random Tree algorithm

=== Confusion Matrix ===

  a  b  c   <-- classified as
 15  0  0 | a = Iris-setosa
  0 17  2 | b = Iris-versicolor
  0  2 15 | c = Iris-virginica

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
1.000    0.000    1.000      1.000   1.000      1.000  1.000     1.000     Iris-setosa
0.895    0.063    0.895      0.895   0.895      0.832  0.916     0.840     Iris-versicolor
0.882    0.059    0.882      0.882   0.882      0.824  0.912     0.818     Iris-virginica

Correctly Classified Instances      47      92.1569 %
Incorrectly Classified Instances     4       7.8431 %

J48 Algorithm

=== Confusion Matrix ===

  a  b  c   <-- classified as
 15  0  0 | a = Iris-setosa
  0 19  0 | b = Iris-versicolor
  0  2 15 | c = Iris-virginica

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
1.000    0.000    1.000      1.000   1.000      1.000  1.000     1.000     Iris-setosa
1.000    0.063    0.905      1.000   0.950      0.921  0.969     0.905     Iris-versicolor
0.882    0.000    1.000      0.882   0.938      0.913  0.967     0.938     Iris-virginica

Correctly Classified Instances      49      96.0784 %
Incorrectly Classified Instances     2       3.9216 %

REPTree algorithm

=== Confusion Matrix ===

  a  b  c   <-- classified as
 15  0  0 | a = Iris-setosa
  0 15  4 | b = Iris-versicolor
  0  0 17 | c = Iris-virginica

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
1.000    0.000    1.000      1.000   1.000      1.000  1.000     1.000     Iris-setosa
0.789    0.000    1.000      0.789   0.882      0.838  0.972     0.950     Iris-versicolor
1.000    0.118    0.810      1.000   0.895      0.845  0.971     0.895     Iris-virginica

Correctly Classified Instances      47      92.1569 %
Incorrectly Classified Instances     4       7.8431 %

Conclusion
The REPTree algorithm produces the simplest tree of the three. On the weather dataset the accuracies reported above for RandomTree, J48 and REPTree are 40%, 40% and 60% respectively; on the Iris dataset they are 92.16%, 96.08% and 92.16%.
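Each overall accuracy is simply the number of correctly classified instances (the diagonal of the confusion matrix) divided by the total number of test instances, so the percentages can be reproduced from the matrices in the Results section:

```python
def accuracy(matrix):
    """Accuracy from a confusion matrix whose rows are actual classes
    and columns are predicted classes: diagonal sum / total sum."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Weather (5 test instances after the 66% split of 14):
weather_randomtree = [[2, 1], [2, 0]]    # 2/5 = 40 %
weather_reptree    = [[3, 0], [2, 0]]    # 3/5 = 60 %

# Iris (51 test instances after the 66% split of 150):
iris_j48 = [[15, 0, 0], [0, 19, 0], [0, 2, 15]]   # 49/51 = 96.08 %

print(accuracy(iris_j48))
```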

From the three algorithms, I have observed that the tree size is smallest with REPTree; the J48 tree is generally bigger than the REPTree tree but smaller than the RandomTree tree, which is the largest. In terms of accuracy, J48 ranks first on the Iris dataset, while REPTree ranks first on the weather dataset (though the weather split leaves only five test instances, too few to be conclusive).

Based on the above results, J48 is the best of the three decision tree algorithms.
