Fasika Abera, NSR/6489/10
February, 2021
(Classification)
Description
This report uses two data sets, Weather and Iris. The Weather data set has 14 instances and 4
nominal attributes: outlook (sunny, overcast, rainy), temperature (hot, mild, cool), humidity
(high, normal) and windy (TRUE, FALSE). A nominal attribute takes its values from a fixed set of
distinct categories, whereas a numeric attribute takes numbers that measure some quantity. The
data set is complete, which means it has no missing values.
The Iris data set has 150 instances and 4 numeric attributes: sepallength, sepalwidth,
petallength and petalwidth. It is also complete. In addition, I preprocessed the Iris data set
with the class balancer instance filter because it has many instances.
I used three decision tree algorithms: RandomTree, J48 and REPTree. For all of them I used the
percentage split test option, with 66% of the data as the training set and the rest as the test
set. The results are as follows:
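The effect of the 66% percentage split on the two data sets can be sketched in a few lines of Python. This is only an illustration: the function name `percentage_split_sizes` is hypothetical, and rounding the training size half up is an assumption about how the split is computed, not something stated in this report.

```python
import math


def percentage_split_sizes(num_instances, train_percent=66.0):
    """Return (train_size, test_size) for a percentage split.

    Assumption: the training size is num_instances * train_percent / 100,
    rounded half up; the remaining instances form the test set.
    """
    train_size = math.floor(num_instances * train_percent / 100 + 0.5)
    return train_size, num_instances - train_size


# Instance counts taken from the description above.
for name, n in [("Weather", 14), ("Iris", 150)]:
    train, test = percentage_split_sizes(n)
    print(f"{name}: {train} training, {test} test instances")
```

Under that assumption the Weather data set is split into 9 training and 5 test instances, and the Iris data set into 99 training and 51 test instances, which agrees with the row totals of the confusion matrices in the Results section.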
Results
Weather
RandomTree algorithm
a b <-- classified as
2 1 | a = yes
2 0 | b = no
J48 algorithm
a b <-- classified as
2 1 | a = yes
2 0 | b = no
REPTree algorithm
=== Confusion Matrix ===
a b <-- classified as
3 0 | a = yes
2 0 | b = no
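Overall accuracy can be read off a confusion matrix as the sum of the diagonal (correctly classified instances) divided by the total number of test instances. A minimal Python sketch, with the three Weather matrices copied from above; the function name `accuracy` is only illustrative:

```python
def accuracy(matrix):
    """Fraction of correctly classified instances in a confusion matrix.

    Rows are actual classes, columns are predicted classes.
    """
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Weather confusion matrices from the results above (rows: yes, no).
weather = {
    "RandomTree": [[2, 1], [2, 0]],
    "J48": [[2, 1], [2, 0]],
    "REPTree": [[3, 0], [2, 0]],
}

for name, matrix in weather.items():
    print(f"{name}: {accuracy(matrix):.1%}")
```

For these matrices the sketch gives 40.0% for RandomTree and J48 and 60.0% for REPTree, each over 5 test instances.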
Iris
RandomTree algorithm
a b c <-- classified as
15 0 0 | a = Iris-setosa
0 17 2 | b = Iris-versicolor
0 2 15 | c = Iris-virginica
J48 algorithm
a b c <-- classified as
15 0 0 | a = Iris-setosa
0 19 0 | b = Iris-versicolor
0 2 15 | c = Iris-virginica
REPTree algorithm
a b c <-- classified as
15 0 0 | a = Iris-setosa
0 15 4 | b = Iris-versicolor
0 0 17 | c = Iris-virginica
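Per-class figures such as TP rate (recall), precision and F-measure can be derived from the Iris confusion matrices above. The following Python sketch shows one way to do this; the function name `per_class_metrics` and the dictionary layout are illustrative choices, and only the three measures named here are computed.

```python
def per_class_metrics(matrix, labels):
    """Compute recall (TP rate), precision and F-measure for each class.

    Rows of the matrix are actual classes, columns are predicted classes.
    """
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        actual = sum(matrix[i])                    # all instances of this class
        predicted = sum(row[i] for row in matrix)  # all predictions of this class
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {
            "recall": recall,
            "precision": precision,
            "f_measure": f_measure,
        }
    return metrics


labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# Iris confusion matrices from the results above.
iris = {
    "RandomTree": [[15, 0, 0], [0, 17, 2], [0, 2, 15]],
    "J48": [[15, 0, 0], [0, 19, 0], [0, 2, 15]],
    "REPTree": [[15, 0, 0], [0, 15, 4], [0, 0, 17]],
}

for name, matrix in iris.items():
    correct = sum(matrix[i][i] for i in range(len(labels)))
    total = sum(sum(row) for row in matrix)
    print(f"{name}: {correct}/{total} correct")
    for label, m in per_class_metrics(matrix, labels).items():
        print(f"  {label}: recall={m['recall']:.3f} "
              f"precision={m['precision']:.3f} F={m['f_measure']:.3f}")
```

All three classifiers separate Iris-setosa perfectly; the errors are confined to confusion between Iris-versicolor and Iris-virginica.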
Conclusion
Of the three algorithms, REPTree produces the smallest and simplest tree. The J48 tree is generally
bigger than the REPTree tree and smaller than the RandomTree tree, which is the largest.
The accuracy on the Weather data set with RandomTree, J48 and REPTree is 57.14%, 50% and 57%
respectively. The accuracy on the Iris data set with RandomTree, J48 and REPTree is 92%, 96% and
94% respectively. These figures do not all match the percentage split confusion matrices in the
Results section, which correspond to 40%, 40% and 60% on the 5 Weather test instances and to about
92.2%, 96.1% and 92.2% on the 51 Iris test instances, so they appear to come from a different
evaluation run.
On the Iris data set, the accuracy ranking is J48 first, then REPTree, then RandomTree.
Based on the above results, J48 is the most accurate of the three decision tree algorithms on the
Iris data set.