ML Classification Assignment
MACHINE LEARNING
Name: ABRHAM ADUGNA
ID: BDU1402379
Precision measures how accurate the model is among the instances it predicted as positive. It is
calculated as the number of correctly predicted positive instances divided by the total number of
instances that were predicted as positive.
Recall is a metric that quantifies the number of correct positive predictions made out of all
actual positive instances. It measures how many of the actual positives the model captures by
labeling them as positive. It can be calculated as:
Recall = TP / (TP + FN), where TP is the number of true positives and FN the number of false negatives.
F-score / F-measure: the harmonic mean of precision and recall. It can be a better measure to use
when we need to balance precision against recall, and it can be calculated as:
F-measure = 2 * Precision * Recall / (Precision + Recall)
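The three definitions above can be checked with a few lines of Python. This is a minimal sketch, not part of the original assignment: the function name `precision_recall_f1` and the counts in the usage lines are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F-measure) from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    predicted_positive = tp + fp   # everything the model labeled positive
    actual_positive = tp + fn      # everything that really is positive
    precision = tp / predicted_positive if predicted_positive else 0.0
    recall = tp / actual_positive if actual_positive else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.9 0.75 0.818
```

The guards against zero denominators are a design choice of this sketch; other tools may report an undefined value instead.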
Tasks:
1. Estimate the accuracy of the Naive Bayes classifier, Decision tree, and SVM on the breast
cancer data set using 5-fold cross-validation. The breast cancer dataset has numeric values.
You can use Weka's filter to discretize the data.
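To make the idea of 5-fold cross-validation concrete, the following is a minimal Python sketch of how the data can be split into five folds, each used once for testing while the other four are used for training. It is only an illustration under stated assumptions: the names `k_fold_indices` and `cross_val_accuracy` are hypothetical, the sample count of 23 is arbitrary, and the sketch does not stratify folds by class, which Weka's own procedure may do.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(train_and_score, n_samples, k=5):
    """Average the accuracy returned by a caller-supplied function.

    train_and_score(train_idx, test_idx) is assumed to train a model on
    the training indices and return its accuracy on the test indices.
    """
    scores = [train_and_score(tr, te)
              for tr, te in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(n_samples=23, k=5))
print([len(test) for _, test in folds])  # [5, 5, 5, 4, 4]
```

The reported cross-validation accuracy is then the mean of the five per-fold accuracies.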
a) Naive Bayes classifier
Step-1: First, convert breast_cancer_data.txt to the Weka file format
(breast_cancer_data.arff).
Step-2: Then import the breast_cancer_data.arff file into the Weka working area.
Step-3: Third, set the attribute filter to Numeric to Nominal.
Step-4: The performance metrics for the given dataset are:
Note: The confusion matrix at the lower-left corner of the output is used to calculate the
performance metrics for the dataset. They can be calculated as:
Precision = 443 / 448 = 0.9888 ≈ 0.989
F-measure = 2 * 0.989 * 0.967 / (0.989 + 0.967) = 0.9778 ≈ 0.978
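Step-1 of this subsection converts a plain text file to ARFF. As a rough illustration of what that file format looks like, here is a minimal Python sketch that builds ARFF text from rows of values. The function name `to_arff`, the relation name, and the attribute names and labels are all hypothetical; the actual columns of breast_cancer_data.txt are not given in this report.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text.

    attributes: list of (name, kind) pairs, where kind is the string
    "numeric" or a list of nominal labels.
    rows: list of value lists, in the same column order as attributes.
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            lines.append("@attribute " + name + " " + kind)
        else:
            # Nominal attributes list their allowed labels in braces.
            lines.append("@attribute " + name + " {" + ",".join(kind) + "}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical two-row example, not the real breast cancer data.
text = to_arff(
    "demo",
    [("feature_1", "numeric"), ("class", ["benign", "malignant"])],
    [[5, "benign"], [8, "malignant"]],
)
print(text)
```

The sketch does not quote or escape values, so it only suits simple data without spaces, commas, or missing entries.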
b) Decision tree
Step-1: Then change the classifier from Naïve Bayes to Decision tree in the Classify tab.
Note: The 2x2 confusion matrix at the lower-left corner of the output is used to calculate the
performance metrics for the dataset. They can be calculated as:
F-measure = 2*0.949*0.943 / (0.949+0.943) = 0.946
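The 2x2 confusion matrix mentioned in the notes above can be tallied from the actual and predicted class labels, and accuracy follows directly from its four cells. The following is a minimal Python sketch; the function names and the six example labels are hypothetical and are not taken from the Weka output.

```python
def confusion_matrix_2x2(actual, predicted, positive):
    """Return (tp, fn, fp, tn) counts for a binary classification problem."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1   # actual positive, predicted positive
            else:
                fn += 1   # actual positive, predicted negative
        else:
            if p == positive:
                fp += 1   # actual negative, predicted positive
            else:
                tn += 1   # actual negative, predicted negative
    return tp, fn, fp, tn


def accuracy(tp, fn, fp, tn):
    """Fraction of all instances that were classified correctly."""
    total = tp + fn + fp + tn
    return (tp + tn) / total if total else 0.0


# Hypothetical labels, for illustration only.
actual = ["yes", "yes", "yes", "no", "no", "no"]
predicted = ["yes", "yes", "no", "no", "yes", "no"]
tp, fn, fp, tn = confusion_matrix_2x2(actual, predicted, positive="yes")
print(tp, fn, fp, tn)                       # 2 1 1 2
print(round(accuracy(tp, fn, fp, tn), 3))   # 0.667
```

In this layout the first pair (tp, fn) corresponds to the row of actual positives and the second pair (fp, tn) to the row of actual negatives.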
c) Support Vector Machine (SVM)
Step-1: Then change the classifier from Decision tree to SVM in the Classify tab.
Note: The 2x2 confusion matrix at the lower-left corner of the output is used to calculate the
performance metrics for the given dataset. They can be calculated as:
Recall = 442/458 = 0.965
2. Estimate the accuracy of the Naive Bayes, Decision tree, and SVM using 5-fold cross-
validation on the voting data.
a. Naive Bayes
Step-1: Convert vote_data.txt to the Weka file format (vote_data.arff).
Step-2: Then import the vote_data.arff file into the Weka working area.
Note: The 2x2 confusion matrix at the lower-left corner of the output is used to calculate the
performance metrics for the given dataset.
Step-3: The performance metrics for the given dataset are as follows.
Note: The 2x2 confusion matrix at the lower-left corner of the output is used to calculate the
performance metrics for the given dataset. They can be calculated as:
Precision = 154/183 = 0.842
Recall = 154 / 168 = 0.917
F-measure = 2 * 0.842 * 0.917 / (0.842 + 0.917) = 0.8775
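As a quick check of the arithmetic above, the F-measure can be computed both from the raw counts and from the precision and recall values rounded to three decimals. This is a minimal sketch using only the numbers already shown in this subsection; the variable names are illustrative. It suggests that the value 0.8775 corresponds to the unrounded ratios, while the rounded inputs give a slightly different fourth decimal.

```python
# Counts taken from the precision and recall lines above.
tp = 154              # correctly predicted positives
predicted_pos = 183   # all instances predicted as positive
actual_pos = 168      # all instances that are actually positive

precision = tp / predicted_pos
recall = tp / actual_pos

# F-measure from the unrounded precision and recall.
f_exact = 2 * precision * recall / (precision + recall)

# F-measure from the values rounded to three decimals.
f_rounded = 2 * 0.842 * 0.917 / (0.842 + 0.917)

print(round(f_exact, 4), round(f_rounded, 4))  # 0.8775 0.8779
```

Either way the result is about 0.88, so the difference only matters when reporting four decimals.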
b. The Decision tree
Step-1: Change the classifier from Naïve Bayes to Decision tree in the Classify tab.
Note: The 2x2 confusion matrix at the lower-left corner of the output is used to calculate the
performance metrics for the given dataset. They can be calculated as:
Precision = 162 / 171 = 0.947
Recall = 162 / 168 = 0.964
F-measure = 2 * 0.947 * 0.964 / (0.947 + 0.964) = 0.956
Step-2: the decision tree graph is as shown.
c. Support Vector Machine (SVM)
Note: The 2x2 confusion matrix at the lower-left corner of the output is used to calculate the
performance metrics for the dataset. They can be calculated as: