ML Classification Assignment

BAHIR DAR UNIVERSITY
BAHIR DAR INSTITUTE OF TECHNOLOGY
FACULTY OF ELECTRICAL AND COMPUTER ENGINEERING
POSTGRADUATE PROGRAM IN COMPUTER ENGINEERING
MACHINE LEARNING
PROJECT REPORT: CLASSIFICATION.
Name: ABRHAM ADUGNA
ID: BDU1402379
Submitted to: Beakal G.(Ph.D)

Submission date: 11/06/2014 E.C
Write a report of at least two pages based on the given breast cancer data and voting data,
containing the steps you took to solve the problem for each task and a graph of the accuracy,
precision, recall, F-measure, and confusion matrix.
Precision: a metric that quantifies how many of the positive predictions are correct; that is,
out of all instances predicted positive, what fraction is truly positive.
It is calculated as the number of correctly predicted positive instances divided by the total
number of instances that were predicted positive.
Precision = true positive / total predicted positives
= true positive / (true positive + false positive)
Recall: a metric that quantifies how many of the actual positives the model captures; that is,
out of all truly positive instances, what fraction is labeled positive. It can be calculated as:
Recall = true positive / total actual positives
= true positive / (true positive + false negative)
Note: maximizing precision minimizes the number of false positives.
Maximizing recall minimizes the number of false negatives.
Therefore, precision measures the accuracy of the predictions made for the minority class.
F-score / F-measure: the harmonic mean of precision and recall. It may be a better measure
when a balance between precision and recall is needed, and it can be calculated as:
F1 score = 2 * (precision * recall) / (precision + recall)
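The three definitions above can be sketched in a few lines of Python. This is only an illustrative helper, not part of the Weka workflow used in this report; the function name and the sample counts are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Assumes tp > 0 so that no denominator is zero.
    """
    precision = tp / (tp + fp)   # correct positives / predicted positives
    recall = tp / (tp + fn)      # correct positives / actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts, chosen only for illustration.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f, 3))
```

A model with many false negatives (here 30) keeps a high precision but loses recall, and the F1 score falls between the two.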
Tasks:
1. Estimate the accuracy of the Naive Bayes classifier, Decision tree, and SVM on the breast
cancer data set using 5-fold cross-validation. The breast cancer dataset has numeric values.
You can use Weka's filter to discretize the data.
a) Naive Bayes classifier
Step-1: First, convert breast_cancer_data.txt to the Weka file format
(breast_cancer_data.arff).
Step-2: Then import breast_cancer_data.arff into the Weka working area.
Step-3: Third, apply the NumericToNominal attribute filter.
Step-4: The performance metrics for the given dataset are:
Note: at the lower-left corner there is a confusion matrix, which is used to calculate the
metrics for the dataset. They can be calculated as:
Precision = 443/448=0.9888~0.989
Recall = 443/458 = 0.967
F-measure = 2*0.989*0.967 / (0.989+0.967) = 0.9778~0.978
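Step-1 above (converting the text file to ARFF) can also be scripted. The sketch below is only one possible approach and rests on assumptions not stated in this report: that the text file is comma-separated, has no header row, and stores the class label in the last column. The attribute names, class labels, and sample rows are hypothetical.

```python
import csv
import io


def csv_to_arff(rows, relation, class_values):
    """Build ARFF text from rows whose last column is the class label."""
    n_features = len(rows[0]) - 1
    lines = [f"@relation {relation}", ""]
    # Hypothetical attribute names attr1..attrN, all declared numeric.
    for i in range(1, n_features + 1):
        lines.append(f"@attribute attr{i} numeric")
    # The class is declared nominal by listing its allowed values.
    lines.append("@attribute class {" + ",".join(class_values) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


# Two made-up rows standing in for the real data file.
sample = "5,1,1,benign\n8,10,10,malignant\n"
rows = list(csv.reader(io.StringIO(sample)))
arff_text = csv_to_arff(rows, "breast_cancer", ["benign", "malignant"])
print(arff_text)
```

To use it on a real file, the rows would be read from the file instead of from the in-memory sample, and the resulting text written to a file with the .arff extension.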
b) The Decision tree
Step-1: Change the classifier from Naive Bayes to Decision tree in the Classify tab.
Note: at the lower-left corner there is a 2x2 confusion matrix, which is used to calculate the
metrics for the dataset. They can be calculated as:
Precision = 432/455 = 0.949
Recall = 432/458 = 0.943
F-measure = 2*0.949*0.943 / (0.949+0.943) = 0.946
Step-2: the decision tree graph is as shown.
c) Support Vector Machine (SVM)
Step-1: Change the classifier from Decision tree to SVM in the Classify tab.
Note: at the lower-left corner there is a 2x2 confusion matrix, which is used to calculate the
metrics for the dataset. They can be calculated as:
Precision = 442/454 = 0.974
Recall = 442/458 = 0.965
F-measure = 2 * 0.974 * 0.965 / (0.974 + 0.965) = 0.969
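The results above rely on 5-fold cross-validation: the data is split into five parts, and each part serves once as the test set while the other four are used for training. The splitting can be sketched as below. This is a plain illustration with a hypothetical function name; Weka's own implementation may differ in details such as shuffling and stratification.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k nearly equal parts
    for i in range(k):
        test = folds[i]
        # Training set: every fold except the one held out for testing.
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# With 10 samples and k=5, each round trains on 8 and tests on 2.
for train, test in k_fold_indices(10, k=5):
    print(len(train), len(test))
```

Every sample is tested exactly once, so the confusion matrix accumulated over the five rounds covers the whole dataset.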
2. Estimate the accuracy of the Naive Bayes, Decision tree, and SVM using 5-fold cross-
validation on the voting data.
a. Naive Bayes
Step-1: Convert vote_data.txt to the Weka file format (vote_data.arff).
Step-2: Then import vote_data.arff into the Weka working area.
Step-3: The performance metrics for the given dataset are as follows.
Precision = 154/183 = 0.842
Recall = 154 / 168 = 0.917
F-measure = 2 * 0.842 * 0.917 / (0.842 + 0.917) = 0.8775
b. The Decision tree
Step-1: Change the classifier from Naive Bayes to Decision tree in the Classify tab.
Precision = 162 / 171 = 0.947
Recall = 162 / 168 = 0.964
F-measure = 2 * 0.947 * 0.964 / (0.964 +0.947) = 0.956
Step-2: the decision tree graph is as shown.
c. Support Vector Machine (SVM)
Step-1: Change the classifier from Decision tree to SVM in the Classify tab.
Note: at the lower-left corner there is a 2x2 confusion matrix, which is used to calculate the
metrics for the dataset. They can be calculated as:
Precision = 162/172 = 0.942
Recall = 162 / 168 = 0.964
F-measure = 2 * 0.942 * 0.964 / (0.942 + 0.964) = 0.953
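As a cross-check, the figures reported for both tasks can be recomputed directly from the counts used in the calculations (true positives, total predicted positives, total actual positives). The sketch below uses the identity F1 = 2 * TP / (predicted positives + actual positives), which is algebraically equivalent to the harmonic-mean formula. The function and variable names are hypothetical; the counts are the ones that appear in this report.

```python
# (true positives, predicted positives, actual positives) per experiment,
# copied from the precision and recall calculations in this report.
results = {
    ("breast cancer", "Naive Bayes"): (443, 448, 458),
    ("breast cancer", "Decision tree"): (432, 455, 458),
    ("breast cancer", "SVM"): (442, 454, 458),
    ("voting", "Naive Bayes"): (154, 183, 168),
    ("voting", "Decision tree"): (162, 171, 168),
    ("voting", "SVM"): (162, 172, 168),
}


def summarize(tp, predicted_pos, actual_pos):
    """Return (precision, recall, F1), each rounded to three decimals."""
    precision = tp / predicted_pos
    recall = tp / actual_pos
    f1 = 2 * tp / (predicted_pos + actual_pos)  # same as 2PR / (P + R)
    return round(precision, 3), round(recall, 3), round(f1, 3)


for (dataset, model), counts in results.items():
    print(dataset, model, summarize(*counts))
```

Because F1 is computed here from the raw counts rather than from already rounded precision and recall, the third decimal can differ slightly from a value obtained with rounded inputs.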