
SUBMITTED TO

Sir Aizaz Akmal


SUBMITTED BY
Abdul Moiz Arif

AI LAB
(2020-CS-674)

Performance Metrics

Confusion Matrix:
A confusion matrix for a binary classifier is a 2x2 table that compares the model's
predictions against the actual values, counting the number of True Positives,
True Negatives, False Positives and False Negatives.
Since the outcome in this test is always binary, one outcome is treated as positive and
the other as negative. If the model predicts positive and the actual outcome is
positive, it is called a true positive; if the model predicts negative but the actual outcome is
positive, it is a false negative; if the model predicts negative and the actual outcome is
negative, it is a true negative; and if the model predicts positive but the actual outcome
is negative, it is called a false positive.
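The four counts can be tallied directly from paired actual/predicted labels; here is a minimal sketch using hypothetical labels (1 = positive), not the report's dataset:

```python
# Count the four confusion-matrix cells for a binary classifier.
# actual/predicted are hypothetical example labels (1 = positive class).
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

print(tp, tn, fp, fn)  # 3 3 1 1
```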

Accuracy:

Accuracy is the number of correct predictions divided by the total number of
predictions: Accuracy = (TP + TN) / (TP + TN + FP + FN). Hence it is the rate at which
the model predicts correctly, or, viewed another way, the percentage of correct predictions.

Precision:

Precision is the proportion of predicted positives that are actually positive:
Precision = TP / (TP + FP).
Recall:

Recall is the proportion of actual positives that the model correctly identifies:
Recall = TP / (TP + FN).
Values for these performance measures are taken from the confusion matrix.
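Continuing the sketch, all three measures follow directly from hypothetical confusion-matrix counts:

```python
# Compute accuracy, precision and recall from example confusion-matrix
# counts (hypothetical values, not the report's results).
tp, tn, fp, fn = 3, 3, 1, 1

accuracy  = (tp + tn) / (tp + tn + fp + fn)  # correct / total
precision = tp / (tp + fp)                   # of predicted positives, how many were right
recall    = tp / (tp + fn)                   # of actual positives, how many were found

print(accuracy, precision, recall)  # 0.75 0.75 0.75
```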

10-Fold Cross Validation

In 10-fold cross validation, a widely used form of k-fold validation, we divide the data
into 10 equal parts. Over 10 iterations, one part is used as the test data and the rest as
training data, with a different part held out in each iteration, until every part has been
used as both training and test data. In each iteration we record the estimated error of
that particular classifier ("Ei", where E is the error and i is the iteration number).
The last step, using the formula for k-fold cross validation, is to add up all the errors
and divide by the number of iterations to obtain E:

𝐸 = (1/𝑘) ∑ᵢ₌₁ᵏ 𝐸ᵢ
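The procedure above can be sketched as follows; the dataset and the trivial majority-class "classifier" are illustrative stand-ins, not the report's actual model:

```python
# 10-fold cross validation on a hypothetical labeled dataset, using a
# trivial majority-class "classifier" just to show the E = (1/k)·ΣEi step.
k = 10
data = [(x, 1 if x % 3 == 0 else 0) for x in range(100)]  # (feature, label)

folds = [data[i::k] for i in range(k)]  # split the data into k parts
errors = []
for i in range(k):
    test = folds[i]
    train = [row for j, f in enumerate(folds) if j != i for row in f]
    # "Train": predict the most common label seen in the training part.
    majority = max({0, 1}, key=[lbl for _, lbl in train].count)
    wrong = sum(1 for _, lbl in test if lbl != majority)
    errors.append(wrong / len(test))   # Ei for this iteration

E = sum(errors) / k                    # E = (1/k) * sum of all Ei
print(round(E, 2))  # 0.34
```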

Naïve Bayes

The principle of Naïve Bayes classification rests on conditional probability. Conditional
probability measures the chance of one event occurring given that another event has
already occurred.
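As a small worked example of the rule P(A|B) = P(A and B) / P(B), with assumed patient counts (hypothetical, not from the report's dataset):

```python
# Conditional probability with assumed counts: P(A|B) = P(A and B) / P(B).
total = 100          # patients in the study (assumed)
high_glucose = 30    # patients with high glucose: event B
both = 18            # diabetic AND high glucose: event A and B

p_b = high_glucose / total
p_a_and_b = both / total
p_a_given_b = p_a_and_b / p_b   # P(diabetes | high glucose)
print(p_a_given_b)  # ≈ 0.6
```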

In this particular example we are estimating the chance of a person having diabetes, given their
glucose level and blood pressure. So the probability we are calculating is that of the event that
a person has diabetes, given those measurements. In the first snippet it can be seen that the
values we took are 111 for glucose and 68 for blood pressure, and the model classifies the person
as not having diabetes; in the second picture it can be seen that on values of 155 for glucose
and 86 for blood pressure, the person is declared diabetic.
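A minimal Gaussian Naive Bayes sketch along these lines; the training data below is hypothetical, chosen so the two queries fall clearly on opposite sides, and does not reproduce the report's dataset or fitted model:

```python
import math

# Hypothetical (glucose, blood pressure) training rows per class.
train = {
    "no":  [(95, 70), (105, 72), (100, 68), (110, 75)],
    "yes": [(150, 85), (165, 90), (170, 95), (155, 88)],
}

def stats(values):
    """Mean and (population) variance of a list of numbers."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v

# Per-class, per-feature (mean, variance) pairs.
params = {c: [stats([row[f] for row in rows]) for f in range(2)]
          for c, rows in train.items()}

def log_pdf(x, mean, var):
    """Log of the normal density, used for numerical stability."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(glucose, bp):
    total = sum(len(rows) for rows in train.values())
    scores = {}
    for c, feats in params.items():
        prior = math.log(len(train[c]) / total)
        scores[c] = prior + sum(log_pdf(x, m, v)
                                for x, (m, v) in zip((glucose, bp), feats))
    return max(scores, key=scores.get)

print(predict(111, 68))   # expected "no" with this toy data
print(predict(155, 86))   # expected "yes" with this toy data
```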

Performance Metrics for Naïve Bayes

10-fold cross validation for Naïve Bayes

K Means

The k-means algorithm is an unsupervised learning algorithm, which means we do not have
labeled results for the inputs. The algorithm groups the data into k classifications called
clusters, and a suitable number of clusters can be chosen using the elbow method. In this
example the graph suggests that the number of clusters should be 3, as the lower bend starts
at 3. The algorithm then recalculates the mean values of the centroids over a number of
iterations.
However, the elbow method does not always give the optimal answer; it only provides a
heuristic value to consider. If we take k=3 in this example we get an accuracy of 0.35, but
if we use k=2 we get an accuracy of 0.71. That is why we used k=2 even though the elbow
method suggested k=3.
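The iterative centroid-updating step described above can be sketched as follows, here with k = 2 on hypothetical 1-D data and fixed starting centroids so the run is deterministic (not the report's dataset):

```python
# Minimal k-means (Lloyd's algorithm) with k = 2 on hypothetical 1-D data.
data = [1.0, 1.5, 2.0, 1.2, 9.0, 9.5, 10.0, 9.2]
centroids = [0.0, 12.0]   # assumed starting centroids

for _ in range(10):  # iterate: assign each point, then recompute the means
    clusters = [[], []]
    for x in data:
        nearest = min(range(2), key=lambda i: abs(x - centroids[i]))
        clusters[nearest].append(x)
    centroids = [sum(c) / len(c) for c in clusters]

print(sorted(round(c, 3) for c in centroids))  # [1.425, 9.425]
```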

In this context, the values given below are the mean values of the centroids.

This is the graph of the data plotted using the k-means classification.

Performance Metrics for K means Algorithm

K Nearest Neighbors

The k-nearest-neighbors algorithm uses a number k to identify the class to which a new item
belongs. When a new item is added to the dataset, the algorithm calculates the distances from
the item's position, takes its k nearest neighbors, and assigns the item to whichever class
holds the majority among those neighbors. In the above given example, the value of k is 3 and
the model predicts whether a person is diabetic or not, given their age and BMI (BMI stands
for body mass index, used to determine whether a person is in a healthy weight range). There
are only 2 classes in this case, i.e., yes and no; one axis has age and the other has BMI. The
new entry has values for age and BMI, and its class is determined using the KNN algorithm.
At age=40 and BMI=25 the person is classified as non-diabetic, while in the second example
you can see that at age=70 and BMI=32 the person is classified as diabetic.
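The steps above can be sketched as follows; the (age, BMI, label) training points are hypothetical stand-ins for the report's dataset (note that with unscaled features, age dominates the distance here):

```python
import math
from collections import Counter

# Hypothetical (age, BMI, label) training points for the diabetes example.
points = [
    (35, 22, "no"), (40, 24, "no"), (45, 26, "no"), (30, 23, "no"),
    (65, 31, "yes"), (70, 33, "yes"), (75, 35, "yes"), (68, 30, "yes"),
]

def knn_predict(age, bmi, k=3):
    # Sort the training points by Euclidean distance to the query point.
    nearest = sorted(points,
                     key=lambda p: math.hypot(p[0] - age, p[1] - bmi))
    # Majority vote among the k closest neighbors decides the class.
    votes = Counter(label for _, _, label in nearest[:k])
    return votes.most_common(1)[0][0]

print(knn_predict(40, 25))   # "no"  - near the non-diabetic group
print(knn_predict(70, 32))   # "yes" - near the diabetic group
```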

Performance Metrics for K nearest neighbors

10-fold cross validation for knn

Decision Tree

Performance Metrics

10-Fold Cross Validation for Decision Tree

