CONFUSION MATRIX
In the previous chapters of our Machine Learning tutorial (Neural Networks with Python and Numpy and Neural Networks from Scratch) we implemented various algorithms, but we didn't properly measure the quality of the output. The main reason was that we used very simple and small datasets to learn and test. In the chapter Neural Network: Testing with MNIST, we will work with large datasets and ten classes, so we need proper evaluation tools. In this chapter we will introduce the concept of the confusion matrix:

A confusion matrix is a matrix (table) that can be used to measure the performance of a machine learning algorithm, usually a supervised learning one. Each row of the confusion matrix represents the instances of an actual class and each column represents the instances of a predicted class. This is the way we keep it in this chapter of our tutorial, but it can be the other way around as well, i.e. rows for predicted classes and columns for actual classes. The name confusion matrix reflects the fact that it makes it easy for us to see what kind of confusions occur in our classification algorithms. For example, the algorithm should have predicted a sample as $c_i$ because the actual class is $c_i$, but it came out with $c_j$. In this case of mislabelling, the element $cm[i, j]$ will be incremented by one when the confusion matrix is constructed.

We will define functions to calculate the confusion matrix, precision and recall in the following sections.
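Before that, here is a minimal sketch (our own addition, with illustrative names; not part of the tutorial code) of how such a matrix can be built from two lists of labels, following exactly the increment rule described above:

import numpy as np

def build_confusion_matrix(actual, predicted, num_classes):
    # rows = actual classes, columns = predicted classes
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a, p] += 1   # each (actual, predicted) pair increments one cell
    return cm

# tiny usage example with three classes
print(build_confusion_matrix([0, 0, 1, 2, 2, 2],
                             [0, 1, 1, 2, 2, 0], 3))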

2-CLASS CASE

In a 2-class case, i.e. "negative" and "positive", the confusion matrix may look like this:

                  predicted
actual      negative    positive
negative       11           0
positive        1          12

The fields of the matrix mean the following:

                  predicted
actual      negative                positive
negative    TN (True negative)      FP (False positive)
positive    FN (False negative)     TP (True positive)

We can now define some important performance measures used in machine learning:

Accuracy:

$$AC = \frac{TN + TP}{TN + FP + FN + TP}$$
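Applied to the example matrix above (TN = 11, FP = 0, FN = 1, TP = 12), this gives:

$$AC = \frac{11 + 12}{11 + 0 + 1 + 12} = \frac{23}{24} \approx 0.96$$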

The accuracy is not always an adequate performance measure. Let us assume we have 1000 samples. 995 of these are negative and 5 are positive cases. Let us further assume we have a classifier which classifies everything it is presented with as negative. The accuracy will be a surprising 99.5%, even though the classifier could not recognize any positive samples.
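A minimal sketch (our own illustration; the label arrays are made up for this example) showing this effect with NumPy:

import numpy as np

# 995 negative (0) and 5 positive (1) samples
actual = np.array([0] * 995 + [1] * 5)
# a degenerate classifier that predicts "negative" for everything
predicted = np.zeros(1000, dtype=int)

print(np.mean(actual == predicted))   # 0.995, yet no positive sample was found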

Recall, also known as True Positive Rate:

$$recall = \frac{TP}{FN + TP}$$

True Negative Rate:

$$TNR = \frac{TN}{TN + FP}$$

Precision:

$$precision = \frac{TP}{FP + TP}$$
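The following small sketch (our own addition) computes all four measures for the 2-class example matrix from the beginning of this section:

import numpy as np

# rows = actual, columns = predicted
cm = np.array([[11,  0],    # actual negative: TN, FP
               [ 1, 12]])   # actual positive: FN, TP

TN, FP = cm[0]
FN, TP = cm[1]

print((TN + TP) / (TN + FP + FN + TP))   # accuracy:  23/24 ≈ 0.958
print(TP / (FN + TP))                    # recall:    12/13 ≈ 0.923
print(TN / (TN + FP))                    # TNR:       11/11 = 1.0
print(TP / (FP + TP))                    # precision: 12/12 = 1.0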

MULTI-CLASS CASE

To measure the results of machine learning algorithms with more than two classes, the previous confusion matrix is not sufficient. We need a generalization for the multi-class case.

Let us assume that we have a sample of 25 animals: 7 cats, 8 dogs, and 10 snakes, most probably Python snakes. The confusion matrix of our recognition algorithm may look like the following table:

              predicted
actual     dog    cat    snake
dog         6      2       0
cat         1      6       0
snake       1      1       8

In this confusion matrix, the system correctly predicted six of the eight actual dogs, but in two cases it took a dog for a cat. The seven actual cats were correctly recognized in six cases, but in one case a cat was taken to be a dog. Usually, it is hard to take a snake for a dog or a cat, but this is what happened to our classifier in two cases. Yet, eight out of ten snakes were correctly recognized. (Most probably this machine learning algorithm was not written in a Python program, because Python should properly recognize its own species :-) )

You can see that all correct predictions are located in the diagonal of the table, so prediction
errors can be easily found in the table, as they will be represented by values outside the
diagonal.

We can generalize this to the multi-class case. To do this we sum over the rows and columns of the confusion matrix. Given that the matrix is oriented as above, i.e., that a given row of the matrix corresponds to a specific value of the "truth", we have:


$$Precision_i = \frac{M_{ii}}{\sum_j M_{ji}}$$

$$Recall_i = \frac{M_{ii}}{\sum_j M_{ij}}$$

This means that precision is the fraction of cases where the algorithm correctly predicted class $i$ out of all instances where the algorithm predicted $i$ (correctly and incorrectly). Recall, on the other hand, is the fraction of cases where the algorithm correctly predicted $i$ out of all of the cases which are labelled as $i$.

Let us apply this to our example:

The precision for our animals can be calculated as:

$$precision_{dogs} = \frac{6}{6 + 1 + 1} = \frac{3}{4} = 0.75$$

$$precision_{cats} = \frac{6}{2 + 6 + 1} = \frac{6}{9} \approx 0.67$$

$$precision_{snakes} = \frac{8}{0 + 0 + 8} = 1$$

The recall is calculated like this:

$$recall_{dogs} = \frac{6}{6 + 2 + 0} = \frac{3}{4} = 0.75$$

$$recall_{cats} = \frac{6}{1 + 6 + 0} = \frac{6}{7} \approx 0.86$$

$$recall_{snakes} = \frac{8}{1 + 1 + 8} = \frac{4}{5} = 0.8$$
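The same numbers can be obtained with a few lines of NumPy (our own addition, anticipating the code of the next section):

import numpy as np

# rows = actual, columns = predicted; class order: dog, cat, snake
cm = np.array([[6, 2, 0],
               [1, 6, 0],
               [1, 1, 8]])

print(cm.diagonal() / cm.sum(axis=0))   # precision per class: ≈ [0.75, 0.67, 1.0]
print(cm.diagonal() / cm.sum(axis=1))   # recall per class:    ≈ [0.75, 0.86, 0.8]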

EXAMPLE

We are now ready to code this into Python. The following code shows a confusion matrix for a multi-class machine learning problem with ten labels, for example an algorithm for recognizing the ten digits in handwritten characters.

If you are not familiar with Numpy and Numpy arrays, we recommend our tutorial on Numpy.

import numpy as np

# confusion matrix: rows = actual digits, columns = predicted digits
cm = np.array(
    [[5825,    1,   49,   23,    7,   46,   30,   12,   21,   26],
     [   1, 6654,   48,   25,   10,   32,   19,   62,  111,   10],
     [   2,   20, 5561,   69,   13,   10,    2,   45,   18,    2],
     [   6,   26,   99, 5786,    5,  111,    1,   41,  110,   79],
     [   4,   10,   43,    6, 5533,   32,   11,   53,   34,   79],
     [   3,    1,    2,   56,    0, 4954,   23,    0,   12,    5],
     [  31,    4,   42,   22,   45,  103, 5806,    3,   34,    3],
     [   0,    4,   30,   29,    5,    6,    0, 5817,    2,   28],
     [  35,    6,   63,   58,    8,   59,   26,   13, 5394,   24],
     [  16,   16,   21,   57,  216,   68,    0,  219,  115, 5693]])

The functions 'precision' and 'recall' calculate values for a single label, whereas the function 'precision_macro_average' calculates the precision for the whole classification problem.

def precision(label, confusion_matrix):
    # correct predictions for this label divided by all predictions of it
    col = confusion_matrix[:, label]
    return confusion_matrix[label, label] / col.sum()

def recall(label, confusion_matrix):
    # correct predictions for this label divided by all actual instances of it
    row = confusion_matrix[label, :]
    return confusion_matrix[label, label] / row.sum()

def precision_macro_average(confusion_matrix):
    # unweighted mean of the per-label precisions
    rows, columns = confusion_matrix.shape
    sum_of_precisions = 0
    for label in range(rows):
        sum_of_precisions += precision(label, confusion_matrix)
    return sum_of_precisions / rows

def recall_macro_average(confusion_matrix):
    # unweighted mean of the per-label recalls
    rows, columns = confusion_matrix.shape
    sum_of_recalls = 0
    for label in range(columns):
        sum_of_recalls += recall(label, confusion_matrix)
    return sum_of_recalls / columns

print("label precision recall")


for label in range(10):
print(f"{label:5d} {precision(label, cm):9.3f} {recall(label,
cm):6.3f}")

label precision recall
0 0.983 0.964
1 0.987 0.954
2 0.933 0.968
3 0.944 0.924
4 0.947 0.953
5 0.914 0.980
6 0.981 0.953
7 0.928 0.982
8 0.922 0.949
9 0.957 0.887

print("precision total:", precision_macro_average(cm))


print("recall total:", recall_macro_average(cm))


precision total: 0.949688556405
recall total: 0.951453154788

def accuracy(confusion_matrix):
    # fraction of all samples on the diagonal, i.e. predicted correctly
    diagonal_sum = confusion_matrix.trace()
    sum_of_all_elements = confusion_matrix.sum()
    return diagonal_sum / sum_of_all_elements

accuracy(cm)

After executing the Python code above, we receive the following:

0.95038333333333336
