The predictions made by the model are with respect to the classes of the outcome variable (also referred to as the dependent variable) of the problem under consideration. For example, if the outcome variable of the problem has two classes, then that problem is referred to as a binary problem. Similarly, if the outcome variable has three classes, then that problem is known as a three-class problem, and so on.
Consider the confusion matrix given in Table 7.4 for a two-class problem, where the outcome variable consists of positive and negative values.
The following measures are used in the confusion matrix:
• True positive (TP): Refers to the number of positive instances that are correctly predicted as positive
• False negative (FN): Refers to the number of positive instances that are incorrectly predicted as negative
• False positive (FP): Refers to the number of negative instances that are incorrectly predicted as positive
• True negative (TN): Refers to the number of negative instances that are correctly predicted as negative
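These four counts can be computed directly by comparing a list of actual labels against a list of predicted labels. The following is a minimal Python sketch; the label names and the sample data are illustrative assumptions, not taken from the text.

```python
# Count TP, FN, FP, and TN for a two-class problem by comparing
# actual and predicted labels pairwise.

def confusion_counts(actual, predicted, positive="positive"):
    """Return (TP, FN, FP, TN) for the given positive label."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1   # positive instance correctly predicted positive
        elif a == positive:
            fn += 1   # positive instance incorrectly predicted negative
        elif p == positive:
            fp += 1   # negative instance incorrectly predicted positive
        else:
            tn += 1   # negative instance correctly predicted negative
    return tp, fn, fp, tn

# Illustrative data only:
actual    = ["positive", "positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "negative", "positive", "positive"]
print(confusion_counts(actual, predicted))  # (2, 1, 1, 1)
```

Note that the four counts always sum to the total number of instances.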
Now, consider a three-class problem where the outcome variable consists of three classes, C1, C2, and C3, as shown in Table 7.5.
From the above confusion matrix, we obtain the values of TP, FN, FP, and TN corresponding to each of the three classes, C1, C2, and C3, as shown in Tables 7.6 through 7.8.
Table 7.6 depicts the confusion matrix corresponding to class C1. This table is derived from Table 7.5, which shows the confusion matrix for all three classes, C1, C2, and C3. In Table 7.6, the number of TP instances is "a," where "a" denotes the class C1 instances that are correctly classified as belonging to class C1. The "b" and "c" are the class C1 instances that are incorrectly labeled as belonging to class C2 and class C3, respectively. Therefore, these instances come under the category of FN. On the other hand, "d" and "g" are the instances belonging to class C2 and class C3, respectively, that have been incorrectly marked as belonging to class C1 by the prediction model. Hence, they are FP instances. The "e," "f," "h," and "i" are all the remaining samples that are correctly classified as non-class C1 instances.
TABLE 7.4
Confusion Matrix for Two-Class Outcome Variables

                        Predicted
                        Positive    Negative
Actual    Positive      TP          FN
          Negative      FP          TN
TABLE 7.5
Confusion Matrix for Three-Class Outcome Variables

                        Predicted
                        C1    C2    C3
Actual    C1            a     b     c
          C2            d     e     f
          C3            g     h     i
294 Empirical Research in Software Engineering
TABLE 7.6
Confusion Matrix for Class "C1"

                        Predicted
                        C1            Not C1
Actual    C1            TP = a        FN = b + c
          Not C1        FP = d + g    TN = e + f + h + i
TABLE 7.7
Confusion Matrix for Class "C2"

                        Predicted
                        C2            Not C2
Actual    C2            TP = e        FN = d + f
          Not C2        FP = b + h    TN = a + c + g + i
TABLE 7.8
Confusion Matrix for Class "C3"

                        Predicted
                        C3            Not C3
Actual    C3            TP = i        FN = g + h
          Not C3        FP = c + f    TN = a + b + d + e
Therefore, they are referred to as TN instances. Similarly, Tables 7.7 and 7.8 depict the confusion matrices for classes C2 and C3.
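The derivation of Tables 7.6 through 7.8 from Table 7.5 can be sketched in Python as a one-versus-rest collapse of the three-class matrix: the diagonal entry gives TP, the rest of the row gives FN, the rest of the column gives FP, and everything else gives TN. The numeric values 1 through 9 below stand in for the letters "a" through "i" and are illustrative only.

```python
# Collapse a square multi-class confusion matrix (indexed as
# m[actual][predicted]) into a one-vs-rest (TP, FN, FP, TN) tuple
# for a chosen class index k.

def one_vs_rest(m, k):
    """Return (TP, FN, FP, TN) for class index k of square matrix m."""
    n = len(m)
    tp = m[k][k]                                      # diagonal entry
    fn = sum(m[k][j] for j in range(n) if j != k)     # rest of row k
    fp = sum(m[i][k] for i in range(n) if i != k)     # rest of column k
    tn = sum(m[i][j] for i in range(n)
             for j in range(n) if i != k and j != k)  # everything else
    return tp, fn, fp, tn

m = [[1, 2, 3],   # a, b, c  (actual C1)
     [4, 5, 6],   # d, e, f  (actual C2)
     [7, 8, 9]]   # g, h, i  (actual C3)

# Class C1: TP = a, FN = b + c, FP = d + g, TN = e + f + h + i
print(one_vs_rest(m, 0))  # (1, 5, 11, 28)
```

Calling `one_vs_rest(m, 1)` and `one_vs_rest(m, 2)` reproduces the groupings of Tables 7.7 and 7.8 in the same way.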
Sensitivity or recall (Rec) = TP / (TP + FN) × 100
However, an important point to note here is that this value says nothing about the other instances, which do not belong to class C but are still incorrectly classified as belonging to class C.
Specificity is defined as the ratio of correctly classified negative instances to the total
number of actual negative instances. It is given by the following formula:
Specificity = TN / (FP + TN) × 100
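The two formulas above can be expressed directly in Python; the counts passed in below are illustrative assumptions, not values from the text.

```python
# Sensitivity (recall) and specificity as percentages, following the
# definitions above.

def sensitivity(tp, fn):
    """Share of actual positive instances correctly predicted positive."""
    return tp / (tp + fn) * 100

def specificity(tn, fp):
    """Share of actual negative instances correctly predicted negative."""
    return tn / (fp + tn) * 100

# Illustrative counts only:
print(sensitivity(tp=30, fn=10))            # 75.0
print(round(specificity(tn=50, fp=10), 2))  # 83.33
```

Sensitivity looks only at the row of actual positives, and specificity only at the row of actual negatives, which is why the two measures complement each other.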
Model Development and Interpretation 297
TABLE 7.10
Performance Measures for Confusion Matrix given in Table 7.7

Performance Measures    Formula    Values Obtained    Results
TABLE 7.11
Confusion Matrix for Three-Class Outcome Variable

                        Predicted
                        High (1)    Medium (2)    Low (3)
Solution:
From the confusion matrix given in Table 7.11, the values of TP, FN, FP, and TN corresponding to each of the three classes, high (1), medium (2), and low (3), are derived and shown in Tables 7.12 through 7.14.
The values of the different performance measures at each severity level, namely, high, medium, and low, computed on the basis of Tables 7.12 through 7.14, are given in Table 7.15.
TABLE 7.12
Confusion Matrix for Class "High"

                        Predicted
                        High        Not High
Actual    High          TP = 3      FN = 9
          Not high      FP = 4      TN = 44
TABLE 7.13
Confusion Matrix for Class "Medium"

                        Predicted
                        Medium      Not Medium
Actual    Medium        TP = 34     FN = 4
          Not medium    FP = 13     TN = 9
TABLE 7.14
Confusion Matrix for Class "Low"

                        Predicted
                        Low         Not Low
Actual    Low           TP = 5      FN = 5
          Not low       FP = 1      TN = 49
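As a check on the solution, the recall and specificity at each severity level can be computed from the counts in Tables 7.12 through 7.14; these are the kinds of per-class values that Table 7.15 summarizes (the grouping into a dictionary below is an implementation choice, not from the text).

```python
# Recall and specificity at each severity level, using the TP, FN, FP,
# and TN counts of Tables 7.12 through 7.14 (60 instances in total).

tables = {
    "high":   dict(tp=3,  fn=9, fp=4,  tn=44),   # Table 7.12
    "medium": dict(tp=34, fn=4, fp=13, tn=9),    # Table 7.13
    "low":    dict(tp=5,  fn=5, fp=1,  tn=49),   # Table 7.14
}

for level, c in tables.items():
    rec  = c["tp"] / (c["tp"] + c["fn"]) * 100   # sensitivity/recall
    spec = c["tn"] / (c["fp"] + c["tn"]) * 100   # specificity
    print(f"{level}: recall = {rec:.2f}%, specificity = {spec:.2f}%")
```

Note how the high-severity class has low recall (only 3 of its 12 instances are found) even though its specificity is high, which is exactly the asymmetry the per-class breakdown is meant to expose.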