Professional Documents
Culture Documents
Key Challenge:
– Evaluation measures such as accuracy are not well-
suited for imbalanced class
2
Confusion Matrix
Confusion Matrix:
PREDICTED CLASS
Class=Yes Class=No
Class=Yes a b
ACTUAL
CLASS Class=No c d
a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)
Accuracy
PREDICTED CLASS
Class=Yes Class=No
Class=Yes a b
ACTUAL (TP) (FN)
CLASS
Class=No c d
(FP) (TN)
ad TP TN
Accuracy
a b c d TP TN FP FN
2/15/2021 Introduction to Data Mining, 2nd Edition 4
4
Problem with Accuracy
Consider a 2-class problem
– Number of Class NO examples = 990
– Number of Class YES examples = 10
If a model predicts everything to be class NO, accuracy is
990/1000 = 99 %
– This is misleading because this trivial model does not detect any class
YES example
– Detecting the rare class is usually more interesting (e.g., frauds,
intrusions, defects, etc)
PREDICTED CLASS
Class=Yes Class=No
Class=Yes 0 10
ACTUAL
CLASS Class=No 0 990
2/15/2021 Introduction to Data Mining, 2nd Edition 5
PREDICTED
Class=Yes Class=No
A ACTUAL Class=Yes 0 10
Class=No 0 990
Accuracy: 99%
PREDICTED
B Class=Yes Class=No
ACTUAL Class=Yes 10 0
Class=No 500 490
Accuracy: 50%
2/15/2021 Introduction to Data Mining, 2nd Edition 6
6
Which model is better?
PREDICTED
A Class=Yes Class=No
ACTUAL Class=Yes 5 5
Class=No 0 990
B PREDICTED
Class=Yes Class=No
ACTUAL Class=Yes 10 0
Class=No 500 490
Alternative Measures
PREDICTED CLASS
Class=Yes Class=No
Class=Yes a b
ACTUAL
CLASS Class=No c d
a
Precision (p)
ac
a
Recall (r)
ab
2rp 2a
F - measure (F)
r p 2a b c
2/15/2021 Introduction to Data Mining, 2nd Edition 8
8
Alternative Measures
10
PREDICTED CLASS Precision (p) 0.5
10 10
Class=Yes Class=No 10
Recall (r) 1
10 0
Class=Yes 10 0 2 *1* 0.5
ACTUAL F - measure (F) 0.62
CLASS Class=No 10 980 1 0.5
990
Accuracy 0.99
1000
Alternative Measures
10
PREDICTED CLASS Precision (p) 0.5
10 10
Class=Yes Class=No 10
Recall (r) 1
10 0
Class=Yes 10 0 2 *1* 0.5
ACTUAL F - measure (F) 0.62
CLASS Class=No 10 980 1 0.5
990
Accuracy 0.99
1000
PREDICTED CLASS 1
Precision (p) 1
1 0
Class=Yes Class=No
1
Recall (r) 0.1
Class=Yes 1 9 1 9
ACTUAL 2 * 0.1*1
CLASS Class=No 0 990 F - measure (F) 0.18
1 0.1
991
Accuracy 0.991
1000
2/15/2021 Introduction to Data Mining, 2nd Edition 10
10
Which of these classifiers is better?
PREDICTED CLASS
Precision (p) 0.8
Class=Yes Class=No
A Class=Yes 40 10
Recall (r) 0.8
F - measure (F) 0.8
ACTUAL
CLASS Class=No 10 40 Accuracy 0.8
PREDICTED CLASS
B Class=Yes Class=No Precision (p) ~ 0.04
Class=Yes 40 10 Recall (r) 0.8
ACTUAL F - measure (F) ~ 0.08
CLASS Class=No 1000 4000
Accuracy ~ 0.8
11
PREDICTED CLASS
Yes No
ACTUAL
Yes TP FN
CLASS
No FP TN
12
Alternative Measures
13
A PREDICTED CLASS
Class=Yes Class=No
Precision (p) 0 . 5
Class=Yes 10 40
TPR Recall (r) 0 . 2
ACTUAL
Class=No 10 40
FPR 0 . 2
CLASS F measure 0.28
B PREDICTED CLASS
Precision (p) 0 . 5
Class=Yes Class=No
TPR Recall (r) 0 . 5
Class=Yes 25 25
ACTUAL Class=No 25 25
FPR 0 . 5
CLASS F measure 0.5
14
ROC (Receiver Operating Characteristic)
15
ROC Curve
(TPR,FPR):
(0,0): declare everything
to be negative class
(1,1): declare everything
to be positive class
(1,0): ideal
Diagonal line:
– Random guessing
– Below diagonal line:
prediction is opposite
of the true class
16
ROC (Receiver Operating Characteristic)
17
Continuous-valued outputs
18
ROC Curve Example
19
At threshold t:
TPR=0.5, FNR=0.5, FPR=0.12, TNR=0.88
2/15/2021 Introduction to Data Mining, 2nd Edition 20
20
How to Construct an ROC curve
21
TP 5 4 4 3 3 3 3 2 2 1 0
FP 5 5 4 4 3 2 1 1 0 0 0
TN 0 0 1 1 2 3 4 4 5 5 5
FN 0 1 1 2 2 2 2 3 3 4 5
TPR 1 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.2 0
ROC Curve:
22
Using ROC for Model Comparison
No model consistently
outperforms the other
M1 is better for
small FPR
M2 is better for
large FPR
23
24
Which Classifer is better?
Precision (p) 0 . 98
T1 PREDICTED CLASS TPR Recall (r) 0 . 5
Class=Yes Class=No
FPR 0 . 01
Class=Yes 50 50 TPR/FPR 50
ACTUAL
Class=No 1 99
CLASS F measure 0.66
Precision (p) 0 . 9
T2 PREDICTED CLASS
TPR Recall (r) 0 . 99
Class=Yes Class=No
FPR 0 . 1
ACTUAL
Class=Yes 99 1
TPR/FPR 9.9
Class=No 10 90
CLASS
F measure 0.94
25
Precision (p) 0 . 5
T2 PREDICTED CLASS
TPR Recall (r) 0 . 99
Class=Yes Class=No
FPR 0 . 1
ACTUAL
Class=Yes 99 1
TPR/FPR 9.9
Class=No 100 900
CLASS
F measure 0.66
26
Which Classifer is better? High Skew case
Precision (p) 0 . 3
T1 PREDICTED CLASS TPR Recall (r) 0 . 5
Class=Yes Class=No
FPR 0 . 01
Class=Yes 50 50 TPR/FPR 50
ACTUAL
Class=No 100 9900
CLASS F measure 0.375
Precision (p) 0 . 09
T2 PREDICTED CLASS
TPR Recall (r) 0 . 99
Class=Yes Class=No
FPR 0 . 1
ACTUAL
Class=Yes 99 1
TPR/FPR 9.9
Class=No 1000 9000
CLASS
F measure 0.165
27
28