Example
In this example:
• If the classifier predicts negative, you can trust it: the example is negative. The classifier makes no mistakes on the negatives it predicts, because it is perfectly sensitive. However, note that if an example is negative, you cannot be sure it will be predicted as negative (specificity = 78%).
• If the classifier predicts positive, you cannot trust it (precision = 33%).
• However, if an example is positive, you can trust the classifier to find it, i.e., it will not miss it (recall = 100%) (see the sketch below).
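These three measures come straight from the confusion matrix. Below is a minimal sketch in Python; the counts are hypothetical, chosen only so that the resulting percentages match this example.

```python
# Hypothetical confusion-matrix counts chosen to match the example above.
TP, FP = 2, 4    # predicted positive: 2 truly positive, 4 truly negative
TN, FN = 14, 0   # predicted negative: 14 truly negative, 0 truly positive

precision   = TP / (TP + FP)   # how much to trust a positive prediction
recall      = TP / (TP + FN)   # sensitivity: fraction of positives found
specificity = TN / (TN + FP)   # fraction of negatives predicted negative

print(f"precision={precision:.0%}  recall={recall:.0%}  specificity={specificity:.0%}")
# precision=33%  recall=100%  specificity=78%
```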
Precision-Recall
Example
In this example:
Since the population is imbalanced (most examples are positive):
• The precision is relatively high.
• The recall is 100% because all the positive examples are predicted as positive.
• The specificity is 0% because no negative example is predicted as negative.
The sketch below reproduces these numbers with a degenerate classifier.
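A minimal sketch of this situation, assuming an illustrative imbalanced test set (9 positives, 1 negative) and a classifier that predicts everything as positive:

```python
y_true = ["+"] * 9 + ["-"]     # imbalanced: 9 positives, 1 negative
y_pred = ["+"] * 10            # every example predicted positive

TP = sum(t == "+" and p == "+" for t, p in zip(y_true, y_pred))
FP = sum(t == "-" and p == "+" for t, p in zip(y_true, y_pred))
TN = sum(t == "-" and p == "-" for t, p in zip(y_true, y_pred))
FN = sum(t == "+" and p == "-" for t, p in zip(y_true, y_pred))

print(TP / (TP + FP))   # precision = 0.9: high only because of the imbalance
print(TP / (TP + FN))   # recall = 1.0: all positives predicted positive
print(TN / (TN + FP))   # specificity = 0.0: no negative predicted negative
```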
Precision-Recall
Example
In this example:
• If it predicts that an example is positive, you can trust it: it is positive.
• However, if it predicts that an example is negative, you cannot trust it; the chances are that it is still positive.
This can still be a useful classifier.
Precision-Recall
Example
In this example:
• The classifier detects all the positive examples as positive.
• It also detects all the negative examples as negative.
• All the measures are at 100%.
Why so many measures? Which is more important?
Because it depends:
• When a false positive is expensive (a false alarm is a nightmare!), precision is more important than recall.
• When a false negative is expensive (missing one is a nightmare!), recall is more important than precision (see the sketch below).
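One common way to encode this trade-off in a single number is the F-beta score: beta < 1 weights precision more heavily, beta > 1 weights recall. A minimal sketch, assuming scikit-learn is available; the labels here are illustrative:

```python
from sklearn.metrics import fbeta_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]   # 3 TP, 1 FN, 2 FP, 2 TN

print(fbeta_score(y_true, y_pred, beta=0.5))  # precision-weighted
print(fbeta_score(y_true, y_pred, beta=2.0))  # recall-weighted
```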
Precision-Recall Curves
• The precision-recall curve is used for evaluating the performance of binary classification algorithms.
• It provides a graphical representation of a classifier's performance across many thresholds, rather than at a single value.
• It is constructed by calculating and plotting the precision against the recall of a single classifier at a variety of thresholds.
• It helps to visualize how the choice of threshold affects classifier performance, and it can even help us select the best threshold for a specific problem (a plotting sketch follows).
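A minimal sketch of such a curve, assuming scikit-learn and matplotlib are available; the scores and labels are illustrative:

```python
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt

y_true   = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
y_scores = [0.95, 0.93, 0.87, 0.85, 0.83, 0.80, 0.76, 0.53, 0.43, 0.25]

# One (precision, recall) pair per distinct score threshold.
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.show()
```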
Precision-Recall Curves Interpretation
Precision-Recall Curves
[Figure: three example precision-recall curves, panels (a), (b), and (c)]
Model Evaluation
The estimated performance of a model depends on:
• Class distribution
• Size of the training and test sets
• Methods of estimation
ROC (Receiver Operating Characteristic)
TP rate (TPR) = TP / (TP + FN)
FP rate (FPR) = FP / (FP + TN)
The performance of each classifier is represented as a point on the ROC curve; changing the algorithm's threshold or the sample distribution changes the location of that point.
Key points, written as (FPR, TPR):
• (0,0): declare everything to be the negative class
• (1,1): declare everything to be the positive class
• (0,1): ideal
Diagonal line:
• Random guessing
• Below the diagonal line: the prediction is the opposite of the true class
ROC (Receiver Operating Characteristic)
Consider a 1-dimensional data set containing 2 classes (positive and negative); any point located at x > t is classified as positive.
At threshold t: TPR = 0.5, FPR = 0.12
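A minimal sketch of this construction; the two Gaussians below are illustrative stand-ins for the class distributions in the figure, not the exact ones:

```python
import numpy as np

rng = np.random.default_rng(0)
pos = rng.normal(2.0, 1.0, 1000)   # scores of positive examples (assumed)
neg = rng.normal(0.0, 1.0, 1000)   # scores of negative examples (assumed)

# Every point with x > t is classified as positive; each t gives one ROC point.
for t in [0.0, 1.0, 2.0]:
    tpr = (pos > t).mean()   # TP / (TP + FN)
    fpr = (neg > t).mean()   # FP / (FP + TN)
    print(f"t={t:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```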
Using ROC for Model Comparison
Neither model consistently outperforms the other:
• M1 is better for small FPR
• M2 is better for large FPR
How to construct an ROC curve
1. Use a classifier that produces a probability P(+|A) for each test instance A.
2. Sort the test instances in decreasing order of P(+|A).
3. Apply a threshold at each unique value of P(+|A) (e.g., t >= 0.76); every instance with P(+|A) >= t is predicted positive.
4. At each threshold, count TP, FP, TN, FN and compute:
   TP rate (TPR) = TP / (TP + FN)
   FP rate (FPR) = FP / (FP + TN)
Worked example (instances sorted by P(+|A)):
Instance  P(+|A)  True Class  TP  FP  TN  FN  FPR  TPR
1         0.95    +           1   0   5   4   0    1/5
2         0.93    +           2   0   5   3   0    2/5
3         0.87    -           2   1   4   3   1/5  2/5
4         0.85    -
5         0.85    -
6         0.85    +           3   3   2   2   3/5  3/5
7         0.76    -           3   4   1   2   4/5  3/5
8         0.53    +           4   4   1   1   4/5  4/5
9         0.43    -           4   5   0   1   1    4/5
10        0.25    +           5   5   0   0   1    1
(Instances 4-6 are tied at P(+|A) = 0.85, so the threshold t = 0.85 classifies all three at once; the counts for the tie appear on its last row.)
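A minimal sketch that reproduces the table's TPR and FPR columns by sweeping the threshold over the sorted scores:

```python
scores = [0.95, 0.93, 0.87, 0.85, 0.85, 0.85, 0.76, 0.53, 0.43, 0.25]
labels = ["+", "+", "-", "-", "-", "+", "-", "+", "-", "+"]
P = labels.count("+")   # 5 positives
N = labels.count("-")   # 5 negatives

# One ROC point per unique threshold, from the highest score downward.
for t in sorted(set(scores), reverse=True):
    tp = sum(l == "+" and s >= t for s, l in zip(scores, labels))
    fp = sum(l == "-" and s >= t for s, l in zip(scores, labels))
    print(f"t={t:.2f}  TPR={tp / P:.2f}  FPR={fp / N:.2f}")
```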
ROC Interpretation
AUC (Area Under the Curve)
• AUC stands for "Area under the ROC Curve."
• It measures the entire two-dimensional area underneath the ROC curve, from (0,0) to (1,1).
This is the ideal situation: when the two class score distributions do not overlap at all, the model has an ideal measure of separability. It is perfectly able to distinguish between the positive class and the negative class.
AUC (Area Under the Curve) Interpretation
Example
When the AUC is 0.7, there is a 70% chance that the model ranks a randomly chosen positive example above a randomly chosen negative example, i.e., that it will be able to distinguish between the positive class and the negative class (the sketch below verifies this).
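This ranking interpretation can be checked directly: the AUC equals the fraction of positive-negative pairs in which the positive example gets the higher score (ties count half). A minimal sketch with illustrative data, assuming scikit-learn:

```python
from itertools import product
from sklearn.metrics import roc_auc_score

y_true   = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
y_scores = [0.95, 0.93, 0.87, 0.85, 0.83, 0.80, 0.76, 0.53, 0.43, 0.25]

pos = [s for s, y in zip(y_scores, y_true) if y == 1]
neg = [s for s, y in zip(y_scores, y_true) if y == 0]

# Fraction of (positive, negative) pairs ranked correctly, ties counting half.
pairs = [(p > n) + 0.5 * (p == n) for p, n in product(pos, neg)]
print(sum(pairs) / len(pairs))          # pairwise ranking estimate
print(roc_auc_score(y_true, y_scores))  # identical AUC from scikit-learn
```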
AUC (Area Under the Curve) Interpretation
Example
This is the worst situation. When the AUC is approximately 0.5, the model has no discrimination capacity: it cannot distinguish between the positive class and the negative class.
AUC (Area Under the Curve) Interpretation
Example
When the AUC is approximately 0, the model is actually inverting the classes: it predicts the negative class as positive and vice versa.
AUC (Area Under the Curve) Interpretation
Example
Which of the following ROC curves produce AUC values greater than 0.5?
[Figure: five ROC curves, labeled (a) through (e)]
ROC vs. PRC
The main difference between ROC curves and precision-recall curves is that the number of true-negative results is not used for making a PRC (see the sketch after the table below).
Curve                     x-axis concept   x-axis calculation   y-axis concept   y-axis calculation
Precision-recall (PRC)    Recall           TP / (TP + FN)       Precision        TP / (TP + FP)
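A minimal sketch of that difference (illustrative data): flooding the test set with easy true negatives leaves precision and recall untouched, because neither uses TN, while the ROC's false-positive rate changes.

```python
def point(y_true, scores, t=0.5):
    """Precision, recall, and FPR at a fixed threshold t."""
    tp = sum(y == 1 and s >= t for y, s in zip(y_true, scores))
    fp = sum(y == 0 and s >= t for y, s in zip(y_true, scores))
    tn = sum(y == 0 and s < t for y, s in zip(y_true, scores))
    fn = sum(y == 1 and s < t for y, s in zip(y_true, scores))
    return tp / (tp + fp), tp / (tp + fn), fp / (fp + tn)

y, s = [1, 1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.4]
print(point(y, s))                             # precision and recall ...
print(point(y + [0] * 100, s + [0.01] * 100))  # ... unchanged; FPR collapses
```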
[Figure: training images with bounding boxes are used to construct an object detector]
Confidence Interval for Accuracy
Confidence level = 1 − α
Prediction can be regarded as a Bernoulli trial
Example: toss a fair coin 50 times; how many heads would turn up?
Classification Accuracy = correct predictions / total predictions
Given x (the number of correct predictions) and N (the number of test instances), or equivalently acc = x / N, we can place a confidence interval around acc. For a confidence level of 1 − α = 0.90, the corresponding z-value is 1.65 (a sketch follows).
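A minimal sketch of a normal-approximation confidence interval for accuracy, treating each prediction as a Bernoulli trial; the values of x and N here are hypothetical:

```python
import math

x, N = 820, 1000   # hypothetical: correct predictions, test instances
acc = x / N
z = 1.65           # z-value for confidence level 1 - alpha = 0.90

half = z * math.sqrt(acc * (1 - acc) / N)   # std. error of a Bernoulli mean
print(f"acc = {acc:.3f}, 90% CI = [{acc - half:.3f}, {acc + half:.3f}]")
```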
How to construct an ROC curve
Example: a perfect ranking, in which every positive instance scores higher than every negative one.

Instance  P(+|A)  True Class  FPR  TPR
1         0.95    +           0    1/5
2         0.93    +           0    2/5
3         0.87    +           0    3/5
4         0.85    +           0    4/5
5         0.83    +           0    1
6         0.80    -           1/5  1
7         0.76    -           2/5  1
8         0.53    -           3/5  1
9         0.43    -           4/5  1
10        0.25    -           1    1

The curve climbs straight to (0,1) before moving right: AUC = 1.
How to construct an ROC curve
Example: positive and negative instances alternate in the ranking.

Instance  P(+|A)  True Class  FPR  TPR
1         0.95    +           0    1/5
2         0.93    -           1/5  1/5
3         0.87    +           1/5  2/5
4         0.85    -           2/5  2/5
5         0.83    +           2/5  3/5
6         0.80    -           3/5  3/5
7         0.76    +           3/5  4/5
8         0.53    -           4/5  4/5
9         0.43    +           4/5  1
10        0.25    -           1    1

The curve stays close to the diagonal (AUC = 0.6), not far from random guessing.
Breakout Session
One vs. All (Rest)
One vs. One
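These slides name the two standard ways of reducing a multiclass problem to binary classifiers. A minimal sketch, assuming scikit-learn: one-vs-rest fits one binary model per class, while one-vs-one fits one per pair of classes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = load_iris(return_X_y=True)   # 3 classes

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_))  # 3: one binary classifier per class
print(len(ovo.estimators_))  # 3: one per pair of classes (3 choose 2)
```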