Lecture 10 - Performance Issues, Dataset Partition, Evaluation Metrics
Machine Learning
Lecture 10
Wang Xinchao
xinchao@nus.edu.sg
© Copyright EE, NUS. All Rights Reserved.
Course Contents
• Introduction and Preliminaries (Xinchao)
– Introduction
– Data Engineering
– Introduction to Linear Algebra, Probability and Statistics
• Fundamental Machine Learning Algorithms I (Helen)
– Systems of linear equations
– Least squares, Linear regression
– Ridge regression, Polynomial regression
• Fundamental Machine Learning Algorithms II (Thomas)
– Over-fitting, bias/variance trade-off
– Optimization, Gradient descent
– Decision Trees, Random Forest
• Performance and More Algorithms (Xinchao)
– Performance Issues
– K-means Clustering
– Neural Networks
EE2211: Learning Outcome
• I am able to understand the formulation of a machine learning task
– Lecture 1 (feature extraction + classification)
– Lecture 4 to Lecture 9 (regression and classification)
– Lecture 11 and Lecture 12 (clustering and neural network)
• I am able to relate the fundamentals of linear algebra and probability to machine
learning
– Lecture 2 (recap of probability and linear algebra)
– Lecture 4 to Lecture 8 (regression and classification)
– Lecture 12 (neural network)
• I am able to prepare the data for supervised learning and unsupervised learning
– Lecture 1 (feature extraction), Page 25 to 30 [For supervised and unsupervised]
– Lecture 2 (data wrangling) [For supervised and unsupervised]
– Lecture 10 (Training/Validation/Test) [For supervised]
– Programming Exercises in tutorials
• I am able to evaluate the performance of a machine learning algorithm
– Lecture 5 to Lecture 9 (evaluate the difference between labels and predictions)
– Lecture 10 (evaluation metrics)
• I am able to implement regression and classification algorithms
– Lecture 5 to Lecture 9
Outline
• Dataset Partition:
– Training/Validation/Testing
• Cross Validation
• Evaluation Metrics
– Evaluating the quality of a trained classifier
Dataset Partition

Training, Validation, and Test

We only have a dataset, yet we want to:
1. Train a model
2. The model may have parameters (e.g., the polynomial order), and we want to know which parameters or model works the best
3. With the selected parameters, we want to see the model's / algorithm's real performance on new, unseen data

Suppose these data points (e.g., a dataset of human-face and non-human-face images) are all we have, and we want to use them to design the ML model.
Training, Validation, and Test

• In real-world applications,
– We don't have test data, since they are unseen
– Imagine you develop a face-detector app: you don't know whom you will test it on
• In lab practice,
– We divide the dataset into three parts (the test set is hidden from training):

Training set: for training the ML model
Validation set: for validation, i.e., choosing the parameter or model
Test set: for testing the "real" performance and generalization capability on unseen data
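The three-way division above can be sketched with NumPy. This is a minimal sketch: the 60/20/20 ratio, the seed, and the helper name `train_val_test_split` are illustrative assumptions, not prescribed by the slides.

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the dataset once, then carve out validation and test portions.
    The 60/20/20 ratio is an illustrative choice."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    n_val = int(len(X) * val_frac)
    test_idx = idx[:n_test]                 # hidden from training
    val_idx = idx[n_test:n_test + n_val]    # for choosing parameters/models
    train_idx = idx[n_test + n_val:]        # for fitting the model
    return (X[train_idx], y[train_idx],
            X[val_idx], y[val_idx],
            X[test_idx], y[test_idx])

# Toy dataset of 100 samples.
X = np.arange(100).reshape(100, 1).astype(float)
y = (X[:, 0] > 50).astype(int)
Xtr, ytr, Xva, yva, Xte, yte = train_val_test_split(X, y)
print(len(Xtr), len(Xva), len(Xte))  # 60 20 20
```

Shuffling before splitting matters: if the data are sorted by class, a contiguous split would put one class entirely in the test set.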
Training, Validation, and Test

[Figures: the dataset is first split into a training set and a validation set; a further portion is then held out as the test set.]
Training, Validation, and Test
– Training set: for training the ML model
– Validation set: for choosing the parameter or model
– Test set: for testing the "real" performance and generalization

[Example figure omitted.]
Cross Validation

• In practice, we do k-fold cross validation

4-fold cross validation (4 experiments; each fold serves once as the validation set):

Fold 1: Train      | Train      | Train      | Validation
Fold 2: Train      | Train      | Validation | Train
Fold 3: Train      | Validation | Train      | Train
Fold 4: Validation | Train      | Train      | Train

Example: which one to select for test?

                         Acc. on    Acc. on    Acc. on    Acc. on    Avg. Acc. on
                         Val Set 1  Val Set 2  Val Set 3  Val Set 4  Val. Set
Classifier with Param1   88%        89%        93%        92%        90.5%
Classifier with Param2   90%        88%        91%        91%        90%

(Classifier with Param1 is selected, since its average validation accuracy is higher.)
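The k-fold procedure in the table can be sketched as follows. The toy threshold "classifier" and the data are illustrative assumptions; any train-and-evaluate routine could be plugged in.

```python
import numpy as np

def kfold_mean_accuracy(X, y, train_and_eval, k=4):
    """k-fold cross validation: each fold serves once as the validation set,
    and the k validation accuracies are averaged, as in the 4-fold table."""
    folds = np.array_split(np.arange(len(X)), k)
    accs = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        accs.append(train_and_eval(X[train_idx], y[train_idx],
                                   X[val_idx], y[val_idx]))
    return float(np.mean(accs))

# Toy "classifier": predict positive when x exceeds the training-set mean.
def threshold_classifier(Xtr, ytr, Xva, yva):
    t = Xtr.mean()
    pred = (Xva > t).astype(int)
    return (pred == yva).mean()

X = np.linspace(0, 1, 40)
y = (X > 0.5).astype(int)
perm = np.random.default_rng(1).permutation(40)  # shuffle so folds mix classes
X, y = X[perm], y[perm]
avg_acc = kfold_mean_accuracy(X, y, threshold_classifier)
print(avg_acc)
```

To compare two parameter settings as in the table, run `kfold_mean_accuracy` once per setting and keep the one with the higher average.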
Training, Validation, and Test
• Validation is, however, not always used:
– Validation is used when you need to pick parameters or models
– If you have no models or parameters to compare, you may consider partitioning the data into only training and test sets
Outline

• Dataset Partition:
– Training/Validation/Testing
• Cross Validation
• Evaluation Metrics
– Evaluating the quality of a trained classifier
Evaluation Metrics

Regression: Mean Square Error

Given n test samples with true targets y_i and predictions ŷ_i,

MSE = (1/n) · Σ_{i=1}^{n} (y_i − ŷ_i)²

The squared error is also convenient for training, since gradient descent can be applied to it easily.
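A minimal sketch of the MSE formula above; the helper name is my own.

```python
import numpy as np

def mean_square_error(y_true, y_pred):
    """MSE = (1/n) * sum_i (y_i - yhat_i)^2 over the n test samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Three test samples: squared errors are 0.25, 0, and 1.
print(mean_square_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # (0.25 + 0 + 1)/3
```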
Evaluation Metrics

Classification

By default:
Class-1: Positive Class
Class-2: Negative Class

Confusion Matrix

                  Class-1 (predicted)   Class-2 (predicted)
Class-1 (actual)        7 (TP)                7 (FN)
Class-2 (actual)        2 (FP)               25 (TN)

TP: True Positive
FN: False Negative (i.e., Type II Error)
FP: False Positive (i.e., Type I Error)
TN: True Negative
Evaluation Metrics

Classification (using the confusion matrix above: TP = 7, FN = 7, FP = 2, TN = 25)

• How many samples in the dataset have the real label of class-1? 7 + 7 = 14
• How many samples are there in total? 7 + 7 + 2 + 25 = 41
• How many samples are correctly classified? 7 + 25 = 32. How many are incorrectly classified? 7 + 2 = 9
Evaluation Metrics

Classification

             Class-1 (predicted)   Class-2 (predicted)
P (actual)          TP                    FN
N (actual)          FP                    TN

Recall = TP/(TP+FN)
Precision = TP/(TP+FP)   (the fraction of positive predictions that are correct)
Accuracy = (TP+TN)/(TP+TN+FP+FN)
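The three metrics can be checked on the confusion matrix from the earlier slide (TP = 7, FN = 7, FP = 2, TN = 25); the helper name is my own.

```python
def classification_metrics(TP, FN, FP, TN):
    """Recall, precision, and accuracy from a binary confusion matrix."""
    recall = TP / (TP + FN)        # fraction of actual positives recovered
    precision = TP / (TP + FP)     # fraction of positive predictions correct
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    return recall, precision, accuracy

# Counts from the confusion matrix on the slide.
r, p, a = classification_metrics(7, 7, 2, 25)
print(r, p, a)  # 0.5, 7/9 ≈ 0.778, 32/41 ≈ 0.780
```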
Evaluation Metrics

Classification

Cost Matrix for Binary Classification

             Class-1 (predicted)   Class-2 (predicted)
P (actual)      c_{1,1} · TP          c_{1,2} · FN
N (actual)      c_{2,1} · FP          c_{2,2} · TN

Total cost: c_{1,1}·TP + c_{1,2}·FN + c_{2,1}·FP + c_{2,2}·TN

Usually, c_{1,1} and c_{2,2} are set to 0; c_{2,1} and c_{1,2} may or may not be equal.
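The total-cost formula can be sketched as follows; the specific cost values in the example are illustrative assumptions, not from the slides.

```python
def total_cost(TP, FN, FP, TN, c11=0.0, c12=1.0, c21=1.0, c22=0.0):
    """Total cost = c11*TP + c12*FN + c21*FP + c22*TN.
    As on the slide, correct decisions usually cost 0 (c11 = c22 = 0)."""
    return c11 * TP + c12 * FN + c21 * FP + c22 * TN

# Illustrative asymmetric costs: a missed positive (FN) is 10x as costly
# as a false alarm (FP), using the earlier counts TP=7, FN=7, FP=2, TN=25.
print(total_cost(7, 7, 2, 25, c12=10.0, c21=1.0))  # 10*7 + 1*2 = 72.0
```

Asymmetric costs like these change which classifier is "best": a model with slightly lower accuracy but fewer false negatives can have a much lower total cost.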
Evaluation Metrics

• Example of cost matrix
– Assume we would like to develop a self-driving car system
– We have an ML system that detects pedestrians using a camera, by conducting a binary classification
• When it detects a person (positive class), the car should stop
• When no person is detected (negative class), the car keeps going

True Positive (cost c_{1,1}): there is a person, the ML system detects the person, and the car stops
Evaluation Metrics

Classification

             Class-1 (predicted)   Class-2 (predicted)
P (actual)          TP                    FN
N (actual)          FP                    TN

(True Positive Rate)  TPR = TP/(TP+FN)   (= Recall)
(False Negative Rate) FNR = FN/(TP+FN)
(True Negative Rate)  TNR = TN/(FP+TN)
(False Positive Rate) FPR = FP/(FP+TN)

TPR + FNR = 1 (100% of positive-class data)
TNR + FPR = 1 (100% of negative-class data)
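A quick check that the four rates partition the positive and negative classes, using the counts from the earlier confusion matrix (TP = 7, FN = 7, FP = 2, TN = 25); the helper name is my own.

```python
def rates(TP, FN, FP, TN):
    """TPR/FNR partition the actual positives; TNR/FPR partition the negatives."""
    TPR = TP / (TP + FN)   # true positive rate (= recall)
    FNR = FN / (TP + FN)   # false negative rate
    TNR = TN / (FP + TN)   # true negative rate
    FPR = FP / (FP + TN)   # false positive rate
    return TPR, FNR, TNR, FPR

TPR, FNR, TNR, FPR = rates(7, 7, 2, 25)
print(TPR + FNR, TNR + FPR)  # both sums equal 1.0
```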
Evaluation Metrics

Classification

A classifier model g(x) outputs a prediction score y_pred; a threshold t decides the class: predict Positive when y_pred > t, Negative otherwise.

Samples sorted by predicted score (P = actual positive, N = actual negative):

         N1     N2     P1     N3     P2     P3
y_pred   -1.1   -0.5   -0.1   0.2    0.6    0.9
Evaluation Metrics

Classification (same samples as above):

         N1     N2     P1     N3     P2     P3
y_pred   -1.1   -0.5   -0.1   0.2    0.6    0.9

• With a higher threshold t, fewer samples are predicted positive: FP decreases, but FN may increase.
• With a lower threshold t, more samples are predicted positive: FN decreases, but FP may increase.
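The effect of sliding the threshold can be sketched on the six scored samples above; the helper name `roc_points` and the choice of candidate thresholds are my own.

```python
import numpy as np

# Scores and actual labels from the slide (1 = positive, 0 = negative).
scores = np.array([-1.1, -0.5, -0.1, 0.2, 0.6, 0.9])  # N1 N2 P1 N3 P2 P3
labels = np.array([0, 0, 1, 0, 1, 1])

def roc_points(scores, labels):
    """For each threshold t, predict positive when score > t,
    and record the resulting (FPR, TPR) pair."""
    points = []
    # One threshold below all scores, one at each score, one above all.
    thresholds = np.concatenate(([scores.min() - 1], scores, [scores.max() + 1]))
    for t in thresholds:
        pred = scores > t
        TP = np.sum(pred & (labels == 1))
        FP = np.sum(pred & (labels == 0))
        TPR = TP / np.sum(labels == 1)
        FPR = FP / np.sum(labels == 0)
        points.append((float(FPR), float(TPR)))
    return points

for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The lowest threshold predicts everything positive, (FPR, TPR) = (1, 1); the highest predicts everything negative, (0, 0); the points in between trace the trade-off described in the bullets above.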
Evaluation Metrics

TP, FP, Recall: for each of three threshold settings on the samples above, fill in the confusion-matrix counts (TP, FN, FP, TN) and compute the recall.

[Worked solutions from the slides omitted; the counts follow directly from the table of scores above.]
Evaluation Metrics

Classification: sliding the threshold over all possible values produces a different (FPR, TPR) pair at each threshold; plotting these pairs traces out a curve.

[ROC-construction figures omitted.]
Evaluation Metrics

• Detection Error Trade-off (DET) curve: plots FNR against FPR (since TPR + FNR = 1, it carries the same information as the ROC curve).
• Receiver Operating Characteristic (ROC) curve: plots TPR against FPR.
• AUC: the area under the (ROC) curve.
Evaluation Metrics

Area Under the Curve (ROC curve)

[Figure: ROC curves of several classifiers; a higher curve, and hence a larger AUC value, indicates a better classifier.]
Evaluation Metrics

Area Under the Curve (ROC curve)

Consider a set of binary samples indexed by i = 1, …, n, each with a label (positive or negative) and a classifier score g(x_i).
Evaluation Metrics

Area Under the Curve (ROC curve)

Consider also a step function given by

I(u) = 1 if u > 0;  0.5 if u = 0;  0 if u < 0.

The Area Under the ROC Curve (AUC) can be expressed as

AUC = (1 / (n_P · n_N)) · Σ_{i ∈ P} Σ_{j ∈ N} I( g(x_i) − g(x_j) ),

where P and N index the positive and negative samples and n_P, n_N are their counts.

• AUC provides an aggregate measure of performance across all possible classification thresholds.
• It can be seen as the probability that the model ranks a random positive example more highly than a random negative example.
• AUC ranges in value from 0 (100% wrong prediction) to 1 (100% correct prediction); AUC = 1 when all the samples are ranked correctly.
• It is scale-invariant and classification-threshold-invariant.

[Figure: positive (P) and negative (N) samples ordered by the output of a logistic regression model, from 0.0 to 1.0.]

Ref: D. Bamber, The area above the ordinal dominance graph and the area below the receiver operating characteristic graph, J. Math. Psychol. 12 (1975) 387–415.
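The pairwise AUC formula can be checked on the six scored samples from the earlier threshold slide. The result, 8/9, reflects the single mis-ranked positive/negative pair (P1 scores below N3); the helper names are my own.

```python
def step(u):
    """The step function I(u): 1 if u > 0, 0.5 if u = 0, 0 if u < 0."""
    return 1.0 if u > 0 else (0.5 if u == 0 else 0.0)

def auc_pairwise(scores, labels):
    """AUC = (1/(nP*nN)) * sum over all positive/negative pairs of
    I(g(x_i) - g(x_j)), per the formula on the slide."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    total = sum(step(p - n) for p in pos for n in neg)
    return total / (len(pos) * len(neg))

# Samples from the earlier threshold slide: N1 N2 P1 N3 P2 P3.
scores = [-1.1, -0.5, -0.1, 0.2, 0.6, 0.9]
labels = [0, 0, 1, 0, 1, 1]
auc = auc_pairwise(scores, labels)
gini = 2 * auc - 1   # Gini coefficient, G = 2*AUC - 1
print(auc, gini)     # 8/9 ≈ 0.889, 7/9 ≈ 0.778
```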
Evaluation Metrics

Classification

Gini coefficient: G = A/(A+B)

Here A is the area between the ROC curve and the diagonal, and B is the remaining area above the diagonal, so A + B = 0.5 (the axes being FPR and TPR).

Since AUC = A + 0.5:

G = A/0.5 = 2A = 2·AUC − 1
Evaluation Metrics

Classification

Confusion Matrix for Multicategory Classification

                  Class-1 (pred)   Class-2 (pred)   …   Class-K (pred)
Class-1 (actual)     n_{1,1}          n_{1,2}       …      n_{1,K}
Class-2 (actual)     n_{2,1}          n_{2,2}       …      n_{2,K}
⁞                       ⁞                ⁞          ⁞         ⁞
Class-K (actual)     n_{K,1}          n_{K,2}       …      n_{K,K}
Other Issues

• Computational speed and memory consumption are also important factors
– Especially for mobile or edge devices
• Other factors
– Parallelizability, modularity, maintainability
– Not the focus of this module