
ML diagnostics

- Regularization
- Bias & Variance
Figure: the dataset is split into a training set (70% of the examples) and a test set (30%, m_test = number of test examples).
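
The 70/30 split can be sketched in a few lines. This is a minimal illustration, not the slides' own code: the function name `train_test_split`, the `seed` argument, and the toy dataset of 100 integers are assumptions made for the example.

```python
import random

def train_test_split(examples, test_fraction=0.3, seed=0):
    """Shuffle a copy of the examples and hold out a fraction as the test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy dataset (assumed): 100 examples identified by their index.
train_set, test_set = train_test_split(range(100))
print(len(train_set), len(test_set))
```

Shuffling before splitting matters when the examples are stored in some meaningful order, since otherwise the test set would not resemble the training set.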
Under-fitting & Overfitting of the model
“Underfit”: High Bias & Low Variance
“Just Right”
“Overfit”: Low Bias & High Variance

• Underfit: The model neither fits the training data nor generalizes to new data, so it has a
high error rate on both the training set and unseen data.

• Overfit: If we have too many features, the learned hypothesis may fit the training dataset
very well but fail to generalize to new examples.
 How to get the best model?
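d = order of the polynomial function

d = 1: fit $\theta^{(1)}$ → $J_{test}(\theta^{(1)})$
d = 2: fit $\theta^{(2)}$ → $J_{test}(\theta^{(2)})$
d = 3: fit $\theta^{(3)}$ → $J_{test}(\theta^{(3)})$
...
d = 10: fit $\theta^{(10)}$ → $J_{test}(\theta^{(10)})$

Choose the degree with the lowest test error, e.g. d = 5.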
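• Choosing d on the test set makes the reported test error optimistic, because d has then
been fitted to that set. A better way is to hold out a separate validation dataset when
evaluating machine learning algorithms, in order to avoid or limit this overfitting.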

Figure: the dataset is split into a training set (60%, m = number of training examples), a cross-validation (CV) set (20%, m_cv = number of CV examples), and a test set (20%, the unseen dataset).
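
A 60/20/20 split can be sketched as follows. This is an illustrative sketch only: the helper name `three_way_split`, its parameters, and the 100-example toy dataset are assumptions, not part of the slides.

```python
import random

def three_way_split(examples, cv_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples, then cut it into train / CV / test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_cv = round(len(shuffled) * cv_fraction)
    n_test = round(len(shuffled) * test_fraction)
    cv_set = shuffled[:n_cv]
    test_set = shuffled[n_cv:n_cv + n_test]
    train_set = shuffled[n_cv + n_test:]  # everything left over, about 60% here
    return train_set, cv_set, test_set

# Toy dataset (assumed): 100 examples identified by their index.
train_set, cv_set, test_set = three_way_split(range(100))
print(len(train_set), len(cv_set), len(test_set))
```

The three parts are disjoint, so the test set stays unseen until the final evaluation.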
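d = order of the polynomial function

For each d, fit the parameters on the training set, $\min_\theta J(\theta)$, then evaluate on the CV set:

d = 1: $\theta^{(1)}$ → $J_{cv}(\theta^{(1)})$
d = 2: $\theta^{(2)}$ → $J_{cv}(\theta^{(2)})$
d = 3: $\theta^{(3)}$ → $J_{cv}(\theta^{(3)})$
...
d = 10: $\theta^{(10)}$ → $J_{cv}(\theta^{(10)})$

Choose the degree with the lowest cross-validation error, e.g. d = 4.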

• We use the validation set (cross-validation) to select the model (best hypothesis), and
keep the test set for estimating the error of the selected model.
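
The selection procedure can be sketched with NumPy. This is a minimal sketch under stated assumptions: the data is a synthetic noisy cubic invented for the example, `half_mse` is a hypothetical helper implementing a squared-error cost, and `np.polyfit` stands in for whatever training procedure minimizes J(θ).

```python
import numpy as np

def half_mse(coeffs, x, y):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2) for a polynomial h."""
    residual = np.polyval(coeffs, x) - y
    return float(residual @ residual) / (2 * len(x))

# Synthetic data (assumed): a cubic plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 1.0 - 2.0 * x + 3.0 * x**3 + rng.normal(0.0, 0.1, size=200)

# 60% training, 20% cross-validation, 20% test.
x_train, y_train = x[:120], y[:120]
x_cv, y_cv = x[120:160], y[120:160]
x_test, y_test = x[160:], y[160:]

fits, cv_cost = {}, {}
for d in range(1, 11):
    fits[d] = np.polyfit(x_train, y_train, deg=d)  # fit on the training set only
    cv_cost[d] = half_mse(fits[d], x_cv, y_cv)     # compare models on the CV set

best_d = min(cv_cost, key=cv_cost.get)
# The test set is touched once, after the degree has been chosen.
test_cost = half_mse(fits[best_d], x_test, y_test)
print("selected d:", best_d, "test cost:", test_cost)
```

Because the test set plays no part in choosing d, `test_cost` remains a fair estimate of the error on unseen data.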
 Diagnosing Bias Vs. Variance
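Figure: $J_{train}(\theta)$ and $J_{cv}(\theta)$ (or $J_{test}(\theta)$) plotted against the polynomial degree d, with d = 1, d = 2 and d = 10 marked.

High Bias (d = 1): $J_{train}(\theta)$ is high and $J_{cv}(\theta)$ is also high.

High Variance (d = 5): $J_{train}(\theta)$ is low (the model fits the training set well) but $J_{cv}(\theta)$ is high.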
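
The two cases can be turned into a rough rule of thumb. The sketch below is only illustrative: the function `diagnose`, the `acceptable_error` level, and the `gap_ratio` threshold are hypothetical choices, since the slides do not give numeric cut-offs.

```python
def diagnose(j_train, j_cv, acceptable_error, gap_ratio=2.0):
    """Classify a model from its training and cross-validation costs.

    acceptable_error and gap_ratio are illustrative thresholds (assumptions).
    """
    if j_train > acceptable_error:
        # The model cannot even fit the training set; J_cv will be high too.
        return "high bias"
    if j_cv > acceptable_error and j_cv > gap_ratio * j_train:
        # Training fit is good, but the error on unseen data is much larger.
        return "high variance"
    return "just right"

print(diagnose(j_train=0.90, j_cv=1.00, acceptable_error=0.2))
print(diagnose(j_train=0.01, j_cv=0.80, acceptable_error=0.2))
print(diagnose(j_train=0.10, j_cv=0.12, acceptable_error=0.2))
```

In practice the acceptable error level depends on the problem, for example on human-level performance or on the noise in the labels.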
 Regularization and Bias / Variance
• Regularization helps prevent overfitting by adding a regularization term, weighted by λ,
to the cost function.
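
For reference, one common form of a regularized cost is shown below. It assumes linear regression with a squared-error cost, which the slides do not state explicitly; the symbols $h_\theta$ (the hypothesis), $m$ (number of training examples) and $n$ (number of features) are assumptions of this sketch. The second sum is the regularization term, and by convention it skips the intercept $\theta_0$.

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
          + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2
```

A larger λ penalizes large parameter values more heavily, which pushes the model toward simpler hypotheses.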
 How to automatically choose the value of λ ?

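Try a range of candidate values of λ. For each one, minimize the regularized cost on the training set, then evaluate on the CV set:

Candidate 1: $\min_\theta J(\theta)$ → $\theta^{(1)}$ → $J_{cv}(\theta^{(1)})$
Candidate 2: $\min_\theta J(\theta)$ → $\theta^{(2)}$ → $J_{cv}(\theta^{(2)})$
...
Candidate 10: $\min_\theta J(\theta)$ → $\theta^{(10)}$ → $J_{cv}(\theta^{(10)})$

Pick the candidate with the lowest cross-validation error, e.g. $\theta^{(5)}$, and report $J_{test}(\theta^{(5)})$.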
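Figure: $J_{train}(\theta)$ and $J_{cv}(\theta)$ plotted against λ. Small λ gives high variance, large λ gives high bias, and the “just right” value lies in between, where $J_{cv}(\theta)$ is lowest.

Choose the best value of λ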
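
Choosing λ by cross-validation can be sketched as below. The sketch makes several assumptions that are not in the slides: regularized linear regression on degree-8 polynomial features, a closed-form solve in place of an iterative minimizer, a synthetic sine dataset, and a hypothetical grid of λ values. The helper names `poly_features`, `fit_ridge` and `cost` are likewise invented for the example.

```python
import numpy as np

def poly_features(x, degree):
    """Design matrix with columns x^0, x^1, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    """Minimize 1/(2m)*||X@theta - y||^2 + lam/(2m)*sum(theta[1:]**2)."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # the intercept is not regularized
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def cost(X, y, theta):
    """Unregularized squared-error cost, used for J_train, J_cv and J_test."""
    residual = X @ theta - y
    return float(residual @ residual) / (2 * len(y))

# Synthetic data (assumed): noisy sine, split into train / CV / test.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=90)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, size=90)
X = poly_features(x, degree=8)
X_train, y_train = X[:30], y[:30]
X_cv, y_cv = X[30:60], y[30:60]
X_test, y_test = X[60:], y[60:]

# Hypothetical grid of candidate values, roughly doubling each step.
lambdas = [0.0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]
thetas = {lam: fit_ridge(X_train, y_train, lam) for lam in lambdas}
cv_cost = {lam: cost(X_cv, y_cv, thetas[lam]) for lam in lambdas}

best_lam = min(cv_cost, key=cv_cost.get)
test_cost = cost(X_test, y_test, thetas[best_lam])
print("selected lambda:", best_lam, "test cost:", test_cost)
```

Note that the regularization term is used only while fitting. The costs compared across candidates are computed without it, so that different values of λ are judged on the same scale.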


 Learning curves

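Learning curves plot $J_{train}(\theta)$ and $J_{cv}(\theta)$ against the training set size m.

Figure: example fits for m = 1 through m = 6, followed by learning-curve plots of $J_{train}(\theta)$ and $J_{cv}(\theta)$. The last plot marks the gap between $J_{cv}(\theta)$ and $J_{train}(\theta)$.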
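
A learning curve can be computed by refitting the model on growing prefixes of the training set. The sketch below is illustrative only: the synthetic quadratic data, the deliberately too-simple straight-line model, the list of sizes, and the helper names `half_mse` and `learning_curve` are all assumptions.

```python
import numpy as np

def half_mse(coeffs, x, y):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2) for a polynomial h."""
    residual = np.polyval(coeffs, x) - y
    return float(residual @ residual) / (2 * len(x))

def learning_curve(x_train, y_train, x_cv, y_cv, degree, sizes):
    """Return J_train and J_cv for models trained on the first m examples."""
    j_train, j_cv = [], []
    for m in sizes:
        coeffs = np.polyfit(x_train[:m], y_train[:m], deg=degree)
        j_train.append(half_mse(coeffs, x_train[:m], y_train[:m]))  # on the m examples used
        j_cv.append(half_mse(coeffs, x_cv, y_cv))                   # always on the full CV set
    return j_train, j_cv

# Synthetic data (assumed): a noisy quadratic, fitted with a straight line.
rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=150)
y = x**2 + rng.normal(0.0, 0.1, size=150)
x_train, y_train = x[:100], y[:100]
x_cv, y_cv = x[100:], y[100:]

sizes = [2, 5, 10, 20, 50, 100]
j_train, j_cv = learning_curve(x_train, y_train, x_cv, y_cv, degree=1, sizes=sizes)
for m, jt, jc in zip(sizes, j_train, j_cv):
    print(f"m={m:3d}  J_train={jt:.4f}  J_cv={jc:.4f}")
```

With only two examples a straight line fits the training data exactly, so the training cost starts near zero and grows as more examples are added. Plotting both lists against `sizes` gives the learning curve.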
 Summary

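• Get more training examples: fixes high variance
• Try a smaller set of features: fixes high variance
• Try getting additional features: fixes high bias
• Try adding polynomial features: fixes high bias
• Try decreasing λ: fixes high bias
• Try increasing λ: fixes high variance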
