- Regularization
- Bias & Variance
Training Set: 70% of the examples
Test Set: 30% of the examples (= number of test examples)
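The 70% / 30% split above can be sketched in a few lines. This is a minimal illustration only: the function name `train_test_split`, the fixed seed, and the toy data are assumptions made for the example, not something taken from the slides.

```python
import random

def train_test_split(examples, train_fraction=0.7, seed=0):
    """Shuffle the examples, then cut them into a training and a test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # shuffle so the split is not ordered
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

# Toy data (assumption): ten (x, y) pairs.
examples = [(x, 2 * x + 1) for x in range(10)]
train_set, test_set = train_test_split(examples)
print(len(train_set), len(test_set))
```

Shuffling before cutting matters when the examples are stored in some order (by date or by label, for instance), since an ordered cut would give the two sets different distributions.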
Under-fitting & Overfitting of the model
Figure: three fits of the same data: “Underfit” (“High Bias & Low Variance”), “Just Right”, and “Overfit” (“Low Bias & High Variance”).
• Underfit: The model neither fits the training data nor generalizes to new data, so the
error rate is high on both the training set and unseen data.
• Overfit: With too many features, the learned hypothesis may fit the training dataset
very well but fail to generalize to new examples.
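The two failure modes above can be seen numerically by fitting polynomials of increasing order to the same small dataset. The sketch below is an illustration under assumptions: the sine target, the noise level, the sample sizes, and the orders 1, 3, and 12 are all chosen for the example, and half mean squared error is assumed as the cost.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of sin(x); the target function is an assumption."""
    x = rng.uniform(-3, 3, n)
    return x, np.sin(x) + rng.normal(0, 0.2, n)

def cost(model, x, y):
    """Half mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2) / 2)

x_train, y_train = make_data(20)    # small training set
x_test, y_test = make_data(200)     # unseen data

errors = {}
for d in (1, 3, 12):                # low, moderate, and high polynomial order
    model = Polynomial.fit(x_train, y_train, d)   # least-squares fit
    errors[d] = (cost(model, x_train, y_train), cost(model, x_test, y_test))
    print(f"d={d:2d}  J_train={errors[d][0]:.4f}  J_test={errors[d][1]:.4f}")
```

The pattern to look for: an underfit model has a high error on both sets, while an overfit model has a very low training error and a noticeably higher error on unseen data.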
How to get the best model?
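d = order of the polynomial function
d = 1: $\theta^{(1)} \rightarrow J_{test}(\theta^{(1)})$
d = 2: $\theta^{(2)} \rightarrow J_{test}(\theta^{(2)})$
d = 3: $\theta^{(3)} \rightarrow J_{test}(\theta^{(3)})$
...
d = 10: $\theta^{(10)} \rightarrow J_{test}(\theta^{(10)})$
Selected order (lowest $J_{test}$ in this example): d = 5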
• A better approach is to use a separate validation dataset when evaluating
machine learning algorithms, in order to avoid or limit overfitting.
Training Set: the remaining 60% (m = number of training examples)
Cross-validation Set (CV): 20%
Test Set (unseen dataset): 20%
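d = order of the polynomial function
For each d, fit the parameters on the training set, $\min_{\theta} J(\theta)$, then evaluate on the cross-validation set:
d = 1: $\theta^{(1)} \rightarrow J_{cv}(\theta^{(1)})$
d = 2: $\theta^{(2)} \rightarrow J_{cv}(\theta^{(2)})$
d = 3: $\theta^{(3)} \rightarrow J_{cv}(\theta^{(3)})$
...
d = 10: $\theta^{(10)} \rightarrow J_{cv}(\theta^{(10)})$
Selected order (lowest $J_{cv}$ in this example): d = 4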
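The selection procedure above (fit every order on the training set, compare on the cross-validation set, report the test error once) might look like the following. This is a sketch under assumptions: the data are synthetic (a noisy cubic), the split is 60% / 20% / 20%, and the cost is half mean squared error.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

# Synthetic data (assumption): noisy cubic.
x = rng.uniform(-2, 2, 200)
y = x ** 3 - x + rng.normal(0, 0.5, 200)

# 60% training, 20% cross-validation, 20% test.
idx = rng.permutation(len(x))
train, cv, test = idx[:120], idx[120:160], idx[160:]

def cost(model, xs, ys):
    """Half mean squared error of a fitted polynomial."""
    return float(np.mean((model(xs) - ys) ** 2) / 2)

models, j_cv = {}, {}
for d in range(1, 11):
    # Fit theta^(d) on the training set only.
    models[d] = Polynomial.fit(x[train], y[train], d)
    # Score it on the cross-validation set.
    j_cv[d] = cost(models[d], x[cv], y[cv])

best_d = min(j_cv, key=j_cv.get)                  # order with the lowest J_cv
j_test = cost(models[best_d], x[test], y[test])   # reported once, at the end
print(f"best d = {best_d}, J_cv = {j_cv[best_d]:.4f}, J_test = {j_test:.4f}")
```

Because d is chosen using the cross-validation set, the test set stays untouched until the final line, so `j_test` is not biased by the selection step.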
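Figure: error versus polynomial order d, with curves $J_{train}(\theta)$ and $J_{cv}(\theta)$ (or $J_{test}(\theta)$); marked points d = 1, d = 2, and d = 10. The error at d = 1 is high.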
Regularization term
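For reference, the regularization term can be written out explicitly. The form below assumes the usual regularized linear regression cost with hypothesis $h_\theta$, m training examples, n features, and an unregularized intercept $\theta_0$; the slides do not show the formula, so treat the exact scaling as an assumption.

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
          + \underbrace{\frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2}_{\text{regularization term}}
```

Under this convention, the errors $J_{train}$, $J_{cv}$, and $J_{test}$ used for comparison are typically computed without the regularization term, so that models trained with different λ are scored on the same scale.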
Regularization and Bias / Variance
How to automatically choose the value of λ ?
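For each candidate value of λ, minimize the cost on the training set and evaluate the result on the cross-validation set:
(1) $\min_{\theta} J(\theta) \rightarrow \theta^{(1)} \rightarrow J_{cv}(\theta^{(1)})$
(2) $\min_{\theta} J(\theta) \rightarrow \theta^{(2)} \rightarrow J_{cv}(\theta^{(2)})$
...
(10) $\min_{\theta} J(\theta) \rightarrow \theta^{(10)} \rightarrow J_{cv}(\theta^{(10)})$
Pick the candidate with the lowest $J_{cv}$, for example $\theta^{(5)}$, and report its test error $J_{test}(\theta^{(5)})$.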
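The same loop can be sketched for λ. The example below is an illustration under assumptions: it uses regularized linear regression on polynomial features solved in closed form, synthetic data, and a hypothetical list of candidate λ values; none of these specifics come from the slides.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data (assumption): noisy sine.
x = rng.uniform(-1, 1, 150)
y = np.sin(3 * x) + rng.normal(0, 0.3, 150)

def features(xs, degree=8):
    """Columns 1, x, x^2, ..., x^degree."""
    return np.vander(xs, degree + 1, increasing=True)

idx = rng.permutation(len(x))
train, cv, test = idx[:90], idx[90:120], idx[120:]
X_train, X_cv, X_test = features(x[train]), features(x[cv]), features(x[test])

def fit_regularized(X, y, lam):
    """Closed-form minimizer of the regularized squared-error cost."""
    penalty = np.eye(X.shape[1])
    penalty[0, 0] = 0.0               # do not regularize the intercept
    return np.linalg.solve(X.T @ X + lam * penalty, X.T @ y)

def cost(theta, X, y):
    """Half mean squared error, without the regularization term."""
    return float(np.mean((X @ theta - y) ** 2) / 2)

lambdas = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]   # hypothetical candidates
thetas, j_train, j_cv = {}, {}, {}
for lam in lambdas:
    thetas[lam] = fit_regularized(X_train, y[train], lam)
    j_train[lam] = cost(thetas[lam], X_train, y[train])
    j_cv[lam] = cost(thetas[lam], X_cv, y[cv])
    print(f"lambda={lam:7.2f}  J_train={j_train[lam]:.4f}  J_cv={j_cv[lam]:.4f}")

best_lambda = min(j_cv, key=j_cv.get)           # lowest cross-validation error
j_test = cost(thetas[best_lambda], X_test, y[test])
print(f"best lambda = {best_lambda}, J_test = {j_test:.4f}")
```

In the printed table, the training error grows as λ grows, because a larger penalty pulls the fit away from the training data; the cross-validation error is what identifies the useful middle range.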
Figure: error versus λ, with curves $J_{train}(\theta)$ and $J_{cv}(\theta)$. Small λ: High Variance. Large λ: High Bias. In between: “Just Right”.
Figure: learning curves, $J_{train}(\theta)$ and $J_{cv}(\theta)$ versus training set size m (m = 1, ..., 6), with the gap between $J_{cv}(\theta)$ and $J_{train}(\theta)$ marked.
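A learning curve like the one described above can be computed by training on the first m examples for increasing m and recording both errors each time. The sketch below assumes synthetic quadratic data, a polynomial model of order 2, and a hypothetical list of training set sizes.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(3)

# Synthetic data (assumption): noisy quadratic.
x = rng.uniform(-2, 2, 140)
y = x ** 2 + rng.normal(0, 0.5, 140)
x_train, y_train = x[:100], y[:100]
x_cv, y_cv = x[100:], y[100:]

def cost(model, xs, ys):
    """Half mean squared error of a fitted polynomial."""
    return float(np.mean((model(xs) - ys) ** 2) / 2)

degree = 2                          # model order (assumption)
sizes = [3, 5, 10, 20, 50, 100]     # training set sizes m (assumption)
j_train, j_cv = [], []
for m in sizes:
    model = Polynomial.fit(x_train[:m], y_train[:m], degree)
    # Training error is measured on the m examples actually used for fitting.
    j_train.append(cost(model, x_train[:m], y_train[:m]))
    # Cross-validation error is always measured on the full CV set.
    j_cv.append(cost(model, x_cv, y_cv))
    print(f"m={m:3d}  J_train={j_train[-1]:.4f}  J_cv={j_cv[-1]:.4f}")
```

With m = 3 and a quadratic model the fit passes through every training point, so the training error starts near zero and rises as m grows. A gap between the two curves that persists at large m is the usual sign of high variance, while two curves that meet at a high error suggest high bias.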
Summary