Geoff Hulten
Don’t expect your favorite learner to always be best. Try different approaches and compare.
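To make that concrete, here is a minimal comparison sketch; the dataset, the particular learners, and their settings are illustrative assumptions, not part of the slides.

# A minimal sketch: cross-validate several learners on the same data and compare.
# The dataset and learner settings below are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)

learners = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Same data, same cross-validation protocol, different learners.
for name, learner in learners.items():
    scores = cross_val_score(learner, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")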
Bias and Variance
• Bias: error caused because the model cannot represent the concept
[Figure: the true concept]
Visualizing Bias
• Goal: produce a model that matches this concept
• Training Data for the concept
[Figure: the training data, and a fitted model showing bias mistakes in the region where the model predicts -]
• Variance: Sensitivity to changes & noise
Fit a Linear Model
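One way to see both error sources is to estimate them empirically: refit a model on many resampled training sets, then measure how far its average prediction is from the true concept (bias) and how much its predictions fluctuate between training sets (variance). The sketch below assumes a simple 1-D sine concept, a linear model, and a degree-15 polynomial; all of these are illustrative choices, not from the slides.

# A minimal sketch: estimate squared bias and variance over resampled training sets.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
true_concept = lambda x: np.sin(2 * np.pi * x)      # the concept we want to match (assumed)
x_test = np.linspace(0, 1, 50).reshape(-1, 1)

def bias_variance(model, n_trials=200, n_points=30, noise=0.2):
    """Average squared bias and variance of predictions across resampled training sets."""
    preds = []
    for _ in range(n_trials):
        x = rng.uniform(0, 1, (n_points, 1))
        y = true_concept(x).ravel() + rng.normal(0, noise, n_points)
        preds.append(model.fit(x, y).predict(x_test))
    preds = np.array(preds)
    bias_sq = np.mean((preds.mean(axis=0) - true_concept(x_test).ravel()) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

linear = LinearRegression()                                          # simple: high bias, low variance
poly15 = make_pipeline(PolynomialFeatures(15), LinearRegression())   # powerful: low bias, high variance
for name, m in [("linear", linear), ("degree-15 polynomial", poly15)]:
    b, v = bias_variance(m)
    print(f"{name}: bias^2={b:.3f}, variance={v:.3f}")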
Another way to think about Bias & Variance
Bias and Variance: More Powerful Model
• Powerful models can represent complex concepts
• No mistakes!
[Figure: a complex model that fits the training data exactly, with regions labeled "Model Predicts +" and "Model Predicts -"]
Bias and Variance: More Powerful Model
• But get more data…
• Not good!
[Figure: with more data, the model's "Predicts +" and "Predicts -" regions no longer match the concept]
Overfitting vs Underfitting
Overfitting
• Fitting the data too well
• Features are noisy / uncorrelated to concept
• Modeling process very sensitive (powerful)
• Too much search

Underfitting
• Learning too little of the true concept
• Features don’t capture concept
• Too much bias in model
• Too little search to fit model
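A common way to tell the two apart is to compare training and validation accuracy as model power grows: low training accuracy suggests underfitting, while a large train/validation gap suggests overfitting. A minimal sketch, assuming a synthetic dataset and a decision tree whose depth controls its power:

# A minimal sketch: diagnose under- vs overfitting from the train/validation gap.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

for depth in [1, 2, 4, 8, 16, None]:          # None lets the tree grow without limit
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_acc = tree.score(X_train, y_train)
    val_acc = tree.score(X_val, y_val)
    # low train accuracy -> underfitting; large train/validation gap -> overfitting
    print(f"max_depth={depth}: train={train_acc:.3f}, validation={val_acc:.3f}")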
The Effect of Noise
The Effect of Features
• Captures concept
• Simple model -> low bias
• Powerful -> low variance
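A small sketch of this effect, under illustrative assumptions (a synthetic concept that depends on three informative features, plus purely noisy ones): features that capture the concept let even a simple model do well, while noisy, uncorrelated features only give it room to go wrong.

# A minimal sketch: compare feature sets that do and do not capture the concept.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
informative = rng.normal(size=(n, 3))
y = (informative.sum(axis=1) > 0).astype(int)    # concept depends only on the informative features
noise = rng.normal(size=(n, 50))                 # features uncorrelated with the concept

for name, X in [("informative only", informative),
                ("informative + 50 noisy", np.hstack([informative, noise])),
                ("noisy only", noise)]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    print(f"{name}: cross-validated accuracy {acc:.3f}")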
The Power of a Model Building Process
Weaker Modeling Process (higher bias)
• Simple model (e.g. linear)
• Fixed-size model (e.g. fixed # of weights)

More Powerful Modeling Process (higher variance)
• Complex model (e.g. high-order polynomial)
• Scalable model (e.g. decision tree)
Regularize by penalizing the size of the model, e.g. the number of nodes in a decision tree:

$Loss(S) = \sum_{i=1}^{n} Loss(y_i, \hat{y}_i) + \alpha \cdot \log_2(\#\text{Nodes})$
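In practice this kind of size penalty can be applied through pruning. The sketch below assumes scikit-learn's cost-complexity pruning (ccp_alpha), which penalizes tree size, though not with exactly the log2(#Nodes) form above; larger alpha forces a smaller tree (more bias, less variance).

# A minimal sketch: control tree complexity with a size penalty (cost-complexity pruning).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for alpha in [0.0, 0.001, 0.01, 0.05]:        # larger alpha -> smaller tree
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)
    print(f"alpha={alpha}: {tree.tree_.node_count} nodes, "
          f"train={tree.score(X_train, y_train):.3f}, validation={tree.score(X_val, y_val):.3f}")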
Ways to Control Logistic Regression
• Adjust Step Size
• Regularization
$Loss(S) = \sum_{i=1}^{n} Loss(y_i, \hat{y}_i) + \alpha \sum_{j=1}^{\#\text{Weights}} |w_j|$
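A minimal sketch of turning both knobs, assuming scikit-learn's SGDClassifier with logistic loss (the loss name "log_loss" assumes a recent scikit-learn): eta0 sets the gradient-descent step size and alpha sets the strength of the L1 penalty on the weights.

# A minimal sketch: sweep the step size and the L1 regularization strength.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=50, n_informative=5, random_state=0)

for step_size in [0.001, 0.01, 0.1]:
    for alpha in [0.0001, 0.001, 0.01]:
        model = SGDClassifier(loss="log_loss", penalty="l1", alpha=alpha,
                              learning_rate="constant", eta0=step_size,
                              max_iter=1000, random_state=0)
        acc = cross_val_score(model, X, y, cv=5).mean()
        print(f"step={step_size}, alpha={alpha}: accuracy {acc:.3f}")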
Modeling to Balance Under & Overfitting
• Data
• Learning Algorithms
• Feature Sets
• Complexity of Concept
• Search and Computation
• Parameter sweeps!
Parameter Sweep
# optimize first parameter
for p in [ setting_certain_to_underfit, …, setting_certain_to_overfit ]:
    # train a model with this setting and record its validation accuracy
# examine the parameters that seem best and adjust whatever you can…
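A runnable version of the same idea, with the learner, the swept parameter, and the data as illustrative assumptions: sweep k for a k-nearest-neighbors classifier from a value certain to underfit down to one likely to overfit, and keep the setting with the best validation accuracy.

# A minimal sketch: sweep one parameter from underfitting to overfitting settings.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

best_k, best_score = None, -1.0
for k in [500, 200, 100, 50, 20, 5, 1]:   # very large k underfits, k=1 tends to overfit
    score = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_val, y_val)
    print(f"n_neighbors={k}: validation accuracy {score:.3f}")
    if score > best_score:
        best_k, best_score = k, score
print(f"best n_neighbors={best_k}")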
Types of Parameter Sweeps
• Optimize one parameter at a time
  • Optimize one, update, move on
  • Iterate a few times
• Gradient descent on meta-parameters
  • Start somewhere ‘reasonable’
  • Computationally calculate gradient wrt change in parameters
• Grid
  • Try every combination of every parameter
• Quick vs Full runs
  • Expensive parameter sweep on ‘small’ sample of data (e.g. grid)
  • A bit of iteration on full data to refine
• Intuition & Experience
  • Learn your tools
  • Learn your problem
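A sketch of the grid and quick-vs-full patterns, assuming scikit-learn's GridSearchCV and an illustrative parameter grid: run the full grid cheaply on a small sample of the data, then refine around the winner on all of it.

# A minimal sketch: grid sweep on a small sample, then a narrower refinement on full data.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=20000, n_features=20, flip_y=0.1, random_state=0)

grid = {"max_depth": [2, 4, 8, 16], "min_samples_leaf": [1, 5, 20, 50]}

# quick run: the expensive full grid, but on a small sample of the data
quick = GridSearchCV(DecisionTreeClassifier(random_state=0), grid, cv=3)
quick.fit(X[:2000], y[:2000])
print("quick best:", quick.best_params_)

# full run: a narrowed grid around the quick winner, on all of the data
best_depth = quick.best_params_["max_depth"]
refined = GridSearchCV(DecisionTreeClassifier(random_state=0),
                       {"max_depth": [best_depth - 1, best_depth, best_depth + 1],
                        "min_samples_leaf": [quick.best_params_["min_samples_leaf"]]},
                       cv=3)
refined.fit(X, y)
print("refined best:", refined.best_params_, f"score {refined.best_score_:.3f}")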
Summary of Overfitting and Underfitting
• The bias/variance tradeoff is a primary challenge in machine learning