Lecture 6: Overfitting
Princeton University COS 495
Instructor: Yingyu Liang
Review: machine learning basics
Math formulation
• Given training data $\{(x_i, y_i) : 1 \le i \le n\}$ i.i.d. from distribution $D$
• Find $y = f(x) \in \mathcal{H}$ that minimizes the empirical loss $\hat{L}(f) = \frac{1}{n}\sum_{i=1}^{n} l(f, x_i, y_i)$
• s.t. the expected loss is small: $L(f) = \mathbb{E}_{(x,y)\sim D}\left[l(f, x, y)\right]$
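As a minimal sketch of these definitions (names and data here are illustrative, not from the lecture), the empirical loss is just the average of a pointwise loss over the training pairs:

```python
import numpy as np

def empirical_loss(f, X, Y, loss):
    """Empirical loss: average of loss(f, x_i, y_i) over the n training pairs."""
    return np.mean([loss(f, x, y) for x, y in zip(X, Y)])

# squared loss l(f, x, y) = (f(x) - y)^2
sq_loss = lambda f, x, y: (f(x) - y) ** 2

X = np.array([0.0, 1.0, 2.0])
Y = np.array([0.0, 1.0, 2.0])
f = lambda x: x            # hypothesis f(x) = x fits this toy data exactly
print(empirical_loss(f, X, Y, sq_loss))  # → 0.0
```

The expected loss $L(f)$ cannot be computed this way, since $D$ is unknown; minimizing $\hat{L}(f)$ is the surrogate.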
Machine learning 1-2-3
[Diagram: the machine learning pipeline, annotated with "Occam's razor" and "gradient descent; convex optimization"]
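The "gradient descent; convex optimization" annotation can be sketched concretely (a hedged illustration with made-up data, not the lecture's example): minimizing the empirical squared loss of a one-parameter linear model is a convex problem in the weight, so gradient descent converges to the minimizer.

```python
import numpy as np

# Minimize the empirical squared loss of f(x) = w * x by gradient descent.
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x                      # toy data generated with true weight w* = 2

w, lr = 0.0, 0.05
for _ in range(200):
    grad = np.mean(2 * (w * x - y) * x)   # d/dw of (1/n) Σ (w·x_i − y_i)²
    w -= lr * grad
print(w)  # converges toward 2.0
```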
Overfitting
Linear vs nonlinear models
• Feature map for input $x = (x_1, x_2)$:
$\phi(x) = \left(x_1^2,\; x_2^2,\; \sqrt{2}\,x_1 x_2,\; \sqrt{2c}\,x_1,\; \sqrt{2c}\,x_2,\; c\right)$
• Classifier: $y = \operatorname{sign}(w^T \phi(x) + b)$
• This is the explicit feature map of the degree-2 polynomial kernel $K(x, z) = (x^T z + c)^2$
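The identity behind the polynomial kernel can be checked numerically: the inner product of the explicit feature vectors equals $(x^T z + c)^2$. A small sketch (variable names are illustrative):

```python
import numpy as np

def phi(x, c):
    """Explicit feature map of the degree-2 polynomial kernel for 2-D input."""
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2 * c) * x1,
                     np.sqrt(2 * c) * x2,
                     c])

def poly_kernel(x, z, c):
    """K(x, z) = (x·z + c)^2, computed without the feature map."""
    return (np.dot(x, z) + c) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
c = 1.0
print(np.dot(phi(x, c), phi(z, c)))   # → 4.0 (up to rounding)
print(poly_kernel(x, z, c))           # → 4.0
```

The kernel evaluates the 6-dimensional inner product at the cost of a 2-dimensional one, which is the point of the kernel trick.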
Linear vs nonlinear models
• Linear model: $f(x) = a_0 + a_1 x$
• Nonlinear model: $f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_M x^M$
[Figure: nested hypothesis classes $\mathcal{H}_1 \subseteq \mathcal{H}_2$; the linear models $\mathcal{H}_1$ are contained in the degree-$M$ polynomial models $\mathcal{H}_2$]
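The contrast between the two model classes can be sketched numerically (a hedged illustration with synthetic data, not the lecture's figure): fitting polynomials of increasing degree $M$ to the same noisy sample always drives the training error down, which is exactly why a larger hypothesis class risks overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
x = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)  # noisy targets

# Least-squares polynomial fit in H_M for growing M; compare training error.
errs = {}
for M in (1, 3, 9):
    coeffs = np.polyfit(x, y, deg=M)
    errs[M] = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"M={M}: training error {errs[M]:.4f}")
# With M = n - 1 = 9 the polynomial can interpolate all 10 points,
# so the training error collapses even though the fit tracks the noise.
```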