Machine Learning Basics Lecture 5 SVM II
Lecture 5: SVM II
Princeton University COS 495
Instructor: Yingyu Liang
Review: SVM objective
SVM: objective
• Let y_i ∈ {+1, −1} and f_{w,b}(x) = wᵀx + b. Margin:
    γ = min_i  y_i f_{w,b}(x_i) / ‖w‖
• Support Vector Machine:
    max_{w,b} γ = max_{w,b} min_i  y_i f_{w,b}(x_i) / ‖w‖
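As a concrete illustration (the data and classifier below are hypothetical), the margin γ can be computed directly from its definition:

```python
import numpy as np

# Toy linearly separable data: two points per class (hypothetical example)
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# A candidate linear classifier f_{w,b}(x) = w^T x + b
w = np.array([1.0, 1.0])
b = 0.0

# Margin: gamma = min_i y_i * f_{w,b}(x_i) / ||w||
scores = y * (X @ w + b)
gamma = scores.min() / np.linalg.norm(w)
print(gamma)
```

Scaling (w, b) by any positive constant leaves γ unchanged, which is why the optimization below can fix the functional margin to 1.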
SVM: optimization
• Optimization (Quadratic Programming):
    min_{w,b}  (1/2)‖w‖²
    s.t.  y_i (wᵀx_i + b) ≥ 1,  ∀i
• Generalized Lagrangian:
    ℒ(w, b, α) = (1/2)‖w‖² − Σ_i α_i [y_i (wᵀx_i + b) − 1]
  where the α_i are the Lagrange multipliers
Lagrangian
• Consider an optimization problem with equality constraints:
    min_w  f(w)
    s.t.  h_i(w) = 0,  ∀ 1 ≤ i ≤ l
• Lagrangian:
    ℒ(w, β) = f(w) + Σ_i β_i h_i(w)
  where the β_i are called Lagrange multipliers
Lagrange duality
• The primal problem:
    p* := min_w f(w) = min_w max_{α,β: α_i ≥ 0} ℒ(w, α, β)
• The dual problem:
    d* := max_{α,β: α_i ≥ 0} min_w ℒ(w, α, β)
• Always true (weak duality):
    d* ≤ p*
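Weak duality d* ≤ p* can be checked on a tiny toy problem (the problem below is hypothetical, chosen so both sides have closed forms): minimize f(w) = w² subject to h(w) = w − 1 = 0.

```python
import numpy as np

# Toy problem (hypothetical): min_w f(w) = w^2  subject to  h(w) = w - 1 = 0
# The only feasible point is w = 1, so the primal optimum is p* = f(1) = 1.
p_star = 1.0

# Dual function g(beta) = min_w [w^2 + beta*(w - 1)]; the inner minimum is at
# w = -beta/2, giving g(beta) = -beta^2/4 - beta in closed form.
betas = np.linspace(-10, 10, 2001)
g = -betas**2 / 4 - betas
d_star = g.max()

# Weak duality d* <= p* always holds; here the problem is convex, so d* = p*.
print(d_star, p_star)
```

Because this toy problem is convex, the bound is tight (d* = p*, attained at β = −2); in general only the inequality is guaranteed.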
SVM: optimization
• KKT conditions:
    ∂ℒ/∂w = 0  ⇒  w = Σ_i α_i y_i x_i      (1)
    ∂ℒ/∂b = 0  ⇒  0 = Σ_i α_i y_i          (2)
• Plug these into ℒ:
    ℒ(w, b, α) = Σ_i α_i − (1/2) Σ_{ij} α_i α_j y_i y_j x_iᵀx_j      (3)
  combined with 0 = Σ_i α_i y_i and α_i ≥ 0
• Note: the objective only depends on inner products x_iᵀx_j
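The algebra behind (3) can be spot-checked numerically: for any α satisfying condition (2) and w defined by condition (1), the original Lagrangian and the inner-product form agree. The data and multipliers below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data and multipliers satisfying KKT condition (2)
X = rng.normal(size=(4, 2))
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = np.array([0.25, 0.75, 0.5, 0.5])  # chosen so sum_i alpha_i y_i = 0
b = 1.3                                   # arbitrary; the b-term cancels

# Condition (1): w = sum_i alpha_i y_i x_i
w = (alpha * y) @ X

# Original Lagrangian: 1/2 ||w||^2 - sum_i alpha_i [y_i (w^T x_i + b) - 1]
L_primal = 0.5 * w @ w - np.sum(alpha * (y * (X @ w + b) - 1))

# Form (3): sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j x_i^T x_j
Q = (y * alpha)[:, None] * (X @ X.T) * (y * alpha)[None, :]
L_dual = alpha.sum() - 0.5 * Q.sum()

print(L_primal, L_dual)
```

The b-term drops out exactly because Σ_i α_i y_i = 0, which is why b disappears from the dual objective.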
SVM: optimization
• Reduces to the dual problem:
    max_α  Σ_i α_i − (1/2) Σ_{ij} α_i α_j y_i y_j x_iᵀx_j
    s.t.  Σ_i α_i y_i = 0,  α_i ≥ 0
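A minimal sketch of solving this dual, under simplifying assumptions: the dataset below is hypothetical, and the classifier is forced through the origin (b = 0), which drops the equality constraint Σ_i α_i y_i = 0 and leaves only α_i ≥ 0, so simple projected gradient ascent suffices. This is not the solver used in practice (libraries use SMO or QP solvers), just an illustration of the dual.

```python
import numpy as np

# Tiny separable dataset (hypothetical); classifier through the origin (b = 0)
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Dual: max_alpha sum_i alpha_i - 1/2 alpha^T Q alpha,  alpha_i >= 0,
# with Q_ij = y_i y_j x_i^T x_j. Solved by projected gradient ascent.
Q = np.outer(y, y) * (X @ X.T)
alpha = np.zeros(len(y))
step = 0.05                       # step size below 2 / lambda_max(Q)
for _ in range(2000):
    alpha = np.maximum(0.0, alpha + step * (1.0 - Q @ alpha))

# Recover w via KKT condition (1) and check the training data is separated
w = (alpha * y) @ X
print(np.sign(X @ w))
```

Note the solver only ever touches X through the Gram matrix X @ X.T, i.e. through inner products, which is what makes the kernel substitution below possible.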
• Mercer's condition: a kernel k can be written as
    k(x, x′) = Σ_i a_i φ_i(x) φ_i(x′)
  if and only if for any function c(x),
    ∫∫ c(x) c(x′) k(x, x′) dx dx′ ≥ 0
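A finite-sample consequence of Mercer's condition is that the Gram matrix K_ij = k(x_i, x_j) is positive semidefinite for any set of points. A numerical spot-check with the Gaussian (RBF) kernel on random data (a hypothetical example):

```python
import numpy as np

# Random points (hypothetical) and the RBF kernel with bandwidth 1
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))

sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists)        # Gram matrix K_ij = k(x_i, x_j)

eigvals = np.linalg.eigvalsh(K)
print(eigvals.min())               # nonnegative up to floating-point error
```

Checking eigenvalues of the Gram matrix is a standard sanity test for whether a proposed similarity function is a valid kernel.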
Color Histogram
Extract features φ(x), then build hypothesis:  y = wᵀφ(x)
Linear model
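A minimal sketch of this pipeline with a color histogram as the feature map φ(x); the image, bin count, and weights below are all hypothetical:

```python
import numpy as np

def color_histogram(image, bins=8):
    """Normalized histogram of pixel intensities as the feature map phi(x)."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / hist.sum()

# Hypothetical 16x16 grayscale "image" with values in [0, 256)
rng = np.random.default_rng(2)
image = rng.integers(0, 256, size=(16, 16))

phi = color_histogram(image)

# Linear model on top of the extracted features: y = w^T phi(x)
w = rng.normal(size=phi.shape)
score = w @ phi
print(phi.shape, phi.sum())
```

The point of the pipeline view is that the classifier stays linear; all the modeling power lives in the choice of φ.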
Polynomial kernels
• Feature map for the degree-2 polynomial kernel k(x, x′) = (xᵀx′ + c)² in 2D:
    φ(x) = ( x₁², x₂², √2 x₁x₂, √(2c) x₁, √(2c) x₂, c )ᵀ
• Hypothesis:  y = sign(wᵀφ(x) + b)
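The identity φ(x)ᵀφ(x′) = (xᵀx′ + c)² can be verified numerically (random test points below are arbitrary):

```python
import numpy as np

def phi(x, c=1.0):
    """Explicit feature map for the degree-2 polynomial kernel in 2D."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2 * c) * x1, np.sqrt(2 * c) * x2, c])

def k(x, z, c=1.0):
    """Degree-2 polynomial kernel (x^T z + c)^2."""
    return (x @ z + c) ** 2

rng = np.random.default_rng(3)
x, z = rng.normal(size=2), rng.normal(size=2)
print(phi(x) @ phi(z), k(x, z))
```

The kernel evaluates the 6-dimensional inner product at the cost of a 2-dimensional one; for degree-d kernels in high dimensions this gap becomes exponential, which is the point of the kernel trick.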
The first layer (the feature map φ) is fixed. If the first layer is also learned, the model becomes a two-layer neural network.
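This relationship can be sketched in a few lines; the matrix U and nonlinearity below are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=5)

# Fixed first layer: phi(x) = tanh(U x) with U frozen.
# Training only w gives a linear model in the features phi(x);
# training U as well would make this a two-layer neural network.
U = rng.normal(size=(3, 5))       # hypothetical fixed feature weights
w = rng.normal(size=3)            # second-layer (learned) weights
score = w @ np.tanh(U @ x)
print(score)
```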