
Discussion 4 – Loss Function

Loss Function and Optimization


Image Classification: A core task in Computer Vision
Given: dog, cat, truck, plane, …

[Figure: an example input image, classified as "Cat"]

Slide Credit: Stanford CS231n


Recall from last time: Challenges of recognition
- Illumination
- Viewpoint
- Intra-class variation
- Clutter
- Deformation
- Occlusion

Slide Credit: Stanford CS231n


Recall from last time: data-driven approach, kNN

[Figure: kNN decision boundaries for K=1, K=3, K=5]

[Figure: dataset splits: Train/Test vs. Train/Validation/Test]

Slide Credit: Stanford CS231n
Linear Classifier: Loss Function and Optimization

Suppose: 3 training examples, 3 classes. Class scores for the three images:

          Image 1   Image 2   Image 3
Cat          1.3       3.2       2.2
Car          4.9       5.1       2.5
Frog         2.0      -1.7      -3.1

TODO:
1. Define a loss function that quantifies our unhappiness with the scores across the training data.
2. Come up with a way of efficiently finding the parameters that minimize the loss function (optimization).

Score function: $f(x, W) = Wx$

Slide Credit: Stanford CS231n
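A minimal numpy sketch of the score function; the shapes are illustrative assumptions (e.g. 32x32x3 images flattened to 3072 values), not fixed by the slides:

```python
import numpy as np

num_classes, dim = 3, 3072         # assumed: 3 classes, 32x32x3 images flattened
W = np.random.randn(num_classes, dim)
x = np.random.randn(dim)           # one flattened input image

scores = W.dot(x)                  # f(x, W) = Wx -> one score per class
```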
Linear Classifier: Loss Function

Suppose: 3 training examples, 3 classes, with the scores above.

A loss function tells how good our current classifier is. Given a dataset of $N$ examples $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is an image and $y_i$ is an (integer) label in $\{0, 1, 2\}$, the loss over the dataset is

$L = \frac{1}{N} \sum_i L_i(f(x_i, W), y_i)$

Score function: $f(x, W) = Wx$

Slide Credit: Stanford CS231n
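In code this recipe is just an average of per-example losses. A sketch with the per-example loss `L_i` left abstract (names are illustrative):

```python
import numpy as np

def total_loss(W, X, y, L_i):
    """L = (1/N) * sum_i L_i(f(x_i, W), y_i), with f(x, W) = Wx."""
    scores = W.dot(X)              # one column of class scores per image
    N = X.shape[1]
    return sum(L_i(scores[:, i], y[i]) for i in range(N)) / N
```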
Linear Classifier: Multiclass SVM Loss

Think of it as score separation on a 1D number line:
1. The correct class should be the rightmost point on the number line.
2. The other classes should be located to the left of it.
3. The correct class should be clearly separated by a large safety margin.

If the correct class is neither rightmost nor clearly separated, there will be non-zero loss.

[Figure: number line with the other classes on the left and the correct class on the right, beyond a safety margin]
Linear Classifier: Multiclass SVM Loss

Image 1 (correct class: Car):  Cat 1.3, Car 4.9, Frog 2.0

[Figure: number line showing Cat (1.3) and Frog (2.0) left of Car's safety margin; Car (correct) at 4.9]

$L_1 = \max(0, 1.3 - 4.9 + 1) + \max(0, 2.0 - 4.9 + 1) = \max(0, -2.6) + \max(0, -1.9) = 0$

Loss = 0: the correct class is rightmost and clearly separated by the margin.
Linear Classifier: Multiclass SVM Loss

Image 2 (correct class: Cat):  Cat 3.2, Car 5.1, Frog -1.7

[Figure: number line showing Frog well left of Cat's safety margin, but Car (5.1) to the right of Cat (3.2)]

$L_2 = \max(0, 5.1 - 3.2 + 1) + \max(0, -1.7 - 3.2 + 1) = \max(0, 2.9) + \max(0, -3.9) = 2.9$

Loss = 2.9: Car scores above the correct class, so the correct class is not rightmost.
Linear Classifier: Multiclass SVM Loss

Image 3 (correct class: Frog):  Cat 2.2, Car 2.5, Frog -3.1

[Figure: number line showing Frog (correct, -3.1) leftmost; Cat (2.2) and Car (2.5) violate the margin by 6.3 and 6.6]

$L_3 = \max(0, 2.2 - (-3.1) + 1) + \max(0, 2.5 - (-3.1) + 1) = \max(0, 6.3) + \max(0, 6.6) = 12.9$

Loss = 12.9: both other classes sit far to the right of the correct class.
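A short numpy sketch of the per-example loss used in the three examples above (class order and names are illustrative: 0 = Cat, 1 = Car, 2 = Frog; printed values match up to float rounding):

```python
import numpy as np

def svm_loss_i(scores, y_i, margin=1.0):
    """Multiclass SVM loss for one example: sum_{j != y_i} max(0, s_j - s_{y_i} + margin)."""
    margins = np.maximum(0, scores - scores[y_i] + margin)
    margins[y_i] = 0               # the correct class never contributes
    return margins.sum()

print(svm_loss_i(np.array([1.3, 4.9, 2.0]), 1))   # Image 1 ->  0.0
print(svm_loss_i(np.array([3.2, 5.1, -1.7]), 0))  # Image 2 ->  2.9
print(svm_loss_i(np.array([2.2, 2.5, -3.1]), 2))  # Image 3 -> 12.9
```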


Multiclass SVM Loss

$s_j$ is the score for an other class; $s_{y_i}$ is the score for the correct class.

$L_i = \sum_{j \neq y_i} \begin{cases} 0 & \text{if } s_{y_i} \geq s_j + 1 \\ s_j - s_{y_i} + 1 & \text{otherwise} \end{cases} = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$

Applied to the three examples (safety margin = 1):

Image 1 (Car correct):  $L_1 = \max(0, -2.6) + \max(0, -1.9) = 0$
Image 2 (Cat correct):  $L_2 = \max(0, 2.9) + \max(0, -3.9) = 2.9$
Image 3 (Frog correct): $L_3 = \max(0, 2.2 - (-3.1) + 1) + \max(0, 2.5 - (-3.1) + 1) = \max(0, 6.3) + \max(0, 6.6) = 12.9$

Full-dataset loss:

$L = \frac{1}{N} \sum_{i=1}^{N} L_i = \frac{0 + 2.9 + 12.9}{3} \approx 5.27$
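The same computation over the whole (tiny) dataset, fully vectorized; a sketch, with the score matrix taken from the table above:

```python
import numpy as np

# scores[c, i]: score of class c (rows: Cat, Car, Frog) for image i
scores = np.array([[1.3,  3.2,  2.2],
                   [4.9,  5.1,  2.5],
                   [2.0, -1.7, -3.1]])
y = np.array([1, 0, 2])                  # correct class per image

N = scores.shape[1]
correct = scores[y, np.arange(N)]        # s_{y_i} for every image at once
margins = np.maximum(0, scores - correct + 1.0)
margins[y, np.arange(N)] = 0             # zero out the correct-class terms
L = margins.sum(axis=0).mean()           # per-example losses, then average
print(L)                                 # ~5.27
```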
Multiclass SVM Loss: Hinge Loss

$s_j$ is the score for an other class; $s_{y_i}$ is the score for the correct class.

$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$

This is the hinge loss: as a function of $s_{y_i}$, the loss is zero once $s_{y_i}$ exceeds the highest score amongst the other classes by 1, and increases linearly as $s_{y_i}$ falls below that point.

The squared hinge loss, $\sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)^2$, penalizes margin violations quadratically instead.

[Figure: loss vs. $s_{y_i}$ for the hinge loss (linear ramp) and the squared hinge loss; the ramp reaches zero at the highest other-class score plus 1]
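A small sketch contrasting the two penalties on the Image 2 scores (the helper name is illustrative):

```python
import numpy as np

def hinge(scores, y_i, squared=False, margin=1.0):
    m = np.maximum(0, scores - scores[y_i] + margin)
    m[y_i] = 0                        # drop the correct-class term
    return (m ** 2 if squared else m).sum()

s = np.array([3.2, 5.1, -1.7])        # Image 2, correct class Cat (index 0)
print(hinge(s, 0))                    # 2.9  -> linear penalty
print(hinge(s, 0, squared=True))      # 8.41 -> quadratic penalty
```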
Regularization

$L(W) = \frac{1}{N} \sum_{i=1}^{N} L_i(f(x_i, W), y_i) + \lambda R(W)$

Data loss: model predictions should match the training data.
Regularization: prevent the model from doing too well on the training data.
Regularization

[Figure: the same training points fit by two hypotheses, $f_1$ and $f_2$, one more complex than the other]

Occam's Razor: among multiple competing hypotheses, the simplest is the best. (William of Ockham, 1285-1347)

Regularization pushes against fitting the data too well, so we don't fit noise in the data.

Slide Credit: Stanford CS231n
Regularization

$L(W) = \frac{1}{N} \sum_{i=1}^{N} L_i(f(x_i, W), y_i) + \lambda R(W)$

$\lambda$ = regularization strength (hyperparameter)

Data loss: model predictions should match the training data.
Regularization: prevent the model from doing too well on the training data.

Simple examples:
- L2 regularization: $R(W) = \sum_k \sum_l W_{k,l}^2$
- L1 regularization: $R(W) = \sum_k \sum_l |W_{k,l}|$

More complex:
- Dropout
- Batch normalization
- Stochastic depth, fractional pooling, etc.

Slide Credit: Stanford CS231n
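Putting it together, a sketch of the regularized objective in numpy: the multiclass SVM data loss from above plus an L2 penalty (`lam` is an illustrative hyperparameter value, not one from the slides):

```python
import numpy as np

def regularized_loss(W, X, y, lam=0.1):
    """Data loss (multiclass SVM, margin 1) + lam * R(W), with R(W) = sum of W^2."""
    scores = W.dot(X)                        # one column of class scores per image
    N = X.shape[1]
    correct = scores[y, np.arange(N)]
    margins = np.maximum(0, scores - correct + 1.0)
    margins[y, np.arange(N)] = 0
    data_loss = margins.sum(axis=0).mean()
    reg_loss = lam * np.sum(W * W)           # L2 regularization term
    return data_loss + reg_loss
```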
Regularization: Expressing Preferences

Suppose $x = [1, 1, 1, 1]$, $w_1 = [1, 0, 0, 0]$, $w_2 = [0.25, 0.25, 0.25, 0.25]$. Both weight vectors give the same score:

$w_1^T x = w_2^T x = 1$

L1 regularization leads to sparsity (weights like $w_1$).
L2 regularization likes to spread out the weights (it prefers $w_2$).

Slide Credit: Stanford CS231n
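A quick numeric check of that preference, using the values above:

```python
import numpy as np

x  = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.25, 0.25, 0.25, 0.25])

print(w1 @ x, w2 @ x)                      # identical scores: 1.0 1.0
print(np.abs(w1).sum(), np.abs(w2).sum())  # L1 penalty: 1.0 vs 1.0 (no preference here)
print((w1**2).sum(), (w2**2).sum())        # L2 penalty: 1.0 vs 0.25 -> L2 prefers w2
```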
Regularization: Expressing Preferences

[Figure: in the $(w_1, w_2)$ plane, the solution set $H_0$ touches the L1 ball at a sparse point and the L2 ball at a spread-out point]

L1 regularization: $W = [1, 0]$
L2 regularization: $W = \left[ \frac{\sqrt{3}}{2}, \frac{1}{2} \right]$