
Discussion 4 – Loss Function

Loss Function and Optimization


Image Classification: A core task in Computer Vision
Given: dog, cat, truck, plane, …

[Figure: an example input image, classified as "Cat"]

Slide Credit: Stanford CS231n


Recall from last time: Challenges of recognition
- Illumination
- Viewpoint
- Intra-class variation
- Clutter
- Deformation
- Occlusion

Slide Credit: Stanford CS231n


Recall from last time: data-driven approach, kNN

[Figure: kNN decision boundaries for K=1, K=3, K=5]

[Figure: dataset splits: Train/Test vs. Train/Validation/Test]

Slide Credit: Stanford CS231n
Linear Classifier: Loss Function and Optimization

Suppose: 3 training examples, 3 classes. Class scores for the three images:

          Image 1   Image 2   Image 3
Cat          1.3       3.2       2.2
Car          4.9       5.1       2.5
Frog         2.0      -1.7      -3.1

TODO:
1. Define a loss function that quantifies our unhappiness with the scores across the training data.
2. Come up with a way of efficiently finding the parameters that minimize the loss function (optimization).

Score function: $f(x, W) = Wx$

Slide Credit: Stanford CS231n
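A minimal numpy sketch of the score function; the shapes are illustrative assumptions (e.g. 32x32x3 images flattened to 3072 values), not fixed by the slides:

```python
import numpy as np

num_classes, dim = 3, 3072         # assumed: 3 classes, 32x32x3 images flattened
W = np.random.randn(num_classes, dim)
x = np.random.randn(dim)           # one flattened input image

scores = W.dot(x)                  # f(x, W) = Wx -> one score per class
```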
Linear Classifier: Loss Function

Suppose: 3 training examples, 3 classes, with the scores above.

A loss function tells how good our current classifier is. Given a dataset of $N$ examples $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is an image and $y_i$ is an (integer) label in $\{0, 1, 2\}$, the loss over the dataset is

$L = \frac{1}{N} \sum_i L_i(f(x_i, W), y_i)$

Score function: $f(x, W) = Wx$

Slide Credit: Stanford CS231n
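In code this recipe is just an average of per-example losses. A sketch with the per-example loss `L_i` left abstract (names are illustrative):

```python
import numpy as np

def total_loss(W, X, y, L_i):
    """L = (1/N) * sum_i L_i(f(x_i, W), y_i), with f(x, W) = Wx."""
    scores = W.dot(X)              # one column of class scores per image
    N = X.shape[1]
    return sum(L_i(scores[:, i], y[i]) for i in range(N)) / N
```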
Linear Classifier: Multiclass SVM Loss

Think of it as score separation on a 1D number line:
1. The correct class should be the rightmost point on the number line.
2. The other classes should be located to the left of it.
3. The correct class should be clearly separated by a large safety margin.

If the correct class is neither rightmost nor clearly separated, there will be non-zero loss.

[Figure: number line with the other classes on the left and the correct class on the right, beyond a safety margin]
Linear Classifier: Multiclass SVM Loss

Image 1 (correct class: Car):  Cat 1.3, Car 4.9, Frog 2.0

[Figure: number line showing Cat (1.3) and Frog (2.0) left of Car's safety margin; Car (correct) at 4.9]

$L_1 = \max(0, 1.3 - 4.9 + 1) + \max(0, 2.0 - 4.9 + 1) = \max(0, -2.6) + \max(0, -1.9) = 0$

Loss = 0: the correct class is rightmost and clearly separated by the margin.
Linear Classifier: Multiclass SVM Loss

Image 2 (correct class: Cat):  Cat 3.2, Car 5.1, Frog -1.7

[Figure: number line showing Frog well left of Cat's safety margin, but Car (5.1) to the right of Cat (3.2)]

$L_2 = \max(0, 5.1 - 3.2 + 1) + \max(0, -1.7 - 3.2 + 1) = \max(0, 2.9) + \max(0, -3.9) = 2.9$

Loss = 2.9: Car scores above the correct class, so the correct class is not rightmost.
Linear Classifier: Multiclass SVM Loss

Image 3 (correct class: Frog):  Cat 2.2, Car 2.5, Frog -3.1

[Figure: number line showing Frog (correct, -3.1) leftmost; Cat (2.2) and Car (2.5) violate the margin by 6.3 and 6.6]

$L_3 = \max(0, 2.2 - (-3.1) + 1) + \max(0, 2.5 - (-3.1) + 1) = \max(0, 6.3) + \max(0, 6.6) = 12.9$

Loss = 12.9: both other classes sit far to the right of the correct class.
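A short numpy sketch of the per-example loss used in the three examples above (class order and names are illustrative: 0 = Cat, 1 = Car, 2 = Frog; printed values match up to float rounding):

```python
import numpy as np

def svm_loss_i(scores, y_i, margin=1.0):
    """Multiclass SVM loss for one example: sum_{j != y_i} max(0, s_j - s_{y_i} + margin)."""
    margins = np.maximum(0, scores - scores[y_i] + margin)
    margins[y_i] = 0               # the correct class never contributes
    return margins.sum()

print(svm_loss_i(np.array([1.3, 4.9, 2.0]), 1))   # Image 1 ->  0.0
print(svm_loss_i(np.array([3.2, 5.1, -1.7]), 0))  # Image 2 ->  2.9
print(svm_loss_i(np.array([2.2, 2.5, -3.1]), 2))  # Image 3 -> 12.9
```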


Multiclass SVM Loss

$s_j$ is the score for an other class; $s_{y_i}$ is the score for the correct class.

$L_i = \sum_{j \neq y_i} \begin{cases} 0 & \text{if } s_{y_i} \geq s_j + 1 \\ s_j - s_{y_i} + 1 & \text{otherwise} \end{cases} = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$

Applied to the three examples (safety margin = 1):

Image 1 (Car correct):  $L_1 = \max(0, -2.6) + \max(0, -1.9) = 0$
Image 2 (Cat correct):  $L_2 = \max(0, 2.9) + \max(0, -3.9) = 2.9$
Image 3 (Frog correct): $L_3 = \max(0, 2.2 - (-3.1) + 1) + \max(0, 2.5 - (-3.1) + 1) = \max(0, 6.3) + \max(0, 6.6) = 12.9$

Full-dataset loss:

$L = \frac{1}{N} \sum_{i=1}^{N} L_i = \frac{0 + 2.9 + 12.9}{3} \approx 5.27$
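The same computation over the whole (tiny) dataset, fully vectorized; a sketch, with the score matrix taken from the table above:

```python
import numpy as np

# scores[c, i]: score of class c (rows: Cat, Car, Frog) for image i
scores = np.array([[1.3,  3.2,  2.2],
                   [4.9,  5.1,  2.5],
                   [2.0, -1.7, -3.1]])
y = np.array([1, 0, 2])                  # correct class per image

N = scores.shape[1]
correct = scores[y, np.arange(N)]        # s_{y_i} for every image at once
margins = np.maximum(0, scores - correct + 1.0)
margins[y, np.arange(N)] = 0             # zero out the correct-class terms
L = margins.sum(axis=0).mean()           # per-example losses, then average
print(L)                                 # ~5.27
```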
Multiclass SVM Loss: Hinge Loss

$s_j$ is the score for an other class; $s_{y_i}$ is the score for the correct class.

$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$

This is the hinge loss: as a function of $s_{y_i}$, the loss is zero once $s_{y_i}$ exceeds the highest score amongst the other classes by 1, and increases linearly as $s_{y_i}$ falls below that point.

The squared hinge loss, $\sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)^2$, penalizes margin violations quadratically instead.

[Figure: loss vs. $s_{y_i}$ for the hinge loss (linear ramp) and the squared hinge loss; the ramp reaches zero at the highest other-class score plus 1]
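A small sketch contrasting the two penalties on the Image 2 scores (the helper name is illustrative):

```python
import numpy as np

def hinge(scores, y_i, squared=False, margin=1.0):
    m = np.maximum(0, scores - scores[y_i] + margin)
    m[y_i] = 0                        # drop the correct-class term
    return (m ** 2 if squared else m).sum()

s = np.array([3.2, 5.1, -1.7])        # Image 2, correct class Cat (index 0)
print(hinge(s, 0))                    # 2.9  -> linear penalty
print(hinge(s, 0, squared=True))      # 8.41 -> quadratic penalty
```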
Regularization

$L(W) = \frac{1}{N} \sum_{i=1}^{N} L_i(f(x_i, W), y_i) + \lambda R(W)$

Data loss: model predictions should match the training data.
Regularization: prevent the model from doing too well on the training data.
Regularization

[Figure: the same training points fit by two hypotheses, $f_1$ and $f_2$, one more complex than the other]

Occam's Razor: among multiple competing hypotheses, the simplest is the best. (William of Ockham, 1285-1347)

Regularization pushes against fitting the data too well, so we don't fit noise in the data.

Slide Credit: Stanford CS231n
Regularization

$L(W) = \frac{1}{N} \sum_{i=1}^{N} L_i(f(x_i, W), y_i) + \lambda R(W)$

$\lambda$ = regularization strength (hyperparameter)

Data loss: model predictions should match the training data.
Regularization: prevent the model from doing too well on the training data.

Simple examples:
- L2 regularization: $R(W) = \sum_k \sum_l W_{k,l}^2$
- L1 regularization: $R(W) = \sum_k \sum_l |W_{k,l}|$

More complex:
- Dropout
- Batch normalization
- Stochastic depth, fractional pooling, etc.

Slide Credit: Stanford CS231n
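Putting it together, a sketch of the regularized objective in numpy: the multiclass SVM data loss from above plus an L2 penalty (`lam` is an illustrative hyperparameter value, not one from the slides):

```python
import numpy as np

def regularized_loss(W, X, y, lam=0.1):
    """Data loss (multiclass SVM, margin 1) + lam * R(W), with R(W) = sum of W^2."""
    scores = W.dot(X)                        # one column of class scores per image
    N = X.shape[1]
    correct = scores[y, np.arange(N)]
    margins = np.maximum(0, scores - correct + 1.0)
    margins[y, np.arange(N)] = 0
    data_loss = margins.sum(axis=0).mean()
    reg_loss = lam * np.sum(W * W)           # L2 regularization term
    return data_loss + reg_loss
```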
Regularization: Expressing Preferences

Suppose $x = [1, 1, 1, 1]$, $w_1 = [1, 0, 0, 0]$, $w_2 = [0.25, 0.25, 0.25, 0.25]$. Both weight vectors give the same score:

$w_1^T x = w_2^T x = 1$

L1 regularization leads to sparsity (weights like $w_1$).
L2 regularization likes to spread out the weights (it prefers $w_2$).

Slide Credit: Stanford CS231n
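A quick numeric check of that preference, using the values above:

```python
import numpy as np

x  = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.25, 0.25, 0.25, 0.25])

print(w1 @ x, w2 @ x)                      # identical scores: 1.0 1.0
print(np.abs(w1).sum(), np.abs(w2).sum())  # L1 penalty: 1.0 vs 1.0 (no preference here)
print((w1**2).sum(), (w2**2).sum())        # L2 penalty: 1.0 vs 0.25 -> L2 prefers w2
```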
Regularization: Expressing Preferences

[Figure: in the $(w_1, w_2)$ plane, the solution set $H_0$ touches the L1 ball at a sparse point and the L2 ball at a spread-out point]

L1 regularization: $W = [1, 0]$
L2 regularization: $W = \left[ \frac{\sqrt{3}}{2}, \frac{1}{2} \right]$