
Loss Functions in Deep Learning

The document discusses loss functions in deep learning, explaining their role in measuring model performance and guiding parameter optimization. It covers various types of loss functions, including those for regression, binary classification, and multiclass classification, emphasizing the use of maximum likelihood and negative log-likelihood. Additionally, it addresses the construction of loss functions and the importance of probability distributions in model predictions.


Recall …



Log and exp functions
• Log: y = log(z) • Exp: y = exp(z) — inverse functions of each other

• Two rules: log(a·b) = log(a) + log(b), and exp(a + b) = exp(a)·exp(b)


Loss function
• The loss function (or cost function) measures how badly the model performs.
• Given/using a training dataset of I pairs of input/output examples:

{xᵢ, yᵢ},  i = 1, …, I

• Loss function: L[{xᵢ, yᵢ}, φ], or for short: L[φ]

Returns a scalar that is smaller when the model maps inputs to outputs better.


Training
• Given the loss function L[φ], which returns a scalar that is smaller when the model maps inputs to outputs better:

• Find the parameters that minimize the loss:

φ̂ = argmin_φ [ L[φ] ]


Example: 1D Linear regression loss function

Loss function:

L[φ] = Σᵢ₌₁…I (f[xᵢ, φ] − yᵢ)²

“Least squares loss function”

[Figures from http://udlbook.com: 1D linear regression fit and its loss]
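As a quick illustration (a minimal sketch, not from the slides; the data values are made up), the least squares loss for a 1D linear model f[x, φ] = φ₀ + φ₁·x can be computed as:

```python
import numpy as np

# Made-up training data: I = 4 input/output pairs
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 1.2, 1.9, 3.2])

def least_squares_loss(phi0, phi1):
    """Sum of squared residuals for the 1D linear model f(x) = phi0 + phi1*x."""
    pred = phi0 + phi1 * x
    return np.sum((pred - y) ** 2)

print(least_squares_loss(0.0, 1.0))    # a candidate parameter setting
print(least_squares_loss(0.05, 1.02))  # a slightly better fit -> smaller loss
```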
Why exactly the “least squares” loss for regression problems?

How can we “construct” loss functions for other types of learning problems?


Loss functions
• Maximum likelihood
• Recipe for loss functions
• Example 1: univariate regression
• Example 2: binary classification
• Example 3: multiclass classification
• Other types of data
• Multiple outputs



How to construct loss functions
• Rather than predicting the output y directly given the input x,
• the model predicts a conditional probability distribution Pr(y|x) over the possible values of the output y given the input x.

• The loss function aims to make each observed training output yᵢ have high probability under Pr(yᵢ|xᵢ).


Regression: real-valued output

[Figures from http://udlbook.com: regression with a real-valued output]
Binary Classification: discrete output

[Figures from http://udlbook.com: binary classification with a discrete output]
Multiclass Classification: discrete output

[Figures from http://udlbook.com: multiclass classification with a discrete output]
How can a model predict a probability distribution?


How can a model predict a prob. dist.?
1. Pick a known parametric distribution Pr(y|θ) to model the output y, with parameters θ.

e.g., the normal distribution: Pr(y | μ, σ²) = (1/√(2πσ²)) · exp[−(y − μ)² / (2σ²)], with parameters θ = {μ, σ²}

2. Use the model to predict the parameters of that probability distribution: θ = f[x, φ]
Maximum likelihood criterion
Each observed training output yᵢ should have high probability
under its corresponding predicted distribution Pr(yᵢ|xᵢ):

φ̂ = argmax_φ [ ∏ᵢ₌₁…I Pr(yᵢ | f[xᵢ, φ]) ]

When we consider this probability as a function of the parameters φ, we call it a likelihood.
i.i.d. assumption
1. Data are identically distributed (the form of the probability
distribution over the outputs yi is the same for each data point).
2. Conditional distributions Pr(yi|xi) of the output given the input are
independent.

Data are independent and identically distributed (i.i.d.)



Problem

• The terms in this product might all be small.


• The product might get so small that we can’t easily represent it.

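A quick numerical illustration (not from the slides) of why the raw product is fragile and how taking logs avoids the problem:

```python
import numpy as np

# 1000 made-up per-example likelihoods, each a smallish probability
probs = np.full(1000, 0.01)

product = np.prod(probs)         # the true value 1e-2000 underflows to 0.0
log_sum = np.sum(np.log(probs))  # easily representable: 1000 * log(0.01)

print(product)  # 0.0 -- below the smallest positive float64 (~1e-308)
print(log_sum)  # about -4605.2
```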


The log function is monotonic

The maximum of the logarithm of a function is in the same place as the maximum of the function itself.
Maximum log likelihood

φ̂ = argmax_φ [ Σᵢ₌₁…I log Pr(yᵢ | f[xᵢ, φ]) ]

Now it's a sum of terms, so it doesn't matter so much if the individual terms are small.


Minimizing negative log likelihood
• By convention, we minimize things (i.e., a loss):

L[φ] = −Σᵢ₌₁…I log Pr(yᵢ | f[xᵢ, φ])   ← the loss function

φ̂ = argmin_φ [ L[φ] ]


Inference?
• But now the model predicts a probability distribution!
• We need an actual prediction (a point estimate) …
• Take the peak of the probability distribution (i.e., the mean for a normal):

ŷ = argmax_y [ Pr(y | f[x, φ]) ]


To construct a loss function, we set the
model to predict ......................
➢ the input
➢ the output
➢ a probability distribution
➢ the parameters of a probability distribution

In training a model, we ………….
➢ maximize the likelihood probability
➢ minimize the likelihood probability
➢ maximize the log-likelihood probability
➢ minimize the log-likelihood probability
➢ maximize the negative log-likelihood probability
➢ minimize the negative log-likelihood probability

Recipe for loss functions

1. Choose a suitable probability distribution Pr(y|θ) defined over the domain of the outputs y, with distribution parameters θ.
2. Set the machine learning model f[x, φ] to predict those parameters: θ = f[x, φ].
3. To train the model, find the parameters φ̂ that minimize the negative log-likelihood loss over the training dataset:
   L[φ] = −Σᵢ₌₁…I log Pr(yᵢ | f[xᵢ, φ])
4. To perform inference for a new input x, return the full distribution Pr(y | f[x, φ̂]) or a point estimate (e.g., the maximum of that distribution).
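A generic sketch of steps 1–3 in Python (illustrative only; the helper names are my own, and the Bernoulli example anticipates the binary classification case below):

```python
import numpy as np

def nll_loss(log_prob, predicted_params, ys):
    """Step 3: negative log-likelihood. predicted_params[i] holds the
    distribution parameters the model predicted for example i."""
    return -sum(log_prob(y, theta) for y, theta in zip(ys, predicted_params))

# Step 1: choose a distribution -- here a Bernoulli with parameter lam = Pr(y=1)
def bernoulli_log_prob(y, lam):
    return np.log(lam) if y == 1 else np.log(1.0 - lam)

# Step 2 would produce these parameters from a model; here they are made up
predicted_lams = [0.9, 0.2, 0.7]
labels = [1, 0, 1]
print(nll_loss(bernoulli_log_prob, predicted_lams, labels))  # ~0.69
```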


Loss functions
• Maximum likelihood
• Recipe for loss functions
• Example 1: univariate regression
• Example 2: binary classification
• Example 3: multiclass classification
• Other types of data
• Multiple outputs



Example 1: univariate regression

1. Choose a prob. dist. over output domain

• Predict a scalar output: y ∈ ℝ

• Sensible probability distribution:
• Normal distribution:

Pr(y | μ, σ²) = (1/√(2πσ²)) · exp[−(y − μ)² / (2σ²)]
2. Set the model to predict dist. param.

• The model predicts the mean of the normal distribution: μ = f[x, φ], i.e., Pr(y | f[x, φ], σ²)


3. Loss fn: Negative log-likelihood

L[φ] = −Σᵢ₌₁…I log Pr(yᵢ | f[xᵢ, φ], σ²)
     = −Σᵢ₌₁…I log [ (1/√(2πσ²)) · exp(−(yᵢ − f[xᵢ, φ])² / (2σ²)) ]
     = Σᵢ₌₁…I (yᵢ − f[xᵢ, φ])² / (2σ²) + constant

Minimizing this is equivalent to minimizing Σᵢ (yᵢ − f[xᵢ, φ])² — least squares!
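A small numerical check (made-up data; the grid search is just for illustration) that the Gaussian negative log-likelihood and least squares are minimized by the same parameter:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.2, 0.9, 2.1, 2.9])
sigma2 = 0.5  # any fixed variance

def least_squares(phi):
    return np.sum((y - phi * x) ** 2)

def gaussian_nll(phi):
    mu = phi * x
    return np.sum(0.5 * np.log(2 * np.pi * sigma2) + (y - mu) ** 2 / (2 * sigma2))

phis = np.linspace(0.5, 1.5, 1001)
print(phis[np.argmin([least_squares(p) for p in phis])])  # same minimizer ...
print(phis[np.argmin([gaussian_nll(p) for p in phis])])   # ... for both losses
```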
Least squares ↔ Maximum likelihood

[Figures from http://udlbook.com: the least squares and maximum likelihood views give the same fit]
4. Inference

• The point estimate is the peak of the predicted distribution, which for a normal is its mean: ŷ = μ = f[x, φ̂]


Among the steps of constructing a loss function …
➢ Choose a suitable model for the problem
➢ Choose a suitable prob. dist. for the problem
➢ Set the model to predict its parameters
➢ Set the model to predict the parameters of a prob. dist.

Why do we need a loss function?


➢ To predict the parameters of the probability distribution
➢ To learn the parameters of the probability distribution
➢ To predict the parameters of the model
➢ To learn the parameters of the model
➢ To predict the output of the model given an input
Estimating variance
• Perhaps surprisingly, the variance term disappeared from the least-squares criterion:

φ̂ = argmin_φ [ Σᵢ (yᵢ − f[xᵢ, φ])² ]   — no σ² appears

• But we could learn it by minimizing the full negative log-likelihood over both φ and σ²:

φ̂, σ̂² = argmin_{φ, σ²} [ −Σᵢ log Pr(yᵢ | f[xᵢ, φ], σ²) ]

• The model predicts the mean μ from the input, and the variance σ² is learned during the training process.
Homoscedastic regression
• Assume that 𝜎 2 is the same everywhere.

Heteroscedastic regression
• The uncertainty of the model varies with the input.
• Build a model with two outputs — one predicts the mean, the other the variance:

μ = f₁[x, φ],  σ² = f₂[x, φ]²

Why square the second output? The network output can be any real number, but a variance must be positive.

[Figures from http://udlbook.com: heteroscedastic regression]
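A sketch of one way to build such a model in PyTorch (illustrative assumptions: the layer sizes, and squaring the second output to keep the variance positive):

```python
import torch
import torch.nn as nn

class HeteroscedasticNet(nn.Module):
    """Two-output network: one head for the mean, one for the variance."""
    def __init__(self, hidden=32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 2))

    def forward(self, x):
        out = self.body(x)
        mu = out[:, 0]
        var = out[:, 1] ** 2 + 1e-6  # squared, so the variance is positive
        return mu, var

def heteroscedastic_nll(mu, var, y):
    # Negative log-likelihood of y under Normal(mu, var), summed over examples
    return torch.sum(0.5 * torch.log(2 * torch.pi * var)
                     + (y - mu) ** 2 / (2 * var))

net = HeteroscedasticNet()
x, y = torch.randn(8, 1), torch.randn(8)
mu, var = net(x)
heteroscedastic_nll(mu, var, y).backward()  # gradients reach both heads
```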
Loss functions
• Maximum likelihood
• Recipe for loss functions
• Example 1: univariate regression
• Example 2: binary classification
• Example 3: multiclass classification
• Other types of data
• Multiple outputs



Example 2: binary classification

• Goal: predict which of two classes the input x belongs to

1. Choose a prob. dist. over output domain

• Domain: y ∈ {0, 1}
• Bernoulli distribution
• One parameter 𝜆 ∈ [0,1]:

Pr(y = 1) = 𝜆,  Pr(y = 0) = 1 − 𝜆,  i.e., Pr(y | 𝜆) = 𝜆^y · (1 − 𝜆)^(1−y)


2. Set the model to predict dist. param.

Parameter 𝜆 ∈ [0,1]
• BUT: the output of a neural network can be anything!
• Solution: pass it through a function that maps “anything” to [0,1] — the sigmoid activation function:

𝜆 = sig[f[x, φ]],  where sig[z] = 1 / (1 + exp(−z))
Effect of adding Sigmoid



3. Loss fn: Negative log-likelihood

L[φ] = −Σᵢ₌₁…I [ yᵢ · log 𝜆ᵢ + (1 − yᵢ) · log(1 − 𝜆ᵢ) ],  where 𝜆ᵢ = sig[f[xᵢ, φ]]

Binary cross-entropy loss
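A minimal numpy sketch of this loss (the logits and labels are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(logits, y):
    """Negative log-likelihood under a Bernoulli with lambda = sigmoid(logit)."""
    lam = sigmoid(logits)
    return -np.sum(y * np.log(lam) + (1 - y) * np.log(1 - lam))

logits = np.array([2.0, -1.0, 0.5])  # raw network outputs (any real numbers)
y = np.array([1, 0, 1])              # observed class labels
print(binary_cross_entropy(logits, y))  # ~0.91
```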


4. Inference

• Choose y=1 where 𝜆 > 0.5, y=0 otherwise.



Loss functions
• Maximum likelihood
• Recipe for loss functions
• Example 1: univariate regression
• Example 2: binary classification
• Example 3: multiclass classification
• Other types of data
• Multiple outputs



Example 3: multiclass classification

Goal: predict which of K classes the input x belongs to

1. Choose a prob. dist. over output domain

• Domain: y ∈ {1, 2, …, K}
• Categorical distribution
• K parameters 𝜆𝑘 ∈ [0,1], one per class: Pr(y = k) = 𝜆𝑘
• Sum of all parameters = 1: Σ𝑘 𝜆𝑘 = 1


2. Set the model to predict dist. param.

Parameters 𝜆𝑘 ∈ [0,1], sum to one

• BUT: the output of a neural network can be any K real numbers!
• Solution: pass them through a function that maps “anything” to [0,1] and sums to 1 — the softmax:

𝜆𝑘 = softmax𝑘[z] = exp(z𝑘) / Σ𝑘′ exp(z𝑘′),  where z = f[x, φ]


Effect of adding Softmax

[Figure from http://udlbook.com: the softmax activation function/layer maps arbitrary real network outputs for classes 1, 2, 3 to probabilities between 0 and 1]
3. Loss fn: Negative log-likelihood

L[φ] = −Σᵢ₌₁…I log [ softmax_{yᵢ}( f[xᵢ, φ] ) ]

Multiclass cross-entropy loss
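A numpy sketch of this loss (the values are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

def multiclass_cross_entropy(logits, label):
    """Negative log-probability of the correct class under the softmax."""
    return -np.log(softmax(logits)[label])

logits = np.array([1.0, 2.0, -1.0])  # raw outputs for K = 3 classes
print(multiclass_cross_entropy(logits, label=1))  # ~0.35 if class 1 is correct
```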
4. Inference

Choose the class with the largest probability

For an object detection problem with 5 classes, for a given input, the network outputs ….
➢ one number, which is the probability of the correct class
➢ 5 numbers, which are the probabilities of all classes
➢ one number, which is the softmax value of the correct class
➢ 5 numbers, which are the softmax values of all classes

For an object detection problem with 5 classes (A, B, C, D, E), for a given input, the network's direct (pre-softmax) outputs were 1, 2, −1, 3, and −2 for the classes respectively. Assuming the correct class is C, what is the loss for that input example?
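One way to check the arithmetic for this question (a worked sketch, not part of the original slides):

```python
import numpy as np

logits = np.array([1.0, 2.0, -1.0, 3.0, -2.0])   # classes A, B, C, D, E
probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax
loss = -np.log(probs[2])  # class C (index 2) is the correct one
print(loss)  # ~4.42
```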
Loss functions
• Maximum likelihood
• Recipe for loss functions
• Example 1: univariate regression
• Example 2: binary classification
• Example 3: multiclass classification
• Other types of data
• Multiple outputs



Other data types


Loss functions
• Maximum likelihood
• Recipe for loss functions
• Example 1: univariate regression
• Example 2: binary classification
• Example 3: multiclass classification
• Other types of data
• Multiple outputs



Example 4: multivariate regression

Multiple outputs
• Treat each output dimension as independent:

Pr(y | f[x, φ]) = ∏𝑑 Pr(y𝑑 | f𝑑[x, φ])

• The negative log-likelihood becomes a sum of terms over examples and dimensions:

L[φ] = −Σᵢ Σ𝑑 log Pr(yᵢ𝑑 | f𝑑[xᵢ, φ])


Example 4: multivariate regression
• Goal: predict a multivariate target y ∈ ℝ^(𝐷𝑜)
• Solution: treat each dimension independently

• Make a network with 𝐷𝑜 outputs to predict the means of 𝐷𝑜 independent normal distributions: μ𝑑 = f𝑑[x, φ] (see the sketch below)
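A numpy sketch of the resulting loss (shapes and values are made up): with independent normal distributions per dimension, the negative log-likelihood again reduces to a sum of squared errors, now over both examples and output dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
preds = rng.normal(size=(8, 3))    # predicted means: I = 8 examples, Do = 3 outputs
targets = rng.normal(size=(8, 3))  # observed multivariate targets

loss = np.sum((preds - targets) ** 2)  # summed over examples and dimensions
print(loss)
```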


Different output magnitudes?
• What if the outputs vary in magnitude?
• e.g., predict weight in kilos and height in meters
• One dimension has much bigger numbers than the others

Why is that a problem? The large-magnitude dimension dominates the summed loss, so the model effectively ignores the other dimensions.

• Could learn a separate variance for each dimension …

• … or rescale each output dimension before training, and then rescale the predictions in the opposite way at inference time (see the sketch below).
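A sketch of the rescaling approach (standardizing to zero mean and unit variance is one assumption; any consistent per-dimension scaling works):

```python
import numpy as np

def standardize(y):
    """Rescale each output dimension to zero mean, unit std before training."""
    mean, std = y.mean(axis=0), y.std(axis=0)
    return (y - mean) / std, mean, std

def unstandardize(y_scaled, mean, std):
    """Undo the scaling on the model's predictions at inference time."""
    return y_scaled * std + mean

y = np.array([[70.0, 1.75], [55.0, 1.62], [90.0, 1.90]])  # kilos, meters
y_scaled, mean, std = standardize(y)
print(y_scaled.std(axis=0))                   # both dimensions now have std 1
print(unstandardize(y_scaled, mean, std)[0])  # recovers [70.0, 1.75]
```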


Next up
• We have models with parameters!
• We have loss functions!

• Now let’s find the parameters that give the smallest loss
• Training, learning, or fitting the model

