Recall …
Image hosted by https://knowledgeone.ca/8-types-of-memory-to-remember/
Log and exp functions
• Log: $y = \log[x]$
• Exp: $y = \exp[x] = e^x$
• Two rules we will use later in this lecture: the log of a product is the sum of the logs, $\log[a \cdot b] = \log[a] + \log[b]$, and the log function is monotonically increasing, so the maximum of $\log[g[z]]$ is in the same place as the maximum of $g[z]$.
Loss function
• A loss function (or cost function) measures how badly the model performs.
• Given a training dataset of $I$ input/output pairs $\{\mathbf{x}_i, \mathbf{y}_i\}_{i=1}^{I}$:
• Loss function: $L\big[\{\mathbf{x}_i, \mathbf{y}_i\}, \mathbf{f}[\mathbf{x}, \boldsymbol{\phi}]\big]$, or for short: $L[\boldsymbol{\phi}]$
• It returns a scalar that is smaller when the model maps inputs to outputs better.
Training
• Given the loss function $L[\boldsymbol{\phi}]$, which returns a scalar that is smaller when the model maps inputs to outputs better,
• find the parameters that minimize the loss:
$$\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmin}}\; L[\boldsymbol{\phi}]$$
Example: 1D linear regression loss function
• Loss function (the "least squares loss function"):
$$L[\boldsymbol{\phi}] = \sum_{i=1}^{I} \big( f[x_i, \boldsymbol{\phi}] - y_i \big)^2$$
[Figure: 1D linear regression fit and its loss surface. From http://udlbook.com]
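A minimal sketch of this loss in code (numpy assumed; the toy data and parameter values are illustrative, not from the slides):

```python
# Least-squares loss for a 1D linear model f[x, phi] = phi_0 + phi_1 * x.
import numpy as np

def least_squares_loss(phi, x, y):
    """Sum of squared differences between predictions and targets."""
    pred = phi[0] + phi[1] * x          # model output f[x, phi]
    return np.sum((pred - y) ** 2)      # smaller = better fit

# Toy data: y is roughly 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=20)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(20)

print(least_squares_loss(np.array([1.0, 2.0]), x, y))  # near-true params: small
print(least_squares_loss(np.array([0.0, 0.0]), x, y))  # bad params: larger
```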
Why exactly a "least squares" loss for regression problems? How can we "construct" loss functions for other types of learning problems?
Loss functions
• Maximum likelihood
• Recipe for loss functions
• Example 1: univariate regression
• Example 2: binary classification
• Example 3: multiclass classification
• Other types of data
• Multiple outputs
How to construct loss functions
• Rather than predicting the output y directly from input x,
• the model predicts a conditional probability distribution $Pr(\mathbf{y}|\mathbf{x})$ over the possible values of the output y given input x.
• The loss function aims to make each observed training output have high probability under $Pr(\mathbf{y}|\mathbf{x})$.
[Figures: regression — a real-valued output modeled with a probability distribution over y given x. From http://udlbook.com]
[Figures: binary classification — a discrete output modeled with a probability distribution over two classes. From http://udlbook.com]
[Figures: multiclass classification — a discrete output modeled with a probability distribution over several classes. From http://udlbook.com]
How can a model predict a probability distribution?
How can a model predict a prob. dist.?
1. Pick a known parametric distribution $Pr(y|\boldsymbol{\theta})$ to model the output y, with parameters $\boldsymbol{\theta}$; e.g., the normal distribution:
$$Pr(y|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]$$
2. Use the model to predict the parameters of that probability distribution: $\boldsymbol{\theta} = \mathbf{f}[\mathbf{x}, \boldsymbol{\phi}]$
Maximum likelihood criterion
• Each observed training output $y_i$ should have high probability under its corresponding distribution $Pr(y_i|\mathbf{x}_i)$.
• Maximum likelihood criterion:
$$\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmax}} \left[ \prod_{i=1}^{I} Pr\big(y_i \,|\, \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}]\big) \right]$$
• When we consider this probability as a function of the parameters $\boldsymbol{\phi}$, we call it a likelihood.
i.i.d. assumption
1. Data are identically distributed: the form of the probability distribution over the outputs $y_i$ is the same for each data point.
2. Conditional distributions $Pr(y_i|\mathbf{x}_i)$ of the output given the input are independent.
Together: data are independent and identically distributed (i.i.d.).
Problem
• The terms in this product might all be small.
• The product of many such terms can get so small that finite-precision floating point can't represent it (numerical underflow). See the sketch below.
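A minimal demonstration of the underflow problem (numpy assumed; the specific numbers are illustrative, not from the slides): multiplying 1000 modest likelihood terms underflows float64, while summing their logs stays well-behaved.

```python
import numpy as np

probs = np.full(1000, 1e-4)        # 1000 likelihood terms, each small

product = np.prod(probs)           # 1e-4000 underflows float64 (min ~1e-308)
log_sum = np.sum(np.log(probs))    # 1000 * log(1e-4) = -9210.34..., no problem

print(product)   # 0.0
print(log_sum)   # -9210.34...
```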
The log function is monotonic
• The maximum of the logarithm of a function is in the same place as the maximum of the function.
[Figure: a function and its logarithm share the same argmax. From http://udlbook.com]
Maximum log-likelihood
$$\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmax}} \left[ \sum_{i=1}^{I} \log Pr\big(y_i \,|\, \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}]\big) \right]$$
• Now it's a sum of terms, so it doesn't matter so much if the individual terms are small.
Minimizing negative log-likelihood
• By convention, we minimize things (i.e., a loss):
$$L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \log Pr\big(y_i \,|\, \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}]\big)$$
This is the loss function.
Inference?
• But now the model predicts a probability distribution!
• We need an actual prediction (point estimate) …
• Find the peak of the probability distribution (e.g., the mean for a normal distribution):
$$\hat{y} = \underset{y}{\mathrm{argmax}}\; Pr\big(y \,|\, \mathbf{f}[\mathbf{x}, \hat{\boldsymbol{\phi}}]\big)$$
To construct a loss function, we set the model to predict ......................
➢ the input
➢ the output
➢ a probability distribution
➢ the parameters of a probability distribution
In training a model, we ………….
➢ maximize the likelihood probability
➢ minimize the likelihood probability
➢ maximize the log-likelihood probability
➢ minimize the log-likelihood probability
➢ maximize the negative log-likelihood probability
➢ minimize the negative log-likelihood probability
Recipe for loss functions
1. Choose a suitable probability distribution $Pr(y|\boldsymbol{\theta})$ defined over the domain of the outputs y, with distribution parameters $\boldsymbol{\theta}$.
2. Set the machine learning model to predict those parameters: $\boldsymbol{\theta} = \mathbf{f}[\mathbf{x}, \boldsymbol{\phi}]$.
3. To train, minimize the negative log-likelihood loss $L[\boldsymbol{\phi}] = -\sum_{i} \log Pr(y_i|\boldsymbol{\theta}_i)$.
4. For inference, return the peak of the predicted distribution.
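As a concrete illustration, here is a minimal end-to-end sketch of the four steps for a toy univariate regression (assumptions not in the slides: a linear model, a normal output distribution with fixed variance, and plain gradient descent with an analytic gradient):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 50)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(50)

# Step 1: choose a distribution over the output -> normal with fixed sigma^2.
sigma2 = 0.01

# Step 2: the model predicts the distribution parameter (the mean).
def mean(phi):
    return phi[0] + phi[1] * x

# Step 3: minimize the negative log-likelihood by gradient descent.
phi = np.zeros(2)
for _ in range(2000):
    resid = mean(phi) - y
    grad = np.array([resid.sum(), (resid * x).sum()]) / sigma2
    phi -= 1e-4 * grad

# Step 4: inference = peak of the predicted distribution = predicted mean.
print(phi)  # approaches the true parameters [1.0, 2.0]
```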
Example 1: univariate regression
[Figure: training data for univariate regression. From http://udlbook.com]
1. Choose a prob. dist. over output domain
• Predict a scalar output: $y \in \mathbb{R}$
• A sensible probability distribution: the normal distribution
$$Pr(y|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]$$
2. Set the model to predict dist. param.
• The model predicts the mean: $\mu = f[\mathbf{x}, \boldsymbol{\phi}]$, giving $Pr\big(y \,|\, f[\mathbf{x}, \boldsymbol{\phi}], \sigma^2\big)$.
3. Loss fn: negative log-likelihood
$$L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \log Pr\big(y_i \,|\, f[\mathbf{x}_i, \boldsymbol{\phi}], \sigma^2\big) = -\sum_{i=1}^{I} \log \left[ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y_i - f[\mathbf{x}_i, \boldsymbol{\phi}])^2}{2\sigma^2}\right] \right]$$
• Dropping the terms that do not depend on $\boldsymbol{\phi}$, minimizing this is equivalent to minimizing $\sum_{i=1}^{I} \big(y_i - f[\mathbf{x}_i, \boldsymbol{\phi}]\big)^2$ — least squares!
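A quick numerical check of this equivalence (a numpy sketch; the data and candidate parameters are illustrative assumptions): for a fixed σ², the normal negative log-likelihood and the least-squares loss differ only by constants, so they order parameter settings identically.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 30)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(30)
sigma2 = 0.25                                  # fixed, known variance

def nll(phi):
    resid = y - (phi[0] + phi[1] * x)
    return np.sum(0.5 * np.log(2 * np.pi * sigma2) + resid**2 / (2 * sigma2))

def sse(phi):
    resid = y - (phi[0] + phi[1] * x)
    return np.sum(resid**2)

a = np.array([1.0, 2.0])
b = np.array([0.5, 1.5])
# Both criteria rank the two candidate parameter settings the same way:
print(nll(a) < nll(b), sse(a) < sse(b))        # True True
```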
[Figures: the least-squares and maximum-likelihood views of the same fit. From http://udlbook.com]
4. Inference
• Return the peak of the predicted distribution, which for a normal is its mean: $\hat{y} = f[\mathbf{x}, \hat{\boldsymbol{\phi}}]$.
Among the steps of constructing a loss function …
➢ Choose a suitable model for the problem
➢ Choose a suitable prob. dist. for the problem
➢ Set the model to predict its parameters
➢ Set the model to predict the parameters of a prob. dist.
Why do we need a loss function?
➢ To predict the parameters of the probability distribution
➢ To learn the parameters of the probability distribution
➢ To predict the parameters of the model
➢ To learn the parameters of the model
➢ To predict the output of the model given an input
Estimating variance
• Perhaps surprisingly, the variance term $\sigma^2$ disappeared from the minimization.
• But we could learn it too:
$$\hat{\boldsymbol{\phi}}, \hat{\sigma}^2 = \underset{\boldsymbol{\phi}, \sigma^2}{\mathrm{argmin}} \left[ -\sum_{i=1}^{I} \log Pr\big(y_i \,|\, f[\mathbf{x}_i, \boldsymbol{\phi}], \sigma^2\big) \right]$$
• The model predicts the mean $\mu$ from the input, and the variance $\sigma^2$ is learned during the training process.
Homoscedastic regression
• Assume that $\sigma^2$ is the same everywhere.
[Figure: homoscedastic fit with a constant noise level. From http://udlbook.com]
Heteroscedastic regression
• The uncertainty of the model varies with the input.
• Build a model with two outputs: one, $f_1[\mathbf{x}, \boldsymbol{\phi}]$, predicts the mean, and the other, $f_2[\mathbf{x}, \boldsymbol{\phi}]$, is squared to give the variance $\sigma^2 = f_2[\mathbf{x}, \boldsymbol{\phi}]^2$. Why squared? The variance must be positive, but a raw network output can be any real number. A sketch follows below.
[Figure: heteroscedastic fit with an input-dependent noise level. From http://udlbook.com]
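A minimal sketch of such a two-output model (assuming PyTorch is available; the architecture, toy data, and hyperparameters are illustrative, not from the slides):

```python
import torch
import torch.nn as nn

class HeteroNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 2))
    def forward(self, x):
        out = self.body(x)
        mu = out[:, 0:1]                 # first output: the mean
        var = out[:, 1:2] ** 2 + 1e-6    # second output squared: positive variance
        return mu, var

def gaussian_nll(mu, var, y):
    # Negative log-likelihood of y under Normal(mu, var), summed over examples.
    return (0.5 * torch.log(2 * torch.pi * var) + (y - mu) ** 2 / (2 * var)).sum()

net = HeteroNet()
x = torch.rand(128, 1)
y = torch.sin(4 * x) + 0.05 * (1 + x) * torch.randn(128, 1)  # input-dependent noise
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    mu, var = net(x)
    gaussian_nll(mu, var, y).backward()
    opt.step()
```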
Example 2: binary classification
• Goal: predict which of two classes the input x belongs to
[Figure: binary classification training examples. From http://udlbook.com]
1. Choose a prob. dist. over output domain
• Domain: $y \in \{0, 1\}$
• Bernoulli distribution, with one parameter $\lambda \in [0,1]$:
$$Pr(y|\lambda) = \lambda^{y}(1-\lambda)^{1-y}, \quad \text{i.e., } Pr(y{=}1|\lambda) = \lambda$$
2. Set the model to predict dist. param.
• Parameter $\lambda \in [0,1]$.
• BUT: the output of a neural network can be anything!
• Solution: pass the network output through a function that maps "anything" to $[0,1]$ — the sigmoid activation function:
$$\mathrm{sig}[z] = \frac{1}{1 + \exp[-z]}, \qquad \lambda = \mathrm{sig}\big[f[\mathbf{x}, \boldsymbol{\phi}]\big]$$
[Figure: effect of adding the sigmoid — the raw network output is mapped to [0,1]. From http://udlbook.com]
3. Loss fn: negative log-likelihood
$$L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \Big[ y_i \log\big[\mathrm{sig}[f[\mathbf{x}_i, \boldsymbol{\phi}]]\big] + (1-y_i)\log\big[1 - \mathrm{sig}[f[\mathbf{x}_i, \boldsymbol{\phi}]]\big] \Big]$$
This is the binary cross-entropy loss.
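In practice this loss is usually computed directly from the raw network output (the logit) to avoid numerical problems with log(sigmoid) for large |z|. A sketch of that standard rewrite (numpy assumed; not from the slides):

```python
import numpy as np

def bce_from_logits(z, y):
    # -[y*log(sig(z)) + (1-y)*log(1-sig(z))] rewritten algebraically as
    #   max(z, 0) - y*z + log(1 + exp(-|z|)),
    # which never evaluates exp of a large positive number.
    return np.sum(np.maximum(z, 0) - y * z + np.log1p(np.exp(-np.abs(z))))

z = np.array([2.0, -1.0, 0.5])   # raw model outputs (logits)
y = np.array([1.0, 0.0, 1.0])    # observed labels
print(bce_from_logits(z, y))
```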
4. Inference
• Choose $y = 1$ where $\lambda > 0.5$, and $y = 0$ otherwise.
Example 3: multiclass classification
• Goal: predict which of K classes the input x belongs to
[Figure: multiclass classification training examples. From http://udlbook.com]
1. Choose a prob. dist. over output domain
• Domain: $y \in \{1, 2, \ldots, K\}$
• Categorical distribution, with K parameters $\lambda_k \in [0,1]$ that sum to one:
$$Pr(y{=}k) = \lambda_k, \qquad \sum_{k=1}^{K} \lambda_k = 1$$
2. Set the model to predict dist. param.
• Parameters $\lambda_k \in [0,1]$, summing to one.
• BUT: the outputs of a neural network can be anything!
• Solution: pass the K network outputs through a function that maps "anything" to $[0,1]$ and makes the values sum to 1 — the softmax:
$$\mathrm{softmax}_k[\mathbf{z}] = \frac{\exp[z_k]}{\sum_{k'=1}^{K} \exp[z_{k'}]}$$
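A minimal softmax sketch with the usual numerical-stability trick (numpy assumed; subtracting the maximum is an implementation detail, not from the slides — it leaves the result unchanged because softmax is shift-invariant, but prevents exp overflow):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)        # shift-invariance: softmax(z) == softmax(z - c)
    e = np.exp(z)
    return e / np.sum(e)

print(softmax(np.array([1.0, 2.0, -1.0])))   # sums to 1, each entry in [0, 1]
```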
[Figure: effect of adding the softmax layer — raw network outputs mapped to a categorical distribution. From http://udlbook.com]
3. Loss fn: negative log-likelihood
$$L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \log\Big[ \mathrm{softmax}_{y_i}\big[\mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}]\big] \Big]$$
This is the multiclass cross-entropy loss.
4. Inference
• Choose the class with the largest predicted probability: $\hat{y} = \underset{k}{\mathrm{argmax}}\; \lambda_k$.
For an object detection problem with 5 classes, for a given input, the network outputs ….
➢ one number, which is the probability of the correct class
➢ 5 numbers, which are the probabilities of all classes
➢ one number, which is the softmax value of the correct class
➢ 5 numbers, which are the softmax values of all classes
For an object detection problem with 5 classes (A, B, C, D, E), for a given input, the network's direct output was 1, 2, -1, 3, and -2 for the classes respectively. Assuming the correct class is C, what is the loss for that input example?
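A quick numerical check of the second question's arithmetic (numpy assumed; the logits are taken from the quiz itself): the loss is the negative log of the softmax probability assigned to class C.

```python
import numpy as np

z = np.array([1.0, 2.0, -1.0, 3.0, -2.0])   # raw outputs for classes A..E
probs = np.exp(z) / np.sum(np.exp(z))
loss = -np.log(probs[2])                    # class C is index 2
print(loss)                                 # ~4.42
```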
Other data types
• Different output domains call for different distributions, but the same recipe applies: choose a distribution over the output domain, predict its parameters, and minimize the negative log-likelihood.
Example 4: multivariate regression
[Figure: multivariate regression training examples. From http://udlbook.com]
Multiple outputs
• Treat each output dimension as independent:
$$Pr\big(\mathbf{y}\,|\,\mathbf{f}[\mathbf{x}, \boldsymbol{\phi}]\big) = \prod_{d} Pr\big(y_d \,|\, f_d[\mathbf{x}, \boldsymbol{\phi}]\big)$$
• The negative log-likelihood then becomes a sum of terms:
$$L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \sum_{d} \log Pr\big(y_{id} \,|\, f_d[\mathbf{x}_i, \boldsymbol{\phi}]\big)$$
[Figure: multivariate regression fit. From http://udlbook.com]
Example 4: multivariate regression
• Goal: predict a multivariate target $\mathbf{y} \in \mathbb{R}^{D_o}$
• Solution: treat each dimension independently
• Make a network with $D_o$ outputs to predict the means
Different output magnitudes?
• What if the outputs vary in magnitude?
• e.g., predict weight in kilos and height in meters
• One dimension has much bigger numbers than the others
• Why is that a problem? The dimension with larger values dominates the loss, so the model effectively ignores the smaller-scale outputs.
• We could learn a separate variance for each dimension…
• …or rescale the targets before training, and then rescale the output in the opposite way (sketch below).
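A minimal sketch of that rescaling fix (numpy assumed; the data ranges are illustrative, not from the slides): standardize each target dimension before training, then invert the transform on predictions.

```python
import numpy as np

# Toy targets: weight in kilos (~50-100) vs. height in meters (~1.5-2.0).
y = np.column_stack([np.random.uniform(50, 100, 200),
                     np.random.uniform(1.5, 2.0, 200)])

mu, sd = y.mean(axis=0), y.std(axis=0)
y_scaled = (y - mu) / sd          # both dimensions now have comparable scale

# ... train the model on y_scaled instead of y ...

def unscale(pred_scaled):
    return pred_scaled * sd + mu  # map predictions back to original units

print(unscale(np.zeros(2)))       # the scaled-space origin maps back to the mean
```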
Next up
• We have models with parameters!
• We have loss functions!
• Now let’s find the parameters that give the smallest loss
• Training, learning, or fitting the model