Recall …
Image hosted by https://knowledgeone.ca/8-types-of-memory-to-remember/
Log and exp functions
• Log: $y = \log[x]$
• Exp: $y = \exp[x] = e^x$
• Two rules we will use later in this lecture: the log of a product is the sum of the logs, $\log[a \cdot b] = \log[a] + \log[b]$, and the log function is monotonically increasing, so the maximum of $\log[g[z]]$ is in the same place as the maximum of $g[z]$.
Loss function
• A loss function (or cost function) measures how badly the model performs.
• Given a training dataset of $I$ input/output pairs $\{\mathbf{x}_i, \mathbf{y}_i\}_{i=1}^{I}$:
• Loss function: $L\big[\{\mathbf{x}_i, \mathbf{y}_i\}, \mathbf{f}[\mathbf{x}, \boldsymbol{\phi}]\big]$, or for short: $L[\boldsymbol{\phi}]$
• It returns a scalar that is smaller when the model maps inputs to outputs better.
Training
• Given the loss function $L[\boldsymbol{\phi}]$, which returns a scalar that is smaller when the model maps inputs to outputs better,
• find the parameters that minimize the loss:
$$\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmin}}\; L[\boldsymbol{\phi}]$$
Example: 1D linear regression loss function
• Loss function (the "least squares loss function"):
$$L[\boldsymbol{\phi}] = \sum_{i=1}^{I} \big( f[x_i, \boldsymbol{\phi}] - y_i \big)^2$$
[Figure: 1D linear regression fit and its loss surface. From http://udlbook.com]
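A minimal sketch of this loss in code (numpy assumed; the toy data and parameter values are illustrative, not from the slides):

```python
# Least-squares loss for a 1D linear model f[x, phi] = phi_0 + phi_1 * x.
import numpy as np

def least_squares_loss(phi, x, y):
    """Sum of squared differences between predictions and targets."""
    pred = phi[0] + phi[1] * x          # model output f[x, phi]
    return np.sum((pred - y) ** 2)      # smaller = better fit

# Toy data: y is roughly 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=20)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(20)

print(least_squares_loss(np.array([1.0, 2.0]), x, y))  # near-true params: small
print(least_squares_loss(np.array([0.0, 0.0]), x, y))  # bad params: larger
```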
Why exactly a "least squares" loss for regression problems? How can we "construct" loss functions for other types of learning problems?
Loss functions
• Maximum likelihood
• Recipe for loss functions
• Example 1: univariate regression
• Example 2: binary classification
• Example 3: multiclass classification
• Other types of data
• Multiple outputs
How to construct loss functions
• Rather than predicting the output y directly from input x,
• the model predicts a conditional probability distribution $Pr(\mathbf{y}|\mathbf{x})$ over the possible values of the output y given input x.
• The loss function aims to make each observed training output have high probability under $Pr(\mathbf{y}|\mathbf{x})$.
[Figures: regression — a real-valued output modeled with a probability distribution over y given x. From http://udlbook.com]
[Figures: binary classification — a discrete output modeled with a probability distribution over two classes. From http://udlbook.com]
[Figures: multiclass classification — a discrete output modeled with a probability distribution over several classes. From http://udlbook.com]
How can a model predict a probability distribution?
How can a model predict a prob. dist.?
1. Pick a known parametric distribution $Pr(y|\boldsymbol{\theta})$ to model the output y, with parameters $\boldsymbol{\theta}$; e.g., the normal distribution:
$$Pr(y|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]$$
2. Use the model to predict the parameters of that probability distribution: $\boldsymbol{\theta} = \mathbf{f}[\mathbf{x}, \boldsymbol{\phi}]$
Maximum likelihood criterion
• Each observed training output $y_i$ should have high probability under its corresponding distribution $Pr(y_i|\mathbf{x}_i)$.
• Maximum likelihood criterion:
$$\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmax}} \left[ \prod_{i=1}^{I} Pr\big(y_i \,|\, \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}]\big) \right]$$
• When we consider this probability as a function of the parameters $\boldsymbol{\phi}$, we call it a likelihood.
i.i.d. assumption
1. Data are identically distributed: the form of the probability distribution over the outputs $y_i$ is the same for each data point.
2. Conditional distributions $Pr(y_i|\mathbf{x}_i)$ of the output given the input are independent.
Together: data are independent and identically distributed (i.i.d.).
Problem
• The terms in this product might all be small.
• The product of many such terms can get so small that finite-precision floating point can't represent it (numerical underflow). See the sketch below.
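A minimal demonstration of the underflow problem (numpy assumed; the specific numbers are illustrative, not from the slides): multiplying 1000 modest likelihood terms underflows float64, while summing their logs stays well-behaved.

```python
import numpy as np

probs = np.full(1000, 1e-4)        # 1000 likelihood terms, each small

product = np.prod(probs)           # 1e-4000 underflows float64 (min ~1e-308)
log_sum = np.sum(np.log(probs))    # 1000 * log(1e-4) = -9210.34..., no problem

print(product)   # 0.0
print(log_sum)   # -9210.34...
```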
The log function is monotonic
• The maximum of the logarithm of a function is in the same place as the maximum of the function.
[Figure: a function and its logarithm share the same argmax. From http://udlbook.com]
Maximum log-likelihood
$$\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmax}} \left[ \sum_{i=1}^{I} \log Pr\big(y_i \,|\, \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}]\big) \right]$$
• Now it's a sum of terms, so it doesn't matter so much if the individual terms are small.
Minimizing negative log-likelihood
• By convention, we minimize things (i.e., a loss):
$$L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \log Pr\big(y_i \,|\, \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}]\big)$$
This is the loss function.
Inference?
• But now the model predicts a probability distribution!
• We need an actual prediction (point estimate) …
• Find the peak of the probability distribution (e.g., the mean for a normal distribution):
$$\hat{y} = \underset{y}{\mathrm{argmax}}\; Pr\big(y \,|\, \mathbf{f}[\mathbf{x}, \hat{\boldsymbol{\phi}}]\big)$$
To construct a loss function, we set the model to predict ......................
➢ the input
➢ the output
➢ a probability distribution
➢ the parameters of a probability distribution
In training a model, we ………….
➢ maximize the likelihood probability
➢ minimize the likelihood probability
➢ maximize the log-likelihood probability
➢ minimize the log-likelihood probability
➢ maximize the negative log-likelihood probability
➢ minimize the negative log-likelihood probability
Recipe for loss functions
1. Choose a suitable probability distribution $Pr(y|\boldsymbol{\theta})$ defined over the domain of the outputs y, with distribution parameters $\boldsymbol{\theta}$.
2. Set the machine learning model to predict those parameters: $\boldsymbol{\theta} = \mathbf{f}[\mathbf{x}, \boldsymbol{\phi}]$.
3. To train, minimize the negative log-likelihood loss $L[\boldsymbol{\phi}] = -\sum_{i} \log Pr(y_i|\boldsymbol{\theta}_i)$.
4. For inference, return the peak of the predicted distribution.
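As a concrete illustration, here is a minimal end-to-end sketch of the four steps for a toy univariate regression (assumptions not in the slides: a linear model, a normal output distribution with fixed variance, and plain gradient descent with an analytic gradient):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 50)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(50)

# Step 1: choose a distribution over the output -> normal with fixed sigma^2.
sigma2 = 0.01

# Step 2: the model predicts the distribution parameter (the mean).
def mean(phi):
    return phi[0] + phi[1] * x

# Step 3: minimize the negative log-likelihood by gradient descent.
phi = np.zeros(2)
for _ in range(2000):
    resid = mean(phi) - y
    grad = np.array([resid.sum(), (resid * x).sum()]) / sigma2
    phi -= 1e-4 * grad

# Step 4: inference = peak of the predicted distribution = predicted mean.
print(phi)  # approaches the true parameters [1.0, 2.0]
```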
Example 1: univariate regression
[Figure: training data for univariate regression. From http://udlbook.com]
1. Choose a prob. dist. over output domain
• Predict a scalar output: $y \in \mathbb{R}$
• A sensible probability distribution: the normal distribution
$$Pr(y|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]$$
2. Set the model to predict dist. param.
• The model predicts the mean: $\mu = f[\mathbf{x}, \boldsymbol{\phi}]$, giving $Pr\big(y \,|\, f[\mathbf{x}, \boldsymbol{\phi}], \sigma^2\big)$.
3. Loss fn: negative log-likelihood
$$L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \log Pr\big(y_i \,|\, f[\mathbf{x}_i, \boldsymbol{\phi}], \sigma^2\big) = -\sum_{i=1}^{I} \log \left[ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y_i - f[\mathbf{x}_i, \boldsymbol{\phi}])^2}{2\sigma^2}\right] \right]$$
• Dropping the terms that do not depend on $\boldsymbol{\phi}$, minimizing this is equivalent to minimizing $\sum_{i=1}^{I} \big(y_i - f[\mathbf{x}_i, \boldsymbol{\phi}]\big)^2$ — least squares!
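A quick numerical check of this equivalence (a numpy sketch; the data and candidate parameters are illustrative assumptions): for a fixed σ², the normal negative log-likelihood and the least-squares loss differ only by constants, so they order parameter settings identically.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 30)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(30)
sigma2 = 0.25                                  # fixed, known variance

def nll(phi):
    resid = y - (phi[0] + phi[1] * x)
    return np.sum(0.5 * np.log(2 * np.pi * sigma2) + resid**2 / (2 * sigma2))

def sse(phi):
    resid = y - (phi[0] + phi[1] * x)
    return np.sum(resid**2)

a = np.array([1.0, 2.0])
b = np.array([0.5, 1.5])
# Both criteria rank the two candidate parameter settings the same way:
print(nll(a) < nll(b), sse(a) < sse(b))        # True True
```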
[Figures: the least-squares and maximum-likelihood views of the same fit. From http://udlbook.com]
4. Inference
• Return the peak of the predicted distribution, which for a normal is its mean: $\hat{y} = f[\mathbf{x}, \hat{\boldsymbol{\phi}}]$.
Among the steps of constructing a loss function …
➢ Choose a suitable model for the problem
➢ Choose a suitable prob. dist. for the problem
➢ Set the model to predict its parameters
➢ Set the model to predict the parameters of a prob. dist.
Why do we need a loss function?
➢ To predict the parameters of the probability distribution
➢ To learn the parameters of the probability distribution
➢ To predict the parameters of the model
➢ To learn the parameters of the model
➢ To predict the output of the model given an input
Estimating variance
• Perhaps surprisingly, the variance term $\sigma^2$ disappeared from the minimization.
• But we could learn it too:
$$\hat{\boldsymbol{\phi}}, \hat{\sigma}^2 = \underset{\boldsymbol{\phi}, \sigma^2}{\mathrm{argmin}} \left[ -\sum_{i=1}^{I} \log Pr\big(y_i \,|\, f[\mathbf{x}_i, \boldsymbol{\phi}], \sigma^2\big) \right]$$
• The model predicts the mean $\mu$ from the input, and the variance $\sigma^2$ is learned during the training process.
Homoscedastic regression
• Assume that $\sigma^2$ is the same everywhere.
[Figure: homoscedastic fit with a constant noise level. From http://udlbook.com]
Heteroscedastic regression
• The uncertainty of the model varies with the input.
• Build a model with two outputs: one, $f_1[\mathbf{x}, \boldsymbol{\phi}]$, predicts the mean, and the other, $f_2[\mathbf{x}, \boldsymbol{\phi}]$, is squared to give the variance $\sigma^2 = f_2[\mathbf{x}, \boldsymbol{\phi}]^2$. Why squared? The variance must be positive, but a raw network output can be any real number. A sketch follows below.
[Figure: heteroscedastic fit with an input-dependent noise level. From http://udlbook.com]
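A minimal sketch of such a two-output model (assuming PyTorch is available; the architecture, toy data, and hyperparameters are illustrative, not from the slides):

```python
import torch
import torch.nn as nn

class HeteroNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 2))
    def forward(self, x):
        out = self.body(x)
        mu = out[:, 0:1]                 # first output: the mean
        var = out[:, 1:2] ** 2 + 1e-6    # second output squared: positive variance
        return mu, var

def gaussian_nll(mu, var, y):
    # Negative log-likelihood of y under Normal(mu, var), summed over examples.
    return (0.5 * torch.log(2 * torch.pi * var) + (y - mu) ** 2 / (2 * var)).sum()

net = HeteroNet()
x = torch.rand(128, 1)
y = torch.sin(4 * x) + 0.05 * (1 + x) * torch.randn(128, 1)  # input-dependent noise
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    mu, var = net(x)
    gaussian_nll(mu, var, y).backward()
    opt.step()
```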
Example 2: binary classification
• Goal: predict which of two classes the input x belongs to
[Figure: binary classification training examples. From http://udlbook.com]
1. Choose a prob. dist. over output domain
• Domain: $y \in \{0, 1\}$
• Bernoulli distribution, with one parameter $\lambda \in [0,1]$:
$$Pr(y|\lambda) = \lambda^{y}(1-\lambda)^{1-y}, \quad \text{i.e., } Pr(y{=}1|\lambda) = \lambda$$
2. Set the model to predict dist. param.
• Parameter $\lambda \in [0,1]$.
• BUT: the output of a neural network can be anything!
• Solution: pass the network output through a function that maps "anything" to $[0,1]$ — the sigmoid activation function:
$$\mathrm{sig}[z] = \frac{1}{1 + \exp[-z]}, \qquad \lambda = \mathrm{sig}\big[f[\mathbf{x}, \boldsymbol{\phi}]\big]$$
[Figure: effect of adding the sigmoid — the raw network output is mapped to [0,1]. From http://udlbook.com]
3. Loss fn: negative log-likelihood
$$L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \Big[ y_i \log\big[\mathrm{sig}[f[\mathbf{x}_i, \boldsymbol{\phi}]]\big] + (1-y_i)\log\big[1 - \mathrm{sig}[f[\mathbf{x}_i, \boldsymbol{\phi}]]\big] \Big]$$
This is the binary cross-entropy loss.
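In practice this loss is usually computed directly from the raw network output (the logit) to avoid numerical problems with log(sigmoid) for large |z|. A sketch of that standard rewrite (numpy assumed; not from the slides):

```python
import numpy as np

def bce_from_logits(z, y):
    # -[y*log(sig(z)) + (1-y)*log(1-sig(z))] rewritten algebraically as
    #   max(z, 0) - y*z + log(1 + exp(-|z|)),
    # which never evaluates exp of a large positive number.
    return np.sum(np.maximum(z, 0) - y * z + np.log1p(np.exp(-np.abs(z))))

z = np.array([2.0, -1.0, 0.5])   # raw model outputs (logits)
y = np.array([1.0, 0.0, 1.0])    # observed labels
print(bce_from_logits(z, y))
```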
4. Inference
• Choose $y = 1$ where $\lambda > 0.5$, and $y = 0$ otherwise.
Example 3: multiclass classification
• Goal: predict which of K classes the input x belongs to
[Figure: multiclass classification training examples. From http://udlbook.com]
1. Choose a prob. dist. over output domain
• Domain: $y \in \{1, 2, \ldots, K\}$
• Categorical distribution, with K parameters $\lambda_k \in [0,1]$ that sum to one:
$$Pr(y{=}k) = \lambda_k, \qquad \sum_{k=1}^{K} \lambda_k = 1$$
2. Set the model to predict dist. param.
• Parameters $\lambda_k \in [0,1]$, summing to one.
• BUT: the outputs of a neural network can be anything!
• Solution: pass the K network outputs through a function that maps "anything" to $[0,1]$ and makes the values sum to 1 — the softmax:
$$\mathrm{softmax}_k[\mathbf{z}] = \frac{\exp[z_k]}{\sum_{k'=1}^{K} \exp[z_{k'}]}$$
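A minimal softmax sketch with the usual numerical-stability trick (numpy assumed; subtracting the maximum is an implementation detail, not from the slides — it leaves the result unchanged because softmax is shift-invariant, but prevents exp overflow):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)        # shift-invariance: softmax(z) == softmax(z - c)
    e = np.exp(z)
    return e / np.sum(e)

print(softmax(np.array([1.0, 2.0, -1.0])))   # sums to 1, each entry in [0, 1]
```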
[Figure: effect of adding the softmax layer — raw network outputs mapped to a categorical distribution. From http://udlbook.com]
3. Loss fn: negative log-likelihood
$$L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \log\Big[ \mathrm{softmax}_{y_i}\big[\mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}]\big] \Big]$$
This is the multiclass cross-entropy loss.
4. Inference
• Choose the class with the largest predicted probability: $\hat{y} = \underset{k}{\mathrm{argmax}}\; \lambda_k$.
For an object detection problem with 5 classes, for a given input, the network outputs ….
➢ one number, which is the probability of the correct class
➢ 5 numbers, which are the probabilities of all classes
➢ one number, which is the softmax value of the correct class
➢ 5 numbers, which are the softmax values of all classes
For an object detection problem with 5 classes (A, B, C, D, E), for a given input, the network's direct output was 1, 2, -1, 3, and -2 for the classes respectively. Assuming the correct class is C, what is the loss for that input example?
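A quick numerical check of the second question's arithmetic (numpy assumed; the logits are taken from the quiz itself): the loss is the negative log of the softmax probability assigned to class C.

```python
import numpy as np

z = np.array([1.0, 2.0, -1.0, 3.0, -2.0])   # raw outputs for classes A..E
probs = np.exp(z) / np.sum(np.exp(z))
loss = -np.log(probs[2])                    # class C is index 2
print(loss)                                 # ~4.42
```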
Other data types
• Different output domains call for different distributions, but the same recipe applies: choose a distribution over the output domain, predict its parameters, and minimize the negative log-likelihood.
Example 4: multivariate regression
[Figure: multivariate regression training examples. From http://udlbook.com]
Multiple outputs
• Treat each output dimension as independent:
$$Pr\big(\mathbf{y}\,|\,\mathbf{f}[\mathbf{x}, \boldsymbol{\phi}]\big) = \prod_{d} Pr\big(y_d \,|\, f_d[\mathbf{x}, \boldsymbol{\phi}]\big)$$
• The negative log-likelihood then becomes a sum of terms:
$$L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \sum_{d} \log Pr\big(y_{id} \,|\, f_d[\mathbf{x}_i, \boldsymbol{\phi}]\big)$$
[Figure: multivariate regression fit. From http://udlbook.com]
Example 4: multivariate regression
• Goal: predict a multivariate target $\mathbf{y} \in \mathbb{R}^{D_o}$
• Solution: treat each dimension independently
• Make a network with $D_o$ outputs to predict the means
Different output magnitudes?
• What if the outputs vary in magnitude?
• e.g., predict weight in kilos and height in meters
• One dimension has much bigger numbers than the others
• Why is that a problem? The dimension with larger values dominates the loss, so the model effectively ignores the smaller-scale outputs.
• We could learn a separate variance for each dimension…
• …or rescale the targets before training, and then rescale the output in the opposite way (sketch below).
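A minimal sketch of that rescaling fix (numpy assumed; the data ranges are illustrative, not from the slides): standardize each target dimension before training, then invert the transform on predictions.

```python
import numpy as np

# Toy targets: weight in kilos (~50-100) vs. height in meters (~1.5-2.0).
y = np.column_stack([np.random.uniform(50, 100, 200),
                     np.random.uniform(1.5, 2.0, 200)])

mu, sd = y.mean(axis=0), y.std(axis=0)
y_scaled = (y - mu) / sd          # both dimensions now have comparable scale

# ... train the model on y_scaled instead of y ...

def unscale(pred_scaled):
    return pred_scaled * sd + mu  # map predictions back to original units

print(unscale(np.zeros(2)))       # the scaled-space origin maps back to the mean
```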
Next up
• We have models with parameters!
• We have loss functions!
• Now let’s find the parameters that give the smallest loss
• Training, learning, or fitting the model