CHC 351 Module 4

Constructing Loss Functions


Log and exp functions
• Log
• Exp

• Two properties we will use:
  • The log function is monotonic
  • The maximum of the logarithm of a function is in the same place as the maximum of the function itself


Regression

• Univariate regression problem (one output, real value)
• Fully connected network

Graph regression

• Multivariate regression problem (>1 output, real value)
• Graph neural network

Text classification

• Binary classification problem (two discrete classes)
• Transformer network

Music genre classification

• Multiclass classification problem (discrete classes, >2 possible values)
• Convolutional network

Training dataset of I pairs of input/output examples
Binary Classification Task (Training Data)
Multi-class Classification Task (Training Data)
Until now, the model was a line, predicting an exact value of y for a given x.

Issue: generalization of
1) the model (its applicability), and
2) the loss function, to different data types.

Resolution: shift perspective and consider the model as computing a conditional probability distribution.
Probabilities

[Figure: predicted probability distributions; vertical axes run from 0 to 1]
Loss function
• Training dataset of I pairs of input/output examples: {xᵢ, yᵢ} for i = 1, …, I

• Loss function (or cost function) measures how badly the model describes the data:
  L[ϕ, {xᵢ, yᵢ}], or for short: L[ϕ]

• Returns a scalar that is smaller when the model maps inputs to outputs better
Training
• Loss function L[ϕ]: returns a scalar that is smaller when the model maps inputs to outputs better

• Find the parameters that minimize the loss:
  ϕ̂ = argmin_ϕ L[ϕ]

Example: 1D Linear regression loss function

Model: f[x, ϕ] = ϕ₀ + ϕ₁x

Loss function:
  L[ϕ] = Σᵢ (f[xᵢ, ϕ] − yᵢ)²

"Least squares loss function"


Example: 1D Linear regression training

• Start with an initial estimate of the parameters, then repeatedly adjust them to decrease the loss
• This technique is known as gradient descent
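
As a concrete illustration (not from the slides), here is a minimal NumPy sketch of 1D linear regression trained with gradient descent on the least squares loss; the synthetic data, learning rate, and number of steps are arbitrary choices for the example.

```python
import numpy as np

# Synthetic 1D data (hypothetical): y is roughly 2x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(50)

phi0, phi1 = 0.0, 0.0      # intercept and slope
lr = 0.1                   # learning rate (step size)

for step in range(1000):
    pred = phi0 + phi1 * x             # model f[x, phi]
    residual = pred - y
    loss = np.sum(residual ** 2)       # least squares loss
    # Gradients of the loss with respect to phi0 and phi1 (averaged over examples)
    grad0 = 2.0 * np.sum(residual) / len(x)
    grad1 = 2.0 * np.sum(residual * x) / len(x)
    # Gradient descent update
    phi0 -= lr * grad0
    phi1 -= lr * grad1

print(f"phi0={phi0:.2f}, phi1={phi1:.2f}, loss={loss:.3f}")
```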


Loss functions
• Maximum likelihood
• Recipe for loss functions
• Example 1: univariate regression
• Example 2: binary classification
• Example 3: multiclass classification
• Other types of data
• Multiple outputs
• Cross entropy
How to construct loss functions
• The model predicts an output y given an input x
• More precisely, the model predicts a conditional probability distribution Pr(y|x) over outputs y given inputs x

• The loss function aims to make the observed training outputs have high probability under this distribution
How can a model predict a probability
distribution?
1. Pick a known parametric distribution to model the output y,
   e.g., the normal distribution Pr(y|μ, σ²) with parameters θ = {μ, σ²}

2. Use the model to predict the parameters of that probability distribution:
   θ = f[x, ϕ], so that Pr(y|x) = Pr(y | θ = f[x, ϕ])
Probability Distributions

Example: Hyperbolic Distribution

Combined Probability
Two Assumptions (i.i.d.)
Here we are implicitly making two assumptions. First, we
assume that the data are identically distributed (the form of
the probability distribution over the outputs yᵢ is the same for
each data point). Second, we assume that the conditional
distributions Pr(yᵢ|xᵢ) of the output given the input are
independent, so the total likelihood of the training data
decomposes as:

  Pr(y₁, …, y_I | x₁, …, x_I) = ∏ᵢ Pr(yᵢ|xᵢ)
Neural Network → Distribution Parameters → Maximum Likelihood
Maximum likelihood criterion

  ϕ̂ = argmax_ϕ ∏ᵢ Pr(yᵢ | f[xᵢ, ϕ])

When we consider this probability as a function of the parameters ϕ, we call
it a likelihood.

Problem:
• The terms in this product might all be small
• The product might get so small that we can't easily represent it
The log function is monotonic, so the maximum of the logarithm of a function is in the same place as the maximum of the function itself.


Maximum log likelihood

  ϕ̂ = argmax_ϕ Σᵢ log Pr(yᵢ | f[xᵢ, ϕ])

Now it's a sum of terms, so it doesn't matter so much if the individual terms are small.

Minimizing negative log likelihood
• By convention, we minimize things (i.e., a loss):

  L[ϕ] = −Σᵢ log Pr(yᵢ | f[xᵢ, ϕ]),   ϕ̂ = argmin_ϕ L[ϕ]
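
A small numerical illustration (not from the slides) of why we work with log likelihoods: multiplying many moderately small probabilities underflows to zero in floating point, while summing their logs stays well behaved.

```python
import numpy as np

# Hypothetical per-example likelihoods, all moderately small
probs = np.full(1000, 0.01)

product = np.prod(probs)          # underflows to 0.0 in float64
log_sum = np.sum(np.log(probs))   # stays finite: 1000 * log(0.01)

print(product)   # 0.0
print(log_sum)   # approximately -4605.17
```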
Inference
• But now the model predicts a probability distribution
• We need an actual prediction (a point estimate)
• Take the peak of the probability distribution (e.g., the mean for a normal):
  ŷ = argmax_y Pr(y | f[x, ϕ̂])
Loss functions
• Maximum likelihood
• Recipe for loss functions
• Example 1: univariate regression
• Example 2: binary classification
• Example 3: multiclass classification
• Other types of data
• Multiple outputs
• Cross entropy
Recipe for loss functions
1. Choose a probability distribution Pr(y|θ) defined over the domain of the outputs y, with distribution parameters θ
2. Set the model f[x, ϕ] to predict these parameters: θ = f[x, ϕ]
3. Train the model by finding the parameters ϕ̂ that minimize the negative log likelihood over the training data
4. For inference, return the full distribution Pr(y | f[x, ϕ̂]) or the value ŷ that maximizes it
Loss functions
• Maximum likelihood
• Recipe for loss functions
• Example 1: univariate regression
• Example 2: binary classification
• Example 3: multiclass classification
• Other types of data
• Multiple outputs
• Cross entropy
Example 1: univariate regression

• Predict a scalar output y ∈ ℝ

• Sensible probability distribution: the normal distribution

  Pr(y|μ, σ²) = (1/√(2πσ²)) · exp(−(y − μ)² / (2σ²))

• Use the model to predict the mean: μ = f[x, ϕ]

• Negative log likelihood loss:

  L[ϕ] = −Σᵢ log Pr(yᵢ | f[xᵢ, ϕ], σ²)
       = Σᵢ (yᵢ − f[xᵢ, ϕ])² / (2σ²) + constant

• Minimizing over ϕ gives ϕ̂ = argmin_ϕ Σᵢ (yᵢ − f[xᵢ, ϕ])²  — Least squares!

• Least squares and maximum likelihood (with a normal distribution and fixed σ²) give the same parameters

Estimating variance
• Perhaps surprisingly, the variance term σ² disappeared: it only scales the loss and does not change the position of the minimum over ϕ

• But we could learn it, by minimizing the negative log likelihood jointly over ϕ and σ²:

  ϕ̂, σ̂² = argmin_{ϕ, σ²} [ −Σᵢ log Pr(yᵢ | f[xᵢ, ϕ], σ²) ]
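
A short numerical check (my own illustration, with made-up predictions): for fixed σ, the Gaussian negative log likelihood differs from the sum-of-squares loss only by a scale factor and an additive constant, so both are minimized by the same parameters.

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Negative log likelihood of y under Normal(mu, sigma^2), summed over examples."""
    return np.sum(0.5 * np.log(2 * np.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2))

y  = np.array([1.0, 2.0, 0.5])
mu = np.array([0.8, 2.2, 0.4])   # hypothetical model predictions f[x, phi]
sigma = 1.5

least_squares = np.sum((y - mu) ** 2)
nll = gaussian_nll(y, mu, sigma)

# NLL = least_squares / (2*sigma^2) + I * 0.5*log(2*pi*sigma^2)
const = len(y) * 0.5 * np.log(2 * np.pi * sigma**2)
assert np.isclose(nll, least_squares / (2 * sigma**2) + const)
```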


Heteroscedastic regression
• Until now, we assumed that the noise σ² is the same everywhere (homoscedastic).
• But we could make the noise a function of the input x (heteroscedastic).
• Build a model with two outputs: one predicts the mean μ = f₁[x, ϕ], and the other predicts the variance; since the variance must be positive, pass the second output through a function that maps it to a positive value, e.g., σ² = f₂[x, ϕ]².
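
As an illustrative sketch (the function names and example values are my own), here is a heteroscedastic Gaussian negative log likelihood in NumPy, where one model output is the mean and a second raw output is squared to give a positive per-example variance.

```python
import numpy as np

def heteroscedastic_nll(y, mean_out, raw_var_out):
    """Gaussian NLL where the variance is predicted per example.

    mean_out    : model output f1[x, phi], the predicted mean
    raw_var_out : model output f2[x, phi]; squared to ensure a positive variance
    """
    var = raw_var_out ** 2 + 1e-6          # small constant avoids division by zero
    return np.sum(0.5 * np.log(2 * np.pi * var) + (y - mean_out) ** 2 / (2 * var))

# Hypothetical predictions for three training points
y           = np.array([1.0, 2.0, 0.5])
mean_out    = np.array([0.9, 2.1, 0.6])
raw_var_out = np.array([0.5, 1.0, 0.2])   # note: not yet a variance

print(heteroscedastic_nll(y, mean_out, raw_var_out))
```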
Loss functions
• Maximum likelihood
• Recipe for loss functions
• Example 1: univariate regression
• Example 2: binary classification
• Example 3: multiclass classification
• Other types of data
• Multiple outputs
• Cross entropy
Example 2: binary classification

• Goal: predict which of two classes the input x belongs to

• Domain: y ∈ {0, 1}
• Bernoulli distribution: Pr(y|λ) = λ^y · (1 − λ)^(1−y)
• One parameter λ ∈ [0, 1]

Problem:
• Output of the neural network can be anything
• Parameter λ must lie in [0, 1]

Solution:
• Pass the network output through the logistic sigmoid function, which maps "anything" to [0, 1]:

  sig[z] = 1 / (1 + exp(−z)),   so   λ = sig[ f[x, ϕ] ]

• Negative log likelihood loss:

  L[ϕ] = −Σᵢ [ yᵢ log λᵢ + (1 − yᵢ) log(1 − λᵢ) ],   where λᵢ = sig[ f[xᵢ, ϕ] ]

*Binary cross-entropy loss*

• Inference: choose y = 1 where λ is greater than 0.5, otherwise y = 0
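
A minimal NumPy sketch (my own illustration) of the binary cross-entropy loss computed from raw network outputs via the logistic sigmoid; the example logits and labels are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y, logits):
    """Negative log likelihood of Bernoulli labels y given raw model outputs (logits)."""
    lam = sigmoid(logits)                        # lambda in [0, 1]
    eps = 1e-12                                  # numerical safety for log(0)
    return -np.sum(y * np.log(lam + eps) + (1 - y) * np.log(1 - lam + eps))

y      = np.array([1, 0, 1, 1])                  # hypothetical class labels
logits = np.array([2.0, -1.0, 0.5, -0.2])        # hypothetical raw network outputs

print(binary_cross_entropy(y, logits))
# Inference: predict class 1 where lambda > 0.5
print((sigmoid(logits) > 0.5).astype(int))       # [1 0 1 0]
```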


Loss functions
• Maximum likelihood
• Recipe for loss functions
• Example 1: univariate regression
• Example 2: binary classification
• Example 3: multiclass classification
• Other types of data
• Multiple outputs
• Cross entropy
Example 3: multiclass classification

• Goal: predict which of K classes the input x belongs to

• Domain: y ∈ {1, 2, …, K}
• Categorical distribution: Pr(y = k) = λₖ
• K parameters λₖ ∈ [0, 1]
• Sum of all parameters = 1

Problem:
• Output of the neural network can be anything
• Parameters λₖ must lie in [0, 1] and sum to one

Solution:
• Pass the K network outputs through the softmax function, which maps "anything" to values in [0, 1] that sum to one:

  λₖ = softmaxₖ[ f[x, ϕ] ] = exp(fₖ[x, ϕ]) / Σ_{k'} exp(f_{k'}[x, ϕ])

• Negative log likelihood loss:

  L[ϕ] = −Σᵢ log λ_{yᵢ}

*Multiclass cross-entropy loss*

• Inference: choose the class with the largest probability λₖ
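
A small NumPy sketch (my own example) of the softmax and the multiclass cross-entropy loss; the logits and labels are hypothetical, and the maximum is subtracted before exponentiating for numerical stability.

```python
import numpy as np

def softmax(logits):
    """Map a vector of raw scores to probabilities in [0, 1] that sum to one."""
    z = logits - np.max(logits, axis=-1, keepdims=True)   # stability shift
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def multiclass_cross_entropy(labels, logits):
    """Negative log likelihood of integer class labels under the softmax probabilities."""
    probs = softmax(logits)
    return -np.sum(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

# Two training examples, K = 3 classes (hypothetical values)
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2,  3.0]])
labels = np.array([0, 2])

print(multiclass_cross_entropy(labels, logits))
# Inference: choose the class with the largest probability
print(np.argmax(softmax(logits), axis=-1))   # [0 2]
```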
Loss functions
• Maximum likelihood
• Recipe for loss functions
• Example 1: univariate regression
• Example 2: binary classification
• Example 3: multiclass classification
• Other types of data
• Multiple outputs
• Cross entropy
Other data types
Loss functions
• Maximum likelihood
• Recipe for loss functions
• Example 1: univariate regression
• Example 2: binary classification
• Example 3: multiclass classification
• Other types of data
• Multiple outputs
• Cross entropy
Multiple outputs
• Treat each output dimension as independent:

  Pr(y|x) = ∏_d Pr(y_d | f_d[x, ϕ])

• The negative log likelihood becomes a sum of terms:

  L[ϕ] = −Σᵢ Σ_d log Pr(y_{id} | f_d[xᵢ, ϕ])


Example 4: multivariate regression
• Goal: predict a multivariate target y ∈ ℝ^{D_o}
• Solution: treat each dimension independently
• Make a network with D_o outputs to predict the means of D_o normal distributions

• What if the outputs vary in magnitude?
• E.g., predict weight in kilos and height in meters
• One dimension has much bigger numbers than the others
• Could learn a separate variance for each…
• …or rescale each output before training, and then undo the rescaling at inference (see the sketch below)
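
A brief sketch (my own illustration, with made-up numbers) of the rescaling approach: standardize each output dimension before training, then invert the transform when making predictions.

```python
import numpy as np

# Hypothetical training targets: column 0 = weight in kg, column 1 = height in m
Y = np.array([[70.0, 1.75],
              [55.0, 1.62],
              [90.0, 1.88]])

mu, sd = Y.mean(axis=0), Y.std(axis=0)
Y_scaled = (Y - mu) / sd          # train the network on these rescaled targets

# At inference, undo the rescaling on the network's predictions
pred_scaled = np.array([0.3, -0.5])            # hypothetical network output
pred = pred_scaled * sd + mu
print(pred)                                     # back in kg and m
```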
Example Loss Calculation: Poisson Distribution

• Suitable for count data y ∈ {0, 1, 2, …}
• Poisson distribution: Pr(y = k | λ) = λᵏ e^{−λ} / k!, with rate parameter λ

Problem:
• Output of the neural network can be anything
• Parameter λ must be positive

Solution:
• Pass the network output through a function that maps "anything" to a positive value, e.g., λ = exp(f[x, ϕ])

[Figure: Poisson distributions for different rates λ; counts 0–14 on the horizontal axis]
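
A compact NumPy/SciPy sketch (my own example) of the Poisson negative log likelihood, with the rate obtained by exponentiating the raw network output so that it is always positive.

```python
import numpy as np
from scipy.special import gammaln   # log(k!) = gammaln(k + 1)

def poisson_nll(counts, raw_outputs):
    """Negative log likelihood of count data under Poisson rates exp(raw_outputs)."""
    lam = np.exp(raw_outputs)                       # map "anything" to a positive rate
    return np.sum(lam - counts * np.log(lam) + gammaln(counts + 1))

counts      = np.array([0, 3, 7])         # hypothetical observed counts
raw_outputs = np.array([-0.5, 1.0, 2.1])  # hypothetical raw network outputs

print(poisson_nll(counts, raw_outputs))
```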
Recipe for loss functions
Loss functions
• Maximum likelihood
• Recipe for loss functions
• Example 1: univariate regression
• Example 2: binary classification
• Example 3: multiclass classification
• Other types of data
• Multiple outputs
• Cross entropy
Cross Entropy

• Kullback–Leibler divergence — a measure of the difference between two probability distributions p(y) and q(y):

  D_KL[ p ‖ q ] = ∫ p(y) log( p(y) / q(y) ) dy

• The cross entropy is the part of the KL divergence that depends on q:

  H[p, q] = −∫ p(y) log q(y) dy

• Take p(y) to be the empirical distribution of the training outputs (a sum of Dirac delta point masses at the data points) and q(y) to be the model distribution; minimizing the cross entropy then gives the minimum negative log likelihood solution.

Dirac Delta application (sampling property)

  ∫ δ[y − yᵢ] log q(y) dy = log q(yᵢ)

The product of the two terms in the first line corresponds to pointwise multiplying
the point masses in figure a with the logarithm of the distribution in figure b.
We are left with a finite set of weighted probability masses centered on the data
points.
Cross entropy in machine learning

• Minimizing the cross entropy between the empirical data distribution and the model distribution yields the same parameters as minimizing the negative log likelihood

• In machine learning, the "cross-entropy loss" and the negative log likelihood loss are therefore the same thing
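
A tiny numeric check (my own example, with made-up distributions) for two discrete distributions: the cross entropy equals the KL divergence plus the entropy of p, so minimizing cross entropy with respect to q is the same as minimizing the KL divergence.

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])   # "data" distribution (hypothetical)
q = np.array([0.5, 0.3, 0.2])   # model distribution (hypothetical)

kl            = np.sum(p * np.log(p / q))      # D_KL[p || q]
cross_entropy = -np.sum(p * np.log(q))         # H[p, q]
entropy_p     = -np.sum(p * np.log(p))         # H[p], independent of q

assert np.isclose(cross_entropy, kl + entropy_p)
print(kl, cross_entropy)
```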
Next up
• We have models with parameters!
• We have loss functions!
• Now let’s find the parameters that give the smallest loss
• Training, learning, or fitting the model
