
Machine Learning Course - CS-433

Cost Functions

Sept 23, 2021

minor changes by Martin Jaggi 2021, 2020, 2019, 2018, 2017, 2016;
© Mohammad Emtiyaz Khan 2015
Last updated on: September 24, 2021
Motivation
Consider the following models.

1-parameter model: yn ≈ w0
2-parameter model: yn ≈ w0 + w1xn1

How can we estimate (or guess) values of w given the data D?

What is a cost function?


A cost function (or energy, loss, training objective) is used to learn parameters that explain the data well. The cost function quantifies how well our model does, or, in other words, how costly our mistakes are.

Two desirable properties of cost functions


When the target y is real-valued, it is often desirable that the cost is symmetric around 0, since positive and negative errors should be penalized equally.

Also, our cost function should penalize “large” mistakes and “very-large” mistakes similarly.
Statistical vs computational trade-off
If we want better statistical properties, then we have to give up good computational properties.

Mean Square Error (MSE)


MSE is one of the most popular cost functions.

MSE(w) := (1/N) Σ_{n=1}^{N} (y_n − f_w(x_n))²

Does this cost function have both of the mentioned properties?

An exercise for MSE


Compute MSE for the 1-parameter model:

L(w_0) := (1/N) Σ_{n=1}^{N} (y_n − w_0)²

Fill in the table for w_0 = 1, ..., 7:

w_0                                            1   2   3   4   5   6   7
MSE(w_0) · N   (data: y_1, ..., y_4 = 1, 2, 3, 4)
MSE(w_0) · N   (adding the outlier y_5 = 20)

Some help: 19² = 361, 18² = 324, 17² = 289, 16² = 256, 15² = 225, 14² = 196, 13² = 169
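As a quick check of the exercise above (a sketch added here, not part of the original notes), the sums can be computed directly:

```python
# Compute MSE(w0) * N for the 1-parameter model y_n ≈ w0,
# for candidate values w0 = 1, ..., 7.
def mse_times_n(ys, w0):
    return sum((y - w0) ** 2 for y in ys)

data = [1, 2, 3, 4]
data_with_outlier = [1, 2, 3, 4, 20]

for w0 in range(1, 8):
    print(w0, mse_times_n(data, w0), mse_times_n(data_with_outlier, w0))
```

Without the outlier, the cost is minimized near the mean 2.5 (the values for w_0 = 2 and w_0 = 3 tie at 6); adding y_5 = 20 drags the minimizer all the way to w_0 = 6, the mean of all five points.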
Outliers
Outliers are data examples that are
far away from most of the other
examples. Unfortunately, they
occur more often in reality than you
would want them to!

MSE is not a good cost function when outliers are present.

Here is a real example on speed of light measurements (from Gelman’s book on Bayesian data analysis).

[Figure: (a) Original speed-of-light data by Simon Newcomb. (b) Histogram showing outliers.]

Handling outliers well is a desired statistical property.
Mean Absolute Error (MAE)

MAE(w) := (1/N) Σ_{n=1}^{N} |y_n − f_w(x_n)|

Repeat the exercise with MAE.

w_0                                            1   2   3   4   5   6   7
MAE(w_0) · N   (data: y_1, ..., y_4 = 1, 2, 3, 4)
MAE(w_0) · N   (adding the outlier y_5 = 20)

Can you draw MSE and MAE for the above example?
Convexity
Roughly, a function is convex iff a line segment between any two points on the function’s graph never lies below the graph.

A function h(u) with u ∈ R^D is convex if, for any u, v ∈ R^D and any 0 ≤ λ ≤ 1, we have:

h(λu + (1 − λ)v) ≤ λ h(u) + (1 − λ) h(v)

A function is strictly convex if the inequality is strict for all u ≠ v and 0 < λ < 1.
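As an illustration (a sketch added here, not part of the original notes), the defining inequality can be checked numerically for h(u) = u², which is convex:

```python
# Numerically check the convexity inequality
#   h(λu + (1-λ)v) ≤ λ h(u) + (1-λ) h(v)
# for h(u) = u^2 over a grid of points and mixing weights λ.
def h(u):
    return u ** 2

def convexity_holds(h, u, v, lam, tol=1e-12):
    # Small tolerance guards against floating-point round-off.
    return h(lam * u + (1 - lam) * v) <= lam * h(u) + (1 - lam) * h(v) + tol

points = [-3.0, -1.0, 0.0, 0.5, 2.0]
lams = [0.0, 0.25, 0.5, 0.75, 1.0]
print(all(convexity_holds(h, u, v, lam)
          for u in points for v in points for lam in lams))
```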

Importance of convexity
A strictly convex function has a unique global minimum w⋆. For convex functions, every local minimum is a global minimum.

Sums of convex functions are also convex. Therefore, MSE combined with a linear model is convex in w.

Convexity is a desired computational property.

Can you prove that the MAE is convex? (As a function of the parameters w ∈ R^D, for linear regression f_w(x) := f(x, w) := x^⊤w.)
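A proof sketch (added here, not part of the original notes), using the convexity definition above:

```latex
% Step 1: u \mapsto |u| is convex, by the triangle inequality:
\[
  |\lambda u + (1-\lambda)v| \;\le\; \lambda|u| + (1-\lambda)|v|,
  \qquad 0 \le \lambda \le 1.
\]
% Step 2: w \mapsto y_n - x_n^\top w is affine, and a convex function
% composed with an affine map is convex, so each term
% |y_n - x_n^\top w| is convex in w.
% Step 3: a nonnegative weighted sum of convex functions is convex, hence
\[
  \mathrm{MAE}(w) = \frac{1}{N} \sum_{n=1}^{N} |y_n - x_n^\top w|
  \quad \text{is convex in } w.
\]
```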

Computational vs statistical trade-off

So which loss function is the best?

Figure taken from Patrick Breheny’s slides.

If we want better statistical properties, then we have to give up good computational properties.
Additional Reading
Other cost functions
Huber loss

Huber(e) := { (1/2) e²,          if |e| ≤ δ
            { δ|e| − (1/2) δ²,   if |e| > δ          (1)

Huber loss is convex, differentiable, and also robust to outliers. However, setting δ is not an easy task.
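Equation (1) can be sketched directly (an illustration added here, not part of the original notes):

```python
# Huber loss: quadratic for small errors, linear for large ones.
# The two branches meet smoothly at |e| = delta.
def huber(e, delta=1.0):
    if abs(e) <= delta:
        return 0.5 * e ** 2
    return delta * abs(e) - 0.5 * delta ** 2

# Small errors behave like MSE; large errors grow only linearly, like MAE.
print(huber(0.5), huber(10.0))
```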

Tukey’s bisquare loss (defined in terms of the gradient)

∂L/∂e := { e (1 − e²/δ²)²,   if |e| ≤ δ
         { 0,                if |e| > δ              (2)

Tukey’s loss is non-convex, but robust to outliers.
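The gradient in equation (2) can be sketched as follows (an illustration added here, not part of the original notes; the default δ = 4.685 is a tuning constant commonly used in robust statistics, an assumption rather than something fixed by these notes):

```python
# Gradient of Tukey's bisquare loss with respect to the error e.
# Errors beyond delta contribute zero gradient, which is why the
# loss effectively ignores extreme outliers.
def tukey_grad(e, delta=4.685):  # 4.685: common default (assumed, not from the notes)
    if abs(e) > delta:
        return 0.0
    return e * (1 - (e / delta) ** 2) ** 2

print(tukey_grad(1.0), tukey_grad(100.0))
```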

Additional reading on outliers


• Wikipedia page on “Robust statistics”.
• Sec 2.4 of Kevin Murphy’s book for an example of robust modeling

Nasty cost functions: Visualization


See Andrej Karpathy’s Tumblr post for many cost functions gone “wrong”
for neural networks. http://lossfunctions.tumblr.com/.
