
Machine Learning Srihari

The Hessian Matrix

Sargur Srihari


Definitions of Gradient and Hessian


•  The first derivative of a scalar function E(w) with respect to a vector
w = [w1, w2]ᵀ is a vector called the gradient of E(w)

      ∇E(w) = dE/dw = [ ∂E/∂w1   ∂E/∂w2 ]ᵀ

   If there are M elements in the vector
   then the gradient is an M x 1 vector
•  The second derivative of E(w) is a matrix called the Hessian of E(w)

      H = ∇∇E(w) = d²E/dw² = ⎡ ∂²E/∂w1²     ∂²E/∂w1∂w2 ⎤
                             ⎣ ∂²E/∂w2∂w1   ∂²E/∂w2²   ⎦

   The Hessian is a matrix with M² elements
   (a short autodiff sketch of both follows after this list)
•  Jacobian is a matrix consisting of first derivatives wrt a vector
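
A minimal sketch (not part of the original slides) of these definitions using
JAX automatic differentiation; the quadratic error E(w) and the evaluation
point w are illustrative assumptions:

import jax
import jax.numpy as jnp

# Hypothetical scalar error with M = 2 weights: E(w) = w1^2 + 3 w1 w2 + 2 w2^2
def E(w):
    return w[0] ** 2 + 3.0 * w[0] * w[1] + 2.0 * w[1] ** 2

w = jnp.array([1.0, 2.0])

grad_E = jax.grad(E)(w)            # gradient: an M x 1 vector
H = jax.hessian(E)(w)              # Hessian: an M x M matrix (M^2 elements)
J = jax.jacobian(jax.grad(E))(w)   # Jacobian of the gradient; equals the Hessian here

print(grad_E)   # [ 8. 11.]  since dE/dw1 = 2 w1 + 3 w2 and dE/dw2 = 3 w1 + 4 w2
print(H)        # [[2. 3.]
                #  [3. 4.]]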

Computing the Hessian using Backpropagation


•  We have shown how backpropagation can be used to obtain the first
derivatives of the error function wrt the weights in a network
•  Backpropagation can also be used to derive the second derivatives

      ∂²E / ∂wji ∂wlk

•  If all the weights and bias parameters are elements wi of a single
vector w, then the second derivatives form the elements Hij of the
Hessian matrix H, where i, j ∈ {1,..,W} and W is the total number of
weights and biases

Role of Hessian in Neural Computing


1.  Several nonlinear optimization algorithms for neural
networks
•  are based on second order properties of error surface
2.  Basis for a fast procedure for re-training a network following a
small change in the training data
3.  Identifying the least significant weights
•  Network pruning requires the inverse of the Hessian
4.  Bayesian neural network
•  Central role in Laplace approximation
•  Hessian inverse is used to determine the predictive distribution for a
trained network
•  Hessian eigenvalues determine the values of hyperparameters
•  Hessian determinant is used to evaluate the model evidence

Evaluating the Hessian Matrix


•  Full Hessian matrix can be difficult to compute in practice
•  quasi-Newton algorithms have been developed that use approximations
to the Hessian
•  Various approximation techniques have been used to evaluate
the Hessian for a neural network
•  It can also be calculated exactly using an extension of backpropagation
•  An important consideration is efficiency
•  With W parameters (weights and biases) the matrix has dimension W x W
•  Efficient methods have complexity O(W²)


Methods for evaluating the Hessian Matrix


•  Diagonal Approximation
•  Outer Product Approximation
•  Inverse Hessian
•  Finite Differences
•  Exact Evaluation using Backpropagation
•  Fast multiplication by the Hessian


Diagonal Approximation

•  In many cases the inverse of the Hessian is needed
•  If the Hessian is approximated by a diagonal matrix (i.e., off-
diagonal elements are zero), its inverse is trivially computed
•  Complexity is O(W) rather than the O(W²) of the full Hessian
   (a short sketch follows)
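
A minimal sketch of the diagonal approximation, reusing the illustrative
quadratic E(w) from the earlier sketch. The full Hessian is formed here only
to make the comparison visible; in the actual O(W) scheme the diagonal terms
would be obtained directly by a modified backprop pass:

import jax
import jax.numpy as jnp

def E(w):
    return w[0] ** 2 + 3.0 * w[0] * w[1] + 2.0 * w[1] ** 2

w = jnp.array([1.0, 2.0])

d = jnp.diag(jax.hessian(E)(w))    # diagonal elements d_i = d^2 E / dw_i^2
H_diag = jnp.diag(d)               # diagonal approximation to H
H_diag_inv = jnp.diag(1.0 / d)     # inverting a diagonal matrix is trivial

print(H_diag_inv)                  # diag(1/2, 1/4)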


Outer product approximation


•  Neural networks commonly use a sum-of-squares error function

      E = (1/2) Σn=1..N (yn − tn)²

•  The Hessian matrix can then be written in the form

      H ≈ Σn=1..N bn bnᵀ

   where bn = ∇yn = ∇an
•  Elements can be found in O(W²) steps (a short sketch follows)
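
A minimal sketch of the outer-product approximation for a model that is
linear in the weights (an illustrative assumption), where yn = w · xn so
that bn = ∇yn = xn; because the model is linear the approximation happens
to be exact, which makes it easy to check:

import jax
import jax.numpy as jnp

def y(w, x):
    return jnp.dot(w, x)           # stand-in network output, an = yn

# toy data, assumed for illustration
X = jnp.array([[1.0, 2.0], [3.0, 1.0], [0.5, -1.0]])
t = jnp.array([1.0, 0.0, 2.0])
w = jnp.array([0.3, -0.7])

def E(w):
    return 0.5 * jnp.sum((jax.vmap(lambda x: y(w, x))(X) - t) ** 2)

B = jax.vmap(lambda x: jax.grad(y)(w, x))(X)   # rows are bn = grad_w yn, shape (N, W)

H_outer = B.T @ B              # H ~ sum_n bn bn^T, O(W^2) work per data point
H_exact = jax.hessian(E)(w)

print(H_outer)                 # matches H_exact here because y is linear in w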

Inverse Hessian

•  Use the outer product approximation to obtain a computationally
efficient procedure for approximating the inverse of the Hessian
   (a short sketch follows)
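
A minimal sketch of building the inverse incrementally with the
Sherman-Morrison identity, adding one outer product bn bnᵀ at a time; the
matrix B of gradient vectors and the initial scale alpha are assumptions
carried over from the previous sketch:

import jax.numpy as jnp

def inverse_hessian_outer(B, alpha=1e-3):
    """Approximate inverse of alpha*I + sum_n bn bn^T, where B has shape (N, W)."""
    W = B.shape[1]
    H_inv = jnp.eye(W) / alpha     # inverse of the initial H0 = alpha * I
    for b in B:
        Hb = H_inv @ b
        # Sherman-Morrison: (H + b b^T)^-1 = H^-1 - (H^-1 b)(H^-1 b)^T / (1 + b^T H^-1 b)
        H_inv = H_inv - jnp.outer(Hb, Hb) / (1.0 + b @ Hb)
    return H_inv

# e.g. inverse_hessian_outer(B) with B from the outer-product sketch above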


Finite Differences

•  By applying central differences to the first derivatives obtained by
backprop, the complexity is reduced from O(W³) to O(W²)
   (a short sketch follows)
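
A minimal sketch of central differences applied to the backprop gradient:
each of the W coordinate perturbations needs two gradient evaluations, so the
full matrix costs O(W²) gradient work rather than the O(W³) of differencing
E itself; E and w are the illustrative quadratic used earlier:

import jax
import jax.numpy as jnp

def E(w):
    return w[0] ** 2 + 3.0 * w[0] * w[1] + 2.0 * w[1] ** 2

w = jnp.array([1.0, 2.0])
grad_E = jax.grad(E)        # first derivatives via backprop (reverse-mode autodiff)
eps = 1e-3

rows = []
for i in range(w.shape[0]):
    e_i = jnp.zeros_like(w).at[i].set(1.0)
    # central difference of the gradient along the i-th coordinate direction
    rows.append((grad_E(w + eps * e_i) - grad_E(w - eps * e_i)) / (2.0 * eps))
H_fd = jnp.stack(rows)

print(H_fd)                 # close to [[2. 3.] [3. 4.]]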


Exact Evaluation of the Hessian

•  Uses an extension of backprop
•  Complexity is O(W²)
   (a short autodiff sketch follows)
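
A minimal sketch of exact evaluation via automatic differentiation, with all
weights and biases of a tiny 2-2-1 network (an illustrative architecture,
input, and target) flattened into a single vector w so that Hij = ∂²E/∂wi∂wj:

import jax
import jax.numpy as jnp

x = jnp.array([0.5, -1.0])         # one training input, assumed for illustration
t = 1.0                            # target

def E(w):
    # unpack a 2-2-1 network from the flat parameter vector (W = 9 parameters)
    W1, b1 = w[0:4].reshape(2, 2), w[4:6]
    W2, b2 = w[6:8], w[8]
    h = jnp.tanh(W1 @ x + b1)      # hidden layer
    y = W2 @ h + b2                # linear output unit
    return 0.5 * (y - t) ** 2      # sum-of-squares error for one pattern

w = 0.1 * jnp.arange(9.0)          # all W = 9 weights and biases as one vector

H = jax.hessian(E)(w)              # exact W x W Hessian
print(H.shape)                     # (9, 9)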


Fast Multiplication by the Hessian

•  Applications of the Hessian involve multiplying it by some vector v
•  The vector vᵀH has only W elements
•  Instead of computing H as an intermediate step, find an efficient
method to compute the product directly (a short sketch follows)
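
A minimal sketch of a Hessian-vector product computed without ever forming H:
a forward-mode derivative (jvp) of the backprop gradient yields Hv directly in
O(W) time and memory, and since H is symmetric this carries the same
information as vᵀH; E, w and v are illustrative assumptions:

import jax
import jax.numpy as jnp

def E(w):
    return w[0] ** 2 + 3.0 * w[0] * w[1] + 2.0 * w[1] ** 2

w = jnp.array([1.0, 2.0])
v = jnp.array([1.0, 0.0])

def hvp(f, w, v):
    # directional derivative of grad f along v, i.e. H @ v, without building H
    return jax.jvp(jax.grad(f), (w,), (v,))[1]

print(hvp(E, w, v))                # [2. 3.]  -- the W elements of the product
print(jax.hessian(E)(w) @ v)       # same result, but requires the full W x W matrix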
