Neural Network
[Figure: a fully connected network with an input layer, three hidden layers, and an output layer. Width = number of neurons per layer; depth = number of layers.]
Computational Graphs
• A neural network is a computational graph
– It is directional
– It is organized in 'layers'
Backprop
The Importance of Gradients
• Our optimization schemes are based on computing gradients
• Backpropagation was done by many people before, but is (mostly) credited to Rumelhart 1986
Backprop: Forward Pass
[Figure: example compute graph with a sum node feeding a mult node; the forward pass evaluates each node from the inputs to the output.]
Backprop: Backward Pass
[Figure: the same sum/mult compute graph, annotated step by step with gradients flowing backward through each node.]
Chain Rule: ∂L/∂x = (∂L/∂z) · (∂z/∂x)
Downstream Gradient = Upstream Gradient × Local Gradient
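To make this concrete, here is a toy example of my own (not from the slides): a forward and backward pass through the graph f = (x + y) * z, showing downstream = upstream × local at each node.

# forward pass through f(x, y, z) = (x + y) * z
x, y, z = -2.0, 5.0, -4.0
s = x + y            # sum node:  s = 3.0
f = s * z            # mult node: f = -12.0

# backward pass: start from the upstream gradient df/df = 1
df = 1.0
ds = df * z          # local gradient of mult w.r.t. s is z -> ds = -4.0
dz = df * s          # local gradient of mult w.r.t. z is s -> dz = 3.0
dx = ds * 1.0        # the sum node's local gradients are 1:
dy = ds * 1.0        # it routes the upstream gradient to both inputs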
Compute Graphs -> Neural Networks
[Figure: a neural network written as a compute graph: input layer, weights, biases, activation functions, output layer, and a loss/cost node at the end.]
Gradient Descent for Neural Networks
[Figure: gradients flowing back from the output layer.]
Gradient step: w ← w − α · ∂L/∂w
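A minimal sketch of this update loop, with a hypothetical compute_gradient helper standing in for backpropagation (all names and values here are illustrative):

import numpy as np

def compute_gradient(w):
    # hypothetical stand-in: gradient of the toy loss L(w) = ||w||^2 / 2
    return w

w = np.array([1.0, -2.0])
learning_rate = 0.1
for step in range(100):
    dw = compute_gradient(w)      # in a real network: dL/dw via backprop
    w -= learning_rate * dw       # step against the gradient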
NNs can Become Quite Complex…
• These graphs can be huge!
[Figure: neuron counts.]
Gradient Descent for Neural Networks
• Gradients of the loss with respect to the weights are computed via backpropagation
Derivatives of Cross Entropy Loss
• Gradients of the weights of the last layer follow directly from the output scores
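A hedged sketch of this computation (the function name and toy inputs are illustrative, not from the slides): for softmax probabilities p with cross-entropy loss, the gradient with respect to the output scores is simply p − y.

import numpy as np

def softmax_ce_backward(scores, y):
    # scores: (N, C) output scores; y: (N,) integer class labels
    probs = np.exp(scores - scores.max(axis=1, keepdims=True))  # stable softmax
    probs /= probs.sum(axis=1, keepdims=True)
    n = len(y)
    loss = -np.log(probs[np.arange(n), y]).mean()
    dscores = probs.copy()
    dscores[np.arange(n), y] -= 1          # dL/dscores = p - y (y one-hot)
    return loss, dscores / n

scores = np.array([[2.0, 1.0, 0.1]])       # toy scores: 1 sample, 3 classes
loss, dscores = softmax_ce_backward(scores, np.array([0]))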
Derivatives of Cross Entropy Loss
• Gradients of the weights of the first layer require chaining local gradients back through every layer in between
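A sketch of the chain rule applied back to the first layer, for an assumed two-layer ReLU network (shapes and the placeholder upstream gradient are illustrative; in practice dscores would come from the softmax sketch above):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5))            # toy batch: 4 samples, 5 features
W1, b1 = rng.standard_normal((5, 8)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 3)), np.zeros(3)

# forward pass
h = np.maximum(0, x @ W1 + b1)             # hidden layer with ReLU
scores = h @ W2 + b2

# backward pass
dscores = np.ones_like(scores) / scores.size   # placeholder upstream gradient
dW2 = h.T @ dscores                        # last-layer weight gradient
dh = dscores @ W2.T                        # gradient flowing into hidden layer
dh[h <= 0] = 0                             # backprop through ReLU (local gradient)
dW1 = x.T @ dh                             # first-layer weight gradient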
Back to Compute Graphs & NNs
Gradient Descent
• How do we pick a good learning rate?
AdaGrad
• AdaGrad adapts the learning rate of every parameter in the model based on that parameter's past gradients.
• It accumulates, per parameter, the sum of squared gradients over time and divides the learning rate by the square root of this sum (see the sketch below).
• This sharply reduces the effective learning rate for parameters with large partial derivatives of the loss, while parameters with modest gradients see only a small decrease.
• The net effect is greater progress in the more gently sloped directions of parameter space.
• What happens to the sum of squared gradients if training takes too long?
• Over time, this term grows larger. When the current gradient is divided by this large number, the update step for the weights becomes very small.
• It is as if we were using a very low learning rate, one that keeps shrinking the longer training runs.
• AdaGrad performs well for some but not all deep learning models.
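A minimal AdaGrad sketch under these assumptions (compute_gradient is a hypothetical stand-in for backprop; values are illustrative):

import numpy as np

def compute_gradient(w):
    # hypothetical stand-in: gradient of the toy loss L(w) = ||w||^2 / 2
    return w

w = np.array([1.0, -2.0])
learning_rate = 0.1
grad_squared = np.zeros_like(w)
for t in range(100):
    dw = compute_gradient(w)
    grad_squared += dw * dw                # per-parameter sum of squared gradients
    w -= learning_rate * dw / (np.sqrt(grad_squared) + 1e-7)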
RMSProp: "Leaky AdaGrad"
• Instead of letting this sum grow without bound over the course of training, we let it decay by introducing a decay rate:
import numpy as np

# RMSProp: leaky running average of squared gradients
grad_squared = 0
for t in range(num_steps):
    dw = compute_gradient(w)   # gradient of the loss w.r.t. w
    grad_squared = decay_rate * grad_squared + (1 - decay_rate) * dw * dw
    w -= learning_rate * dw / (np.sqrt(grad_squared) + 1e-7)
ADAM
• With momentum, we used a velocity term to determine the direction of the gradient and update the weight parameters in the direction of that velocity.
• With AdaGrad/RMSProp, we used the sum of squared gradients to scale the current gradient, so that the weights in each dimension of parameter space are updated at a comparable rate.
• Adam combines both ideas (see the sketch below).
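A minimal Adam sketch combining the two ideas (the helper and the hyperparameter values are assumptions, not from the slides):

import numpy as np

def compute_gradient(w):
    # hypothetical stand-in: gradient of the toy loss L(w) = ||w||^2 / 2
    return w

w = np.array([1.0, -2.0])
learning_rate, beta1, beta2 = 0.01, 0.9, 0.999   # commonly used defaults
m = np.zeros_like(w)                             # first moment (velocity)
v = np.zeros_like(w)                             # second moment (squared grads)
for t in range(1, 101):
    dw = compute_gradient(w)
    m = beta1 * m + (1 - beta1) * dw             # momentum-style average
    v = beta2 * v + (1 - beta2) * dw * dw        # RMSProp-style average
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    w -= learning_rate * m_hat / (np.sqrt(v_hat) + 1e-7)

The bias-correction terms compensate for m and v being initialized at zero, which would otherwise make the first updates too small.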