

Data Scientist

Apr 20, 2017 · 9 min read

Back-Propagation is very simple. Who made it Complicated?

Learning Outcome: You will be able to build your own neural network on paper.

When I first learned about neural networks, I scratched my head for a long time over how back-propagation works. When I talk to peers in my circle, I see a lot of people facing the same problem. Most people treat it as a black box and use libraries like Keras, TensorFlow, and PyTorch, which provide automatic differentiation. Though it is not necessary to write your own code to compute gradients and backpropagate errors, understanding it helps you with concepts like vanishing gradients, saturation of neurons, and the reasons for random initialization of weights.

Hi, I am Prakash. Sign up for my newsletter on Deep Learning.

Andrej Karpathy wrote a blog post on backprop and I found it useful: "When we offered CS231n (Deep Learning class) at Stanford, we intentionally designed the programming assignments to…" (medium.com)

Approach

• Build a small neural network as defined in the architecture below.

Architecture:

• Build a feed-forward neural network with 2 hidden layers. All the layers will have 3 neurons each.

• The 1st and 2nd hidden layers will have ReLU and sigmoid respectively as activation functions. The final layer will have softmax.

I have taken the inputs, weights, and biases randomly.
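Since the post's actual inputs, weights, and biases were shown only in images, here is a minimal sketch of initializing such a 3-3-3 network with random values (all numbers hypothetical):

```python
import numpy as np

# Hypothetical random initialization -- stand-ins for the post's image-based values.
rng = np.random.default_rng(0)

x = rng.random((3, 1))                              # input vector (3 features)
W1, b1 = rng.random((3, 3)), rng.random((3, 1))     # hidden layer 1 (ReLU)
W2, b2 = rng.random((3, 3)), rng.random((3, 1))     # hidden layer 2 (sigmoid)
W3, b3 = rng.random((3, 3)), rng.random((3, 1))     # output layer (softmax)

print(W1.shape, W2.shape, W3.shape)
```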

Initializing the network

Layer-1

Neural Network Layer-1

Matrix Operation:

Relu operation:

Example:

Layer-1 Example
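The layer-1 step (a matrix multiply followed by ReLU) can be sketched as follows; the input and weight values here are hypothetical stand-ins for the ones shown in the post's images:

```python
import numpy as np

def relu(z):
    # ReLU: max(0, z), applied element-wise
    return np.maximum(0.0, z)

# Hypothetical input and layer-1 parameters.
x  = np.array([[0.1], [0.2], [0.7]])
W1 = np.array([[0.1, 0.2, 0.3],
               [0.3, 0.2, 0.7],
               [0.4, 0.3, 0.9]])
b1 = np.array([[1.0], [1.0], [1.0]])

z1 = W1 @ x + b1    # matrix operation: z1 = W1·x + b1
a1 = relu(z1)       # activation: [1.26, 1.56, 1.73]
print(a1.ravel())
```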

Layer-2

Matrix operation:

Sigmoid operation:

Sigmoid Operation

Example:

Layer-2 Example
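Layer 2 is the same matrix operation followed by the sigmoid; again the numbers below are hypothetical:

```python
import numpy as np

def sigmoid(z):
    # sigmoid: 1 / (1 + e^{-z}), squashes each value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical layer-1 output and layer-2 parameters.
a1 = np.array([[1.26], [1.56], [1.73]])
W2 = np.array([[0.2, 0.3, 0.6],
               [0.3, 0.5, 0.4],
               [0.5, 0.7, 0.8]])
b2 = np.array([[1.0], [1.0], [1.0]])

z2 = W2 @ a1 + b2
a2 = sigmoid(z2)    # every entry lies strictly between 0 and 1
print(a2.ravel())
```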

Layer-3

Matrix operation:

Softmax operation:

Softmax formula

Example:

Edit: the correct softmax would be [0.19858, 0.28559, 0.51583] instead of [0.26980, 0.32235, 0.40784]. I did a/sum(a) while the correct answer is exp(a)/sum(exp(a)). Please adjust your calculations from here on using these values. Thank you.
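The correction above is easy to check in code: a/sum(a) merely normalizes the values, while the real softmax exponentiates first. The logits here are hypothetical:

```python
import numpy as np

def softmax(a):
    # correct softmax: exp(a) / sum(exp(a)); subtracting max(a) for stability
    e = np.exp(a - np.max(a))
    return e / e.sum()

a = np.array([1.0, 2.0, 3.0])   # hypothetical logits
wrong = a / a.sum()             # the a/sum(a) mistake the edit corrects
right = softmax(a)
print(wrong)   # -> approx [0.167, 0.333, 0.500]
print(right)   # -> approx [0.090, 0.245, 0.665]
```

Note how exponentiation makes the softmax emphasize the largest logit far more than plain normalization does.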

Analysis:

• The actual output should be [1.0, 0.0, 0.0], but we got [0.2698, 0.3223, 0.4078].

Error:

Cross-Entropy:

Cross-Entropy Formula

Example:

Cross-Entropy calculation
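The cross-entropy step can be verified numerically: with a one-hot target, the loss reduces to -log of the probability assigned to the true class. Using the (pre-edit) softmax output from above:

```python
import numpy as np

def cross_entropy(target, pred):
    # CE = -sum_i t_i * log(p_i); for a one-hot target this is -log(p_true)
    return float(-np.sum(target * np.log(pred)))

target = np.array([1.0, 0.0, 0.0])
pred   = np.array([0.2698, 0.3223, 0.4078])   # the post's (pre-edit) softmax output

loss = cross_entropy(target, pred)
print(round(loss, 4))   # -log(0.2698) ≈ 1.31
```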

We are done with the forward pass. Now let us look at the backward pass.

Important Derivatives:

Sigmoid:

Derivative of Sigmoid

Relu:

Derivative of Relu

Softmax:

Derivative of Softmax
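The three derivatives can be written directly from their formulas: sigmoid'(z) = a(1 - a) in terms of the output a, ReLU' = 1 for positive inputs and 0 otherwise, and the softmax Jacobian dp_i/dz_j = p_i(δ_ij - p_j):

```python
import numpy as np

def d_sigmoid(a):
    # derivative of sigmoid in terms of its output a = sigmoid(z): a * (1 - a)
    return a * (1.0 - a)

def d_relu(z):
    # derivative of ReLU: 1 where z > 0, else 0
    return (z > 0).astype(float)

def d_softmax(p):
    # Jacobian of softmax given its output p: dp_i/dz_j = p_i * (delta_ij - p_j)
    return np.diag(p) - np.outer(p, p)

p = np.array([0.2698, 0.3223, 0.4078])   # the post's (pre-edit) softmax output
J = d_softmax(p)
print(J.shape)   # (3, 3)
```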

(Layer-2 to Output Layer) Weights

Backpropagating Layer-3 weights

We will calculate the error derivatives and store them so that we can reuse them whenever necessary. Here we are using only one example (batch_size=1); if there are more examples, we just need to average everything.

In our example,

Next, let us calculate the derivative of each output with respect to its input.

In our example,

For each input to a neuron, let us calculate the derivative with respect to each weight. Now let us look at the final values of the derivative of the input to the output layer w.r.t. the weights.

By symmetry:

Matrix Form of all derivatives in layer-3

All of the above values were calculated before; we just need to substitute the results.

So, we have calculated the new weight matrix for W_{kl}. Now let us move to the next layer:
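In matrix form, the chain of derivatives for the output-layer weights collapses into the well-known shortcut dL/dz3 = p - t (softmax combined with cross-entropy). A sketch, with the layer-2 activations as hypothetical values:

```python
import numpy as np

p  = np.array([[0.2698], [0.3223], [0.4078]])   # the post's (pre-edit) output
t  = np.array([[1.0], [0.0], [0.0]])            # one-hot target
a2 = np.array([[0.94], [0.95], [0.97]])         # hypothetical layer-2 activations

delta3 = p - t          # dL/dz3: softmax + cross-entropy shortcut
dW3 = delta3 @ a2.T     # dL/dW3: outer product, since batch_size = 1

lr = 0.01
W3 = np.ones((3, 3))    # hypothetical current weights
W3_new = W3 - lr * dW3  # gradient-descent update
print(dW3.shape)        # (3, 3)
```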

(Layer-1 to Hidden Layer-2) Weights

Backpropagating errors to 2nd layer

We will calculate the error derivatives w.r.t. the weights in this layer.

Example: Derivative of sigmoid output wrt layer 2 input

In our example:

For each input to a neuron, let us calculate the derivative with respect to each weight. Now let us look at the final derivative.

Values of the derivative of the layer-2 input w.r.t. the weights

weight from j3 to k1

By symmetry we get the final matrix as,

We have already calculated the 2nd and 3rd terms in each matrix; we need to check the 1st term. Looking at the matrix, the first term is common across all the columns, so there are only three distinct values. Let us look at one of them:

Breakdown of error.

Again, the first two values were already calculated when dealing with the derivatives of W_{kl}. We just need to calculate the third one, which is the derivative of the input to each output-layer neuron w.r.t. the output of hidden layer-2. It is nothing but the corresponding weight which connects the two layers.

The corresponding values for our example:

Our final matrix of derivatives of Weights connecting hidden layer-1 and hidden layer-2

In our example,

Calculations from our examples

Considering a learning rate (lr) of 0.01, we get our final weight matrix as

So, we have calculated the new weight matrix for W_{jk}. Now let us move to the next layer:
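The same pattern gives the layer-2 weight gradients: the output error flows back through W3 and the sigmoid derivative before the outer product with the layer-1 activations. A sketch, with all numeric values as hypothetical stand-ins for the post's image-based ones:

```python
import numpy as np

p  = np.array([[0.2698], [0.3223], [0.4078]])   # output probabilities
t  = np.array([[1.0], [0.0], [0.0]])            # one-hot target
W3 = np.array([[0.1, 0.4, 0.8],
               [0.3, 0.7, 0.2],
               [0.5, 0.2, 0.9]])                # hypothetical output-layer weights
a2 = np.array([[0.93], [0.94], [0.96]])         # sigmoid outputs of layer 2
a1 = np.array([[1.26], [1.56], [1.73]])         # ReLU outputs of layer 1

delta3 = p - t                               # dL/dz3
delta2 = (W3.T @ delta3) * (a2 * (1 - a2))   # chain rule through the sigmoid
dW2 = delta2 @ a1.T                          # dL/dW2

lr = 0.01
W2 = np.ones((3, 3))                         # hypothetical current weights
W2_new = W2 - lr * dW2                       # gradient-descent update
print(dW2.shape)                             # (3, 3)
```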

(Input to Hidden Layer-1) Weights

Edit 1: the calculations from here on are wrong. I took only w_{j1k1} and ignored w_{j1k2} and w_{j1k3}. This was pointed out by a user in the comments. I would like someone to edit the Jupyter notebook attached at the end. Please refer to other implementations if you still haven't understood back-prop from here.

Backpropagating errors to 1st layer

We will calculate the error derivatives w.r.t. the weights in this layer.

(the inputs were given at the beginning of the post). Since all the inputs are positive, we will get the output as 1.

Calculations

For each input to a neuron, let us calculate the derivative with respect to each weight. Now let us look at the final derivative.

Now we will calculate the change in the weights.

By symmetry,

We know the 2nd and 3rd derivatives in each cell of the above matrix. Let us look at how to get the derivative of the 1st term in each cell.

Watching for the first term

We have calculated all the values previously except the last one in each

cell, which is a simple derivative of linear terms.

In our example,

Final matrix

Considering a learning rate (lr) of 0.01, we get our final weight matrix as

Using the learning rate, we get the final matrix
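The mistake described in the edit above (keeping only the w_{j1k1} path) disappears when the whole backward pass is written in matrix form, because W2.T @ delta2 sums over all the paths automatically. A minimal end-to-end sketch with random (hypothetical) values:

```python
import numpy as np

def relu(z):    return np.maximum(0.0, z)
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))
def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.random((3, 1))
W1, b1 = rng.random((3, 3)), rng.random((3, 1))
W2, b2 = rng.random((3, 3)), rng.random((3, 1))
W3, b3 = rng.random((3, 3)), rng.random((3, 1))
t = np.array([[1.0], [0.0], [0.0]])

# forward pass
z1 = W1 @ x  + b1; a1 = relu(z1)
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)
z3 = W3 @ a2 + b3; p  = softmax(z3)

# backward pass -- W3.T @ delta3 and W2.T @ delta2 sum over ALL paths,
# which is exactly the step the post's layer-1 derivation missed.
delta3 = p - t                             # softmax + cross-entropy shortcut
delta2 = (W3.T @ delta3) * a2 * (1 - a2)   # through the sigmoid
delta1 = (W2.T @ delta2) * (z1 > 0)        # through the ReLU
dW1 = delta1 @ x.T                         # dL/dW1
print(dW1.shape)                           # (3, 3)
```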

Important Notes:

• I have completely eliminated the bias when differentiating. Do you know why?

• I have taken only one example. What will happen if we take a batch of examples?

• Do you see why it occurs?

• What would happen if all the weights were the same number instead of random?

References Used:

• http://csrgxtu.github.io/2015/03/20/Writing-Mathematic-Fomulars-in-Markdown/

• https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/

• http://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/

• https://github.com/Prakashvanapalli/TensorFlow/blob/master/Blogposts/Backpropogation_with_Images.ipynb

If you like it, please recommend and share it. Thank you.
