
Chapter 3: Supervised Learning: Neural Network


Contents from lesson plan:
• Introduction to perceptron learning, Model representation
• Gradient checking, Back propagation algorithm
• Multi-class classification, and Application- classifying digits
• Support vector machines
• Nonlinear systems
Artificial neural network - representation of a neuron
•In an artificial neural network, a neuron is a logistic unit
• Feed input via input wires
• Logistic unit does computation
• Sends output down output wires
•That logistic computation is just like our previous logistic regression hypothesis calculation
•Very simple model of a neuron's computation
• Often good to include an x0 input - the bias unit
• This is equal to 1
•This is an artificial neuron with a sigmoid (logistic) activation function
• The Ɵ vector may also be called the weights of the model
•The above diagram is a single neuron - a minimal computational sketch is given below
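As an illustrative sketch (not part of the original slides), the single-neuron computation can be written in a few lines of Python/NumPy; the weight values and inputs below are made-up placeholders:

import numpy as np

def sigmoid(z):
    # Logistic activation function g(z) = 1 / (1 + e^-z)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights Ɵ = [theta0, theta1, theta2, theta3] and inputs x1..x3
theta = np.array([-1.0, 0.5, 2.0, -0.5])   # theta0 multiplies the bias unit
x = np.array([1.0, 0.2, 0.4, 0.6])         # x0 = 1 is the bias unit

# Neuron output: h(x) = g(Ɵᵀx), exactly the logistic regression hypothesis
h = sigmoid(theta @ x)
print(h)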

•Here, the inputs are x1, x2 and x3
• We could also call the inputs the activations of the first layer - i.e. (a_1^(1), a_2^(1) and a_3^(1))
• Three neurons in layer 2 (a_1^(2), a_2^(2) and a_3^(2))
• A final fourth neuron produces the output
• Which again we could call a_1^(3)
Neural networks - notation
a_i^(j) - activation of unit i in layer j

So, a_1^(2) is the activation of the 1st unit in the second layer


By activation, we mean the value which is computed and output by that node
Ɵ^(j) - matrix of parameters controlling the function mapping from layer j to layer j + 1
Parameters for controlling mapping from one layer to the next
If the network has
s_j units in layer j, and
s_(j+1) units in layer j + 1,
then Ɵ^(j) will be of dimensions [s_(j+1) x (s_j + 1)]
Because
s_(j+1) is the number of units in layer (j + 1)
(s_j + 1) is the number of units in layer j, plus an additional unit (the bias)
Looking at the Ɵ^(j) matrix
The column length (number of rows) is the number of units in the following layer
The row length (number of columns) is the number of units in the current layer + 1 (because we also have to map from the bias unit)
So, if we had two layers with 101 and 21 units respectively,
then Ɵ^(j) would be of dimensions [21 x 102]
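As a quick sanity check (a made-up sketch, not from the slides), the shape works out as follows in NumPy, assuming a layer of 101 units feeding a layer of 21 units:

import numpy as np

s_j, s_j_plus_1 = 101, 21                    # units in layer j and in layer j + 1
Theta_j = np.zeros((s_j_plus_1, s_j + 1))    # the extra column maps from the bias unit
print(Theta_j.shape)                         # -> (21, 102)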
What are the computations which occur?
We have to calculate the activation for each node
That activation depends on
The input(s) to the node
The parameter associated with that node (from the Ɵ vector associated with that layer)
Below we have an example network, with the associated calculations for its four nodes
Something conceptually important is that
Every input/activation goes to every node in the following layer
Which means each "layer transition" uses a matrix of parameters with the following significance
For the sake of consistency with later nomenclature, we're using j, i and l as our variables here
(although later in this section we use j to denote the layer we're on)

Ɵ^(l)_ji
• j (first of the two subscript numbers) ranges from 1 to the number of units in layer l + 1
• i (second of the two subscript numbers) ranges from 0 to the number of units in layer l
• l (the superscript) is the layer you're moving FROM
This is perhaps more clearly shown in the slightly over the top example below
For example
Ɵ^(1)_13 means
1 - we're mapping to node 1 in layer l + 1
3 - we're mapping from node 3 in layer l
(1) - we're mapping from layer 1
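To tie the notation together, here is a minimal forward-propagation sketch (illustrative only, not from the slides) for a network like the one described above, with 3 inputs, 3 hidden units and 1 output; the weight matrices are randomly initialised placeholders:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Ɵ^(1): maps layer 1 (3 units + bias) to layer 2 (3 units) -> shape (3, 4)
# Ɵ^(2): maps layer 2 (3 units + bias) to layer 3 (1 unit)  -> shape (1, 4)
Theta1 = rng.normal(size=(3, 4))
Theta2 = rng.normal(size=(1, 4))

x = np.array([0.5, -1.2, 3.0])           # inputs x1, x2, x3

a1 = np.concatenate(([1.0], x))          # add bias unit a_0^(1) = 1
a2 = sigmoid(Theta1 @ a1)                # activations of layer 2
a2 = np.concatenate(([1.0], a2))         # add bias unit a_0^(2) = 1
h  = sigmoid(Theta2 @ a2)                # output a_1^(3) = h(x)
print(h)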
Model representation II
Back propagation
Simple expressions and interpretation of the gradient
• Consider a simple multiplication function of two numbers, f(x, y) = xy. It is a matter of simple calculus to derive the partial derivative for either input:

∂f/∂x = y,  ∂f/∂y = x
• Interpretation: the derivatives indicate the rate of change of the function with respect to that variable in an infinitesimally small region around a particular point:

df(x)/dx = lim(h→0) [f(x + h) − f(x)] / h
• The derivative on each variable tells you the sensitivity of the whole expression to its value. For example, if x = 4, y = −3 then f(x, y) = −12 and the derivative on x is ∂f/∂x = −3.
• This tells us that if we were to increase the value of this variable by a tiny amount, the effect on the whole expression would be to decrease it (due to the negative sign), and by three times that amount. This can be seen by rearranging the previous equation to f(x + h) ≈ f(x) + h · df(x)/dx.
• Analogously, since ∂f/∂y = 4, we expect that increasing the value of y by some very small amount h would also increase the output of the function (due to the positive sign), and by 4h.
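The lesson plan also lists gradient checking; as an illustrative sketch (not from the original notes), the analytic derivatives of f(x, y) = xy can be compared against a numerical finite-difference estimate:

def f(x, y):
    return x * y

def numerical_gradient(x, y, h=1e-5):
    # Centred finite differences approximate ∂f/∂x and ∂f/∂y
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy

x, y = 4.0, -3.0
analytic = (y, x)                       # ∂f/∂x = y, ∂f/∂y = x
numeric = numerical_gradient(x, y)
print(analytic, numeric)                # both ≈ (-3.0, 4.0)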
For more understanding, go through the links below

https://google-developers.appspot.com/machine-learning/crash-course/backprop-scroll/

https://playground.tensorflow.org/

https://www.youtube.com/watch?v=I2I5ztVfUSE
Sample example trace
