
Foundations of Machine Learning

Module 6: Neural Network


Part B: Multi-layer Neural Network
Sudeshna Sarkar
IIT Kharagpur
Limitations of Perceptrons
• Perceptrons have a monotonicity property:
If a link has positive weight, activation can only increase as the
corresponding input value increases (irrespective of other
input values)
• Can’t represent functions where input interactions can cancel
one another’s effect (e.g. XOR)
• Can represent only linearly separable functions
A solution: multiple layers
[Figure: a two-layer network. Input layer: x1, x2; hidden layer: z1, z2; output layer: y.]
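A minimal sketch (not from the slides) of how one hidden layer overcomes the XOR limitation: with hand-picked weights, two threshold hidden units and one threshold output unit compute XOR exactly.

```python
import numpy as np

def step(t):
    # threshold (perceptron-style) activation
    return (t >= 0).astype(int)

# Hand-picked weights for illustration only:
# z1 fires for (x1 OR x2), z2 fires for (x1 AND x2),
# and the output fires for (z1 AND NOT z2), which is XOR.
V = np.array([[1.0, 1.0],            # weights from x1 to (z1, z2)
              [1.0, 1.0]])           # weights from x2 to (z1, z2)
b_hidden = np.array([-0.5, -1.5])    # hidden thresholds
w = np.array([1.0, -2.0])            # hidden-to-output weights
b_out = -0.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    z = step(np.array(x) @ V + b_hidden)
    y = step(z @ w + b_out)
    print(x, "->", int(y))           # prints 0, 1, 1, 0
```

No single-layer perceptron can produce this input-output table, since XOR is not linearly separable.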
Power/Expressiveness of Multilayer Networks
• Can represent interactions among inputs
• Two-layer networks can represent any Boolean function, and continuous
functions (within a tolerance), provided the number of hidden units is
sufficient and appropriate activation functions are used (see the sketch
after this list)
• Learning algorithms exist, but weaker guarantees
than perceptron learning algorithms
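The continuous-function claim can be made concrete with a small numerical sketch (my own illustration; the target function sin, the interval, and the unit counts are assumptions): a single hidden layer of steep sigmoid units, combined linearly at the output, builds one "bump" per sub-interval, and the approximation error shrinks as hidden units are added.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Approximate f(x) = sin(x) on [0, pi] with a hidden layer of 2 * n_bumps sigmoids.
f = np.sin
a, b = 0.0, np.pi
n_bumps, k = 50, 200.0                    # assumed: number of bumps, sigmoid steepness
edges = np.linspace(a, b, n_bumps + 1)
centers = 0.5 * (edges[:-1] + edges[1:])

def two_layer_net(x):
    # each pair of steep sigmoids forms a bump of height f(midpoint) over one sub-interval
    y = np.zeros_like(x)
    for left, right, c in zip(edges[:-1], edges[1:], centers):
        y += f(c) * (sigmoid(k * (x - left)) - sigmoid(k * (x - right)))
    return y

xs = np.linspace(a, b, 1000)
print("max |error|:", np.max(np.abs(two_layer_net(xs) - f(xs))))  # shrinks as n_bumps grows
```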
Multilayer Network

[Figure: a multilayer network with an input layer, a first and a second hidden layer, and an output layer.]
Two-layer back-propagation neural network
[Figure: a two-layer back-propagation network. Input signals x1 … xn flow forward through input-to-hidden weights w_ij and hidden-to-output weights w_jk to the outputs y1, y2, …; error signals flow backward from the output layer to the hidden layer.]
The back-propagation training algorithm
• Step 1: Initialisation
Set all the weights and threshold levels of the network to
random numbers uniformly distributed inside a small range
1

v01
v11 1
x1 1 1 w11
v21 w01

1 y1
v22
x2 2 2 w21
v22
Input v02 Output

1
x z y
Backprop
• Initialization
– Set all the weights and threshold levels of the network to
random numbers uniformly distributed inside a small
range
• Forward computing:
– Apply an input vector x to input units
– Compute activation/output vector z on hidden layer
$z_j = \varphi\big(\sum_i v_{ij}\, x_i\big)$
– Compute the output vector y on output layer
$y_k = \varphi\big(\sum_j w_{jk}\, z_j\big)$
y is the result of the computation.
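A minimal NumPy sketch of the initialisation and forward computation just described; the layer sizes, the input vector, and the [-0.5, 0.5] initialisation range are illustrative assumptions, and thresholds (biases) are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    # a common choice for the activation function phi
    return 1.0 / (1.0 + np.exp(-t))

# Assumed sizes: 3 inputs, 4 hidden units, 2 outputs.
n_in, n_hidden, n_out = 3, 4, 2

# Initialisation: weights drawn uniformly from a small range.
V = rng.uniform(-0.5, 0.5, size=(n_in, n_hidden))    # input-to-hidden weights v_ij
W = rng.uniform(-0.5, 0.5, size=(n_hidden, n_out))   # hidden-to-output weights w_jk

# Forward computation for one input vector x.
x = np.array([0.2, -0.7, 1.0])
z = sigmoid(x @ V)      # z_j = phi(sum_i v_ij x_i)
y = sigmoid(z @ W)      # y_k = phi(sum_j w_jk z_j)
print("hidden activations z:", z)
print("output y:", y)
```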
Learning for BP Nets
• Update of weights in W (between output and hidden layers):
– delta rule
• Not applicable to updating V (between input and hidden)
– don’t know the target values for hidden units z1, z2, …, zp
• Solution: Propagate errors at output units to hidden units to
drive the update of weights in V (again by delta rule)
(error BACKPROPAGATION learning)
• Error backpropagation can be continued downward if the net
has more than one hidden layer.
• How to compute errors on hidden units?
Derivation
• For one output neuron, the error function is
  $E = \tfrac{1}{2}\,(y - \hat{y})^2$
• For each unit $j$, the output $o_j$ is defined as
  $o_j = \varphi(\mathrm{net}_j) = \varphi\Big(\sum_{k=1}^{n} w_{kj}\, o_k\Big)$
  The input $\mathrm{net}_j$ to a neuron is the weighted sum of the outputs $o_k$
  of the previous $n$ neurons.
• Finding the derivative of the error:
  $\dfrac{\partial E}{\partial w_{ij}} = \dfrac{\partial E}{\partial o_j}\,\dfrac{\partial o_j}{\partial \mathrm{net}_j}\,\dfrac{\partial \mathrm{net}_j}{\partial w_{ij}}$
Derivation
• Finding the derivative of the error:
  $\dfrac{\partial E}{\partial w_{ij}} = \dfrac{\partial E}{\partial o_j}\,\dfrac{\partial o_j}{\partial \mathrm{net}_j}\,\dfrac{\partial \mathrm{net}_j}{\partial w_{ij}}$
  $\dfrac{\partial \mathrm{net}_j}{\partial w_{ij}} = \dfrac{\partial}{\partial w_{ij}}\Big(\sum_{k=1}^{n} w_{kj}\, o_k\Big) = o_i$
  $\dfrac{\partial o_j}{\partial \mathrm{net}_j} = \dfrac{\partial}{\partial \mathrm{net}_j}\,\varphi(\mathrm{net}_j) = \varphi(\mathrm{net}_j)\,\big(1 - \varphi(\mathrm{net}_j)\big)$
  (the last step uses the logistic sigmoid activation, whose derivative is $\varphi' = \varphi(1-\varphi)$)
• Consider $E$ as a function of the inputs of all neurons $Z = \{z_1, z_2, \ldots\}$ receiving input from neuron $j$:
  $\dfrac{\partial E(o_j)}{\partial o_j} = \dfrac{\partial E(\mathrm{net}_{z_1}, \mathrm{net}_{z_2}, \ldots)}{\partial o_j}$
  Taking the total derivative with respect to $o_j$, a recursive expression for the derivative is obtained:
  $\dfrac{\partial E}{\partial o_j} = \sum_l \dfrac{\partial E}{\partial \mathrm{net}_{z_l}}\,\dfrac{\partial \mathrm{net}_{z_l}}{\partial o_j} = \sum_l \dfrac{\partial E}{\partial o_l}\,\dfrac{\partial o_l}{\partial \mathrm{net}_{z_l}}\, w_{j z_l}$
• Therefore, the derivative with respect to $o_j$ can be calculated if all the derivatives with respect to the outputs $o_{z_l}$ of the next layer (the one closer to the output neuron) are known.
• Putting it all together:
  $\dfrac{\partial E}{\partial w_{ij}} = \delta_j\, o_i$
  with
  $\delta_j = \dfrac{\partial E}{\partial o_j}\,\dfrac{\partial o_j}{\partial \mathrm{net}_j} =
  \begin{cases}
  (o_j - t_j)\, o_j\,(1 - o_j) & \text{if } j \text{ is an output neuron} \\[4pt]
  \big(\sum_l \delta_{z_l}\, w_{j z_l}\big)\, o_j\,(1 - o_j) & \text{if } j \text{ is an inner neuron}
  \end{cases}$
• To update the weight $w_{ij}$ using gradient descent, one must choose a learning rate $\eta$:
  $\Delta w_{ij} = -\eta\, \dfrac{\partial E}{\partial w_{ij}}$
Backpropagation Algorithm
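A minimal, self-contained sketch of how the forward pass and the δ formulas derived above combine into a training loop. The logistic activation, one hidden layer of 3 units, per-example (online) updates, XOR as the training set, the learning rate 0.5, and the epoch count are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Assumed training set: XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

n_in, n_hidden, n_out = 2, 3, 1
eta = 0.5                                            # learning rate (assumed)

# Step 1: initialise weights and thresholds (biases) in a small range.
V = rng.uniform(-0.5, 0.5, (n_in, n_hidden)); bV = rng.uniform(-0.5, 0.5, n_hidden)
W = rng.uniform(-0.5, 0.5, (n_hidden, n_out)); bW = rng.uniform(-0.5, 0.5, n_out)

for epoch in range(10000):
    for x, t in zip(X, T):
        # Forward pass: z_j = phi(sum_i v_ij x_i), y_k = phi(sum_j w_jk z_j)
        z = sigmoid(x @ V + bV)
        y = sigmoid(z @ W + bW)

        # Backward pass (delta rule derived above):
        delta_out = (y - t) * y * (1 - y)            # output units: (o_k - t_k) o_k (1 - o_k)
        delta_hid = (W @ delta_out) * z * (1 - z)    # hidden units: (sum_k delta_k w_jk) z_j (1 - z_j)

        # Gradient-descent updates: Delta w_ij = -eta * delta_j * o_i
        W -= eta * np.outer(z, delta_out); bW -= eta * delta_out
        V -= eta * np.outer(x, delta_hid); bV -= eta * delta_hid

out = sigmoid(sigmoid(X @ V + bV) @ W + bW)
print(np.round(out, 2))   # typically converges close to [0, 1, 1, 0]
```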
Thank You
