
Introduction to Machine Learning (Lecture Notes)

Multi-layer Perceptron
Lecturer: Barnabas Poczos

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.
They may be distributed outside this class only with the permission of the Instructor.

1 The Multi-layer Perceptron

1.1 Matlab demos

Matlab tutorials for neural network design:

nnd9sd  % Steepest descent

nnd9sdq % Steepest descent for quadratic

Character recognition with MLP:

appcr1

Structure of the MLP: (figure omitted)

Noise-free input: 26 different letters of size 7×5. Prediction errors: (figure omitted)


1.2 An example and notations

Here we will always assume that the activation function is differentiable. This will allow us to optimize the
cost function with gradient descent. However, non-differentiable activation functions (such as the rectified
linear unit) are becoming popular as well.
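
Since the derivations below only use the activation $f$ and its derivative $f'$, here is a minimal sketch (in Python/NumPy, not part of the original notes) of one common differentiable activation, the logistic sigmoid, together with its derivative:

import numpy as np

def f(s):
    """Logistic sigmoid activation; differentiable everywhere."""
    return 1.0 / (1.0 + np.exp(-s))

def f_prime(s):
    """Derivative of the sigmoid: f'(s) = f(s) * (1 - f(s))."""
    fs = f(s)
    return fs * (1.0 - fs)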

2 The back-propagation algorithm

2.1 The gradient of the error

The current error:


\[
\epsilon^2 = \epsilon_1^2 + \epsilon_2^2 = (\hat{y}_1 - y_1)^2 + (\hat{y}_2 - y_2)^2 .
\]

More generally:

\[
\epsilon^2 = \sum_{p=1}^{N_L} \epsilon_p^2 = \sum_{p=1}^{N_L} (\hat{y}_p - y_p)^2 .
\]
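
As a quick numerical check of this definition (a NumPy snippet with hypothetical output and target vectors, not part of the original notes):

import numpy as np

y_hat = np.array([0.9, 0.2])        # network outputs (hypothetical values)
y     = np.array([1.0, 0.0])        # desired outputs
eps_sq = np.sum((y_hat - y) ** 2)   # eps^2 = (0.9-1.0)^2 + (0.2-0.0)^2 = 0.05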

We want to calculate
\[
\frac{\partial \epsilon^2(k)}{\partial W_{ij}^l(k)} = \; ?
\]

2.2 Notation
• $W_{ij}^l(k)$: at time step $k$, the strength of the connection from neuron $j$ on layer $l-1$ to neuron $i$ on layer $l$ ($i = 1, 2, \ldots, N_l$, $j = 1, 2, \ldots, N_{l-1}$).

• $s_i^l(k)$: the summed input of neuron $i$ on layer $l$ before applying the activation function $f$ at time step $k$ ($i = 1, \ldots, N_l$).

• $x^l(k) \in \mathbb{R}^{N_{l-1}}$: the input of layer $l$ at time step $k$.

• $\hat{y}^l(k) \in \mathbb{R}^{N_l}$: the output of layer $l$ at time step $k$.

• $N_1, N_2, \ldots, N_l, \ldots, N_L$: the number of neurons in layers $1, 2, \ldots, l, \ldots, L$.

2.3 Some observations

\[
x^l = \hat{y}^{l-1} \in \mathbb{R}^{N_{l-1}}
\]
\[
s_i^l = W_{i\cdot}^l \, \hat{y}^{l-1} = \sum_{j=1}^{N_{l-1}} W_{ij}^l x_j^l = \sum_{j=1}^{N_{l-1}} W_{ij}^l f(s_j^{l-1})
\]
\[
s_j^{l+1} = \sum_{i=1}^{N_l} W_{ji}^{l+1} f(s_i^l)
\]
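
A sketch of these forward-pass relations in NumPy (the function name forward and the list layout are assumptions for illustration; each weight matrix has shape (neurons in the layer) × (neurons in the previous layer), and f is the activation sketched above):

import numpy as np

def forward(W_list, x, f):
    """Forward pass: for each layer, s = W x, and the output y_hat = f(s)
    becomes the next layer's input."""
    s_list = []
    for W in W_list:
        s = W @ x              # s_i^l = sum_j W_ij^l x_j^l
        s_list.append(s)
        x = f(s)               # x^{l+1} = y_hat^l = f(s^l)
    return s_list, x           # summed inputs per layer, and the final output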

2.4 The back-propagated error


Recall that
\[
\frac{\partial}{\partial x} f(g(x), h(x)) = \frac{\partial f(g(x), h(x))}{\partial g} \frac{\partial g(x)}{\partial x} + \frac{\partial f(g(x), h(x))}{\partial h} \frac{\partial h(x)}{\partial x} .
\]

Introduce the notation:


\[
\delta_i^l(k) = \frac{-\partial \epsilon^2(k)}{\partial s_i^l(k)} = -\sum_{p=1}^{N_L} \frac{\partial \epsilon_p^2(k)}{\partial s_i^l(k)},
\]
where $i = 1, 2, \ldots, N_l$.
As a special case, we have that
\[
\delta_i^L(k) = -\sum_{p=1}^{N_L} \frac{\partial \bigl(y_p(k) - f(s_p^L(k))\bigr)^2}{\partial s_i^L(k)} = 2\,\epsilon_i(k)\, f'(s_i^L(k)),
\]
since only the $p = i$ term depends on $s_i^L(k)$.
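
In the same NumPy sketch, this output-layer term can be computed directly from the targets y and the summed inputs of the last layer (f and f_prime are the assumed activation and derivative from above):

def output_delta(s_L, y, f, f_prime):
    """delta^L_i = 2 * (y_i - f(s^L_i)) * f'(s^L_i)."""
    return 2.0 * (y - f(s_L)) * f_prime(s_L)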

Lemma 1 $\delta_i^l(k)$ can be calculated from $\{\delta_1^{l+1}(k), \ldots, \delta_{N_{l+1}}^{l+1}(k)\}$ using backward recursion.

\[
\delta_i^l(k) = -\sum_{p=1}^{N_L} \frac{\partial \epsilon_p^2}{\partial s_i^l}
= -\sum_{p=1}^{N_L} \sum_{j=1}^{N_{l+1}} \frac{\partial \epsilon_p^2}{\partial s_j^{l+1}} \frac{\partial s_j^{l+1}}{\partial s_i^l}
= -\sum_{j=1}^{N_{l+1}} \sum_{p=1}^{N_L} \frac{\partial \epsilon_p^2}{\partial s_j^{l+1}} \, W_{ji}^{l+1} f'(s_i^l)
\]

Therefore,
\[
\delta_i^l(k) = \Bigl( \sum_{j=1}^{N_{l+1}} \delta_j^{l+1}(k) \, W_{ji}^{l+1}(k) \Bigr) f'(s_i^l(k)),
\]
where $\delta_i^l(k)$ is the back-propagated error.
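
A sketch of this backward recursion (hypothetical names: delta_next is $\delta^{l+1}$, W_next is $W^{l+1}$, s_l is the vector of summed inputs $s^l$):

def backward_delta(delta_next, W_next, s_l, f_prime):
    """delta^l_i = ( sum_j delta^{l+1}_j * W^{l+1}_ji ) * f'(s^l_i)."""
    return (W_next.T @ delta_next) * f_prime(s_l)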


Now using that
\[
s_i^l(k) = \sum_{j=1}^{N_{l-1}} W_{ij}^l(k) \, x_j^l(k),
\]
we get
\[
\frac{\partial \epsilon^2(k)}{\partial W_{ij}^l(k)} = \frac{\partial \epsilon^2(k)}{\partial s_i^l(k)} \frac{\partial s_i^l(k)}{\partial W_{ij}^l(k)} = -\delta_i^l(k) \, x_j^l(k).
\]

The Back-propagation algorithm:

\[
W_{ij}^l(k+1) = W_{ij}^l(k) + \mu \, \delta_i^l(k) \, x_j^l(k)
\]

In vector form:
\[
W_{i\cdot}^l(k+1) = W_{i\cdot}^l(k) + \mu \, \delta_i^l(k) \, x^l(k)^\top .
\]
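
Putting the pieces together, one on-line gradient step of this rule might look like the following sketch; it reuses the hypothetical helpers forward, output_delta, backward_delta, f and f_prime from above, and mu is the learning rate $\mu$:

import numpy as np

def backprop_step(W_list, x1, y, mu, f, f_prime):
    """One update W^l(k+1) = W^l(k) + mu * delta^l(k) * x^l(k)^T for every layer."""
    s_list, y_hat = forward(W_list, x1, f)
    # Inputs of each layer: x^1 is the data, x^l = f(s^{l-1}) for l > 1.
    x_list = [x1] + [f(s) for s in s_list[:-1]]

    delta = output_delta(s_list[-1], y, f, f_prime)                       # delta^L
    for l in reversed(range(len(W_list))):
        W_old = W_list[l]
        W_list[l] = W_old + mu * np.outer(delta, x_list[l])               # rank-one weight update
        if l > 0:
            delta = backward_delta(delta, W_old, s_list[l - 1], f_prime)  # delta^{l-1}
    return W_list, y_hat

Note that the backward recursion here uses the weights before the update, matching the derivation above.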
