
Multilayer Perceptron
08.09.2021

Multilayer Perceptrons, or MLPs as they are popularly known, are perceptrons
organized in multiple layers to constitute what are known as multilayer
feedforward networks.

Typically, the network consists of a set of sensory units (source nodes/input nodes)
that constitute the input layer, one or more hidden layers of computation nodes,
and an output layer of computation nodes.

The input signal propagates through the network in a forward direction, on a
layer-by-layer basis. These networks are collectively known as multilayer
perceptrons (MLPs), which represent a generalization of the single-layer perceptron.

[Image source: javatpoint.com]

Note: The nodes of the hidden layers and the output layer are computation nodes,
while the input layer nodes are propagational nodes.

[Figure: a single computation node receiving inputs x_i, x_j weighted by w_i, w_j]
A multilayer perceptron has three distinctive characteristics:

1. The model of each computational neuron in the network includes a nonlinear
activation function. The nonlinearity should be smooth, i.e. differentiable
everywhere. The nonlinearity is important, as otherwise the input-output relation
would reduce to a linear transformation, as in single-layer networks.

2. The network contains one or more layers of hidden neurons that are not part of
the input or output of the network. These hidden neurons enable the network to
learn complex tasks by extracting progressively more meaningful features from the
input patterns.

3. The network exhibits a high degree of connectivity, wherein every node of a layer
is connected to every node of the following layer. A change in the connectivity of
the network requires a change in the population of synaptic connections or their weights.

It is through the combination of these features, along with the ability of the network
to learn from experience through training, that the MLP derives its computing power.
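
As an illustration of these characteristics, the following is a minimal sketch of a
forward pass through a small fully connected MLP with one hidden layer. The layer
sizes, the random weights, and the choice of a sigmoid activation are assumptions
made for the example; they are not taken from these notes.

import numpy as np

def sigmoid(z):
    # Smooth (differentiable everywhere) nonlinear activation
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.random(3)                     # input layer: 3 source nodes (no computation)
W1 = rng.standard_normal((4, 3))      # hidden layer: 4 computation nodes, fully connected
b1 = np.zeros(4)
W2 = rng.standard_normal((2, 4))      # output layer: 2 computation nodes, fully connected
b2 = np.zeros(2)

h = sigmoid(W1 @ x + b1)              # hidden activations (nonlinear)
y = sigmoid(W2 @ h + b2)              # actual output of the network
print(y)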

However, the same features also contribute to its deficiencies:

1. The distributed nonlinearity and high connectivity make a theoretical analysis
difficult to undertake.

2. The presence of hidden neurons makes the learning process harder to visualize.

[Figure: Input -> Input Layer -> Hidden Layer -> Output Layer -> Actual Output,
compared against the Target Output]

Target Output - Actual Output = Error

Learn from Error
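
The idea of learning from the error can be sketched for a single linear neuron as
below. This is only an illustration of the principle; the delta-rule style update
and the learning rate of 0.1 are assumptions, not something stated in these notes.

def train_step(weights, bias, inputs, target, lr=0.1):
    # Forward pass: actual output of a single linear neuron
    actual = bias + sum(w * x for w, x in zip(weights, inputs))
    # Target Output - Actual Output = Error
    error = target - actual
    # Learn from the error: nudge each weight and the bias so the error shrinks
    new_weights = [w + lr * error * x for w, x in zip(weights, inputs)]
    new_bias = bias + lr * error
    return new_weights, new_bias, error

print(train_step([0.2, -0.1], 0.0, [1.0, 2.0], target=1.0))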


Gradient Descent

Data points (Height, Weight):
A - (1, 1.75)
B - (1.5, 2.25)
C - (3, 4.8)

[Figure: scatter plot of Weight against Height with the fitted line]

Weight = Intercept + Slope * Height
Let Slope = 0.5 and Intercept = 0.

Predicted weight = 0 + 0.5 * Height

Using this, let us compute the predicted weights and the errors:


1. PW1 = 0 + 0.5 * 1 = 0.5
Error = (1.75 - 0.5) = 1.25
Squared Error = (1.25) * (1.25) = 1.5625

2. PW2 = 0 + 0.5 * 1.5 = 0.75
Error = (2.25 - 0.75) = 1.5
Squared Error = (1.5) * (1.5) = 2.25

3. PW3 = 0 + 0.5 * 3 = 1.5
Error = (4.8 - 1.5) = 3.3
Squared Error = (3.3) * (3.3) = 10.89

Sum of squared errors (Loss Function) = 1.5625 + 2.25 + 10.89 = 14.7025
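
The same sum-of-squared-errors calculation can be reproduced in a few lines of
Python; the data points and the line Weight = 0 + 0.5 * Height are the ones used
above (the rounding is only to keep the printout tidy).

heights = [1.0, 1.5, 3.0]
actual_weights = [1.75, 2.25, 4.8]
intercept, slope = 0.0, 0.5

predicted = [intercept + slope * h for h in heights]                     # [0.5, 0.75, 1.5]
squared_errors = [round((a - p) ** 2, 4) for a, p in zip(actual_weights, predicted)]
print(squared_errors)                  # [1.5625, 2.25, 10.89]
print(round(sum(squared_errors), 4))   # 14.7025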

Loss Function = (Actual Weight - Predicted Weight)^2
              = (Actual Weight - (I + Slope * Height))^2
summed over all data points.

[Figure: sum of squared errors plotted against the intercept]
d(SSR)/dI = d/dI (1.75 - (I + 0.5*1))^2 + d/dI (2.25 - (I + 0.5*1.5))^2 + d/dI (4.8 - (I + 0.5*3))^2

          = -2(1.75 - (I + 0.5*1)) - 2(2.25 - (I + 0.5*1.5)) - 2(4.8 - (I + 0.5*3))

Putting I = 0:
Slope (of the loss curve) = -2.5 - 3 - 6.6 = -12.1
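
The derivative can be checked in code as well; the function name dSSR_dI is just an
illustrative label for d(SSR)/dI.

heights = [1.0, 1.5, 3.0]
actual_weights = [1.75, 2.25, 4.8]
slope = 0.5

def dSSR_dI(intercept):
    # Derivative of the sum of squared errors with respect to the intercept I
    return sum(-2 * (a - (intercept + slope * h))
               for a, h in zip(actual_weights, heights))

print(round(dSSR_dI(0.0), 4))   # -12.1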

Step Size = Slope * Learning Rate

Let Learning Rate = 0.1

Step Size = -12.1 * 0.1 = -1.21

New Intercept = Old Intercept - Step Size
              = 0 - (-1.21)
              = 1.21
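
Repeating the update a few times shows the intercept settling toward the value that
minimizes the sum of squared errors. The notes only carry out the first step by
hand; the loop below is a sketch, and the number of iterations is an arbitrary choice.

heights = [1.0, 1.5, 3.0]
actual_weights = [1.75, 2.25, 4.8]
slope = 0.5
learning_rate = 0.1
intercept = 0.0                                   # Old Intercept

for step in range(1, 6):
    gradient = sum(-2 * (a - (intercept + slope * h))
                   for a, h in zip(actual_weights, heights))
    step_size = gradient * learning_rate
    intercept = intercept - step_size             # New Intercept = Old Intercept - Step Size
    print(step, round(gradient, 4), round(intercept, 4))
# Step 1: gradient = -12.1, step size = -1.21, new intercept = 1.21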

Gradient Descent - Step by step

youtube.com/watch?v=SDv4f4s2SB8
