
SCHOOL OF TECHNOLOGY

SIMULATION AND MODELLING

Lecture 08

GAMING SIMULATION: Deep Discriminative Models:

Multilayer Perceptron (MLP) Network

Advancing Knowledge, Driving Change | www.kca.ac.ke


Deep Discriminative Models
 Deep discriminative models are deep learning models that aim to create a decision boundary between the classes by learning the conditional probability distribution p(y|x).

Deep Discriminative Models
Examples of Discriminative Models:
 Multilayer Perceptron Networks
 Convolutional Neural Networks
 Recurrent Neural Networks, such as Vanilla Recurrent Neural Networks (Vanilla RNNs), Long Short-Term Memory networks (LSTMs) and Gated Recurrent Units (GRUs)
 Deep Stacking Networks
 Time-Delay Neural Networks
 Hierarchical Temporal Memory (HTM)
 Deep Sequential Neural Networks

Multilayer Perceptron
 Multilayer Perceptron (MLP) is a type of discriminative neural network that is made up of several perceptrons and has one or more hidden layers between the input and the output layer.
 A number of neurons are connected in layers to build a multilayer perceptron.
 Example: an MLP with a single hidden layer.
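As a rough illustration (not part of the original slides), the forward pass of such an MLP with one hidden layer can be sketched in a few lines of Python; the layer sizes, random weights and use of NumPy here are illustrative assumptions only.

```python
import numpy as np

# A minimal sketch of an MLP with one hidden layer (weights are random placeholders).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)   # input layer (2 units) -> hidden layer (3 units)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # hidden layer -> output layer (1 unit)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    hidden = sigmoid(x @ W1 + b1)      # hidden-layer activations
    return sigmoid(hidden @ W2 + b2)   # network output

print(forward(np.array([0.05, 0.10])))
```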


Advantages of Multilayer Perceptron
1. A multilayer perceptron can also learn non-linear functions. This is not possible with a single perceptron, which cannot represent functions that are not linearly separable.

[Figure: two scatter plots in the (x1, x2) plane; the left pattern of + and - points is linearly separable, while the right pattern is non-linearly separable.]

Backpropagation Artificial Neural Network

Backpropagation
 A backpropagation network is an example of a Multilayer Perceptron (MLP). As the algorithm runs, the weights are adjusted by propagating the error backward from the output nodes to the inner nodes.

The goal of backpropagation is to optimize the weights so that the network learns how inputs can be mapped to outputs.


Bias

 In a multilayer network, a bias unit is an "extra" neuron added to each pre-output layer that always stores the value 1.
 Bias units aren't connected to any previous layer and in this sense don't represent a true "activity".
Bias

 Bias is used to adjust the output along with the weighted sum of the inputs to the neuron.
 It is an additional parameter in the neural network.
 Example: in the diagram, a bias of 1.0 has been added as a constant value.

Bias

 Bias is like the intercept added in a linear equation, Y = mx + c, since it is also the output when the input is zero.
 Here, m = weight and c = bias.
 If the bias is absent, the model (Y = mx) can only fit lines passing through the origin, which does not reflect real-world data. With the introduction of bias, the model becomes more flexible.

Therefore, bias is a constant which helps the model fit the given data as well as possible.


Bias
 The change in bias shifts the activation function to the left or to the right; e.g. increasing the bias by 5 shifts the function to the left by 5 units.
 Example: bias helps in controlling the value at which the activation function will trigger.

Bias

 Bias can also be used to delay the triggering of the activation function.
 Example:
 Suppose an activation function act() which triggers on any net input greater than 0, and input1 = 1, weight1 = 2, input2 = 2, weight2 = 2.
 Net total input = input1*weight1 + input2*weight2 = (1*2) + (2*2) = 6
 Since the net input > 0, the activation function triggers and outputs 1.
 If a bias = -6 is introduced, the net total input becomes 0: (1*2) + (2*2) + (-6) = 0.
 Therefore, the activation function will not trigger.
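A minimal Python check of this arithmetic, assuming a simple step ("trigger") activation that outputs 1 only when its net input is greater than 0:

```python
# Small check of the bias example above, with a step ("trigger") activation.
def act(net):
    return 1 if net > 0 else 0   # triggers only when the net input is greater than 0

input1, weight1, input2, weight2 = 1, 2, 2, 2
print(act(input1 * weight1 + input2 * weight2))          # net = 6 -> triggers, outputs 1
print(act(input1 * weight1 + input2 * weight2 + (-6)))   # net = 0 -> does not trigger, outputs 0
```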

Effect of Adjusting Weights

 A change in weight adjusts the speed of learning; that is, it makes the activation function steeper or flatter.
 Example: suppose we increase the weights as follows:
 Weight1 is changed from 1.0 to 4.0 and weight2 from -0.5 to 1.5.

It can be inferred that:
 An increase in weight increases the rate (or speed) of triggering: the output rises more steeply with the input.
 A decrease in weight delays triggering.
[Figure: neuron output versus input before and after increasing the weights.]
Backpropagation Neural Network
 The backpropagation algorithm has four main stages:
1. Initialization of weights
2. Feed forward - each input unit (X) receives an input signal and transmits it to each of the hidden units Z1, Z2, Z3, ..., Zn.
 Each hidden unit then calculates its activation function and sends its signal Zi to each output unit.
 Each output unit calculates its activation function to form the response to the given input pattern.
3. Back-propagation of errors:
 Each output unit compares its activation Yk with its target value Tk to determine the associated error for that unit.
 Based on the error, an error factor δk is computed and used to distribute the error at output unit Yk back to all units in the previous layer.
 Similarly, an error factor δj is computed for each hidden unit Zj.
4. Updating weights and biases
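As a rough outline of how these four stages fit together, the following Python-style sketch shows one possible structure; the network object and its methods (initialize_weights, feed_forward, backpropagate_error, update_weights_and_biases) are hypothetical placeholders, not a real library API.

```python
# Illustrative pseudocode only; the `network` object and its methods are hypothetical.
def train_backprop(network, samples, learning_rate, epochs):
    network.initialize_weights()                      # 1. Initialization of weights
    for _ in range(epochs):
        for x, target in samples:
            output = network.feed_forward(x)          # 2. Feed forward through hidden and output layers
            deltas = network.backpropagate_error(     # 3. Back-propagation of errors (output -> hidden)
                output, target)
            network.update_weights_and_biases(        # 4. Updating weights and biases
                deltas, learning_rate)
```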
Backpropagation Algorithm
 Example
 Consider the following network.
[Figure: a 2-2-2 network with inputs i1 and i2, hidden units h1 and h2, output units o1 and o2, and a bias unit feeding each layer.]
Given that input1 = 0.05 and input2 = 0.10, with bias weights b1 = 0.35 and b2 = 0.60 (each bias unit outputs the constant 1),
train the above network to output 0.01 and 0.99.
Backpropagation Algorithm

 Step 1: Guess the initial weights:

 Layer 1 weights: w1 = 0.15, w2 = 0.20, w3 = 0.25, w4 = 0.30
 Layer 2 weights: w5 = 0.40, w6 = 0.45, w7 = 0.50, w8 = 0.55

Backpropagation Algorithm
 Step 2: Forward Pass
 i) Calculate the total net input for hidden unit 1 (h1):
net h1 = (input1 x w1) + (input2 x w2) + (b1 x 1)
net h1 = (0.05 x 0.15) + (0.10 x 0.20) + (0.35 x 1) = 0.3775

 ii) Calculate the activation (output), e.g. using the logistic function:
out h1 = 1 / (1 + e^(-0.3775)) ≈ 0.593

Backpropagation Algorithm
 Step 2: Forward Pass
 iii) Calculate the total net input for hidden unit 2 (h2):
net h2 = (input1 x w3) + (input2 x w4) + (b1 x 1)
net h2 = (0.05 x 0.25) + (0.10 x 0.30) + (0.35 x 1) = 0.3925

 iv) Calculate the activation (output), e.g. using the logistic function:
out h2 = 1 / (1 + e^(-0.3925)) ≈ 0.596

Backpropagation Algorithm
 Step 2: Forward Pass
 v) Calculate the total net input for each unit of the output layer (o1, o2):
net o1 = (out h1 x w5) + (out h2 x w6) + (b2 x 1) = (0.593 x 0.40) + (0.596 x 0.45) + (0.60 x 1) ≈ 1.1059
net o2 = (out h1 x w7) + (out h2 x w8) + (b2 x 1) = (0.593 x 0.50) + (0.596 x 0.55) + (0.60 x 1) ≈ 1.2243

 vi) Calculate the activations (outputs), e.g. using the logistic function:
out o1 = 1 / (1 + e^(-1.1059)) ≈ 0.7514
out o2 = 1 / (1 + e^(-1.2243)) ≈ 0.773
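The forward pass above can be reproduced with a short Python sketch (the small differences in the last decimals come from carrying the unrounded hidden outputs rather than the rounded 0.593 and 0.596):

```python
import math

def logistic(x):
    """Logistic (sigmoid) activation used in the worked example."""
    return 1.0 / (1.0 + math.exp(-x))

# Inputs, weights and bias weights from the worked example
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30   # input -> hidden
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55   # hidden -> output
b1, b2 = 0.35, 0.60                        # bias weights (bias units output 1)

# Hidden layer
net_h1 = i1 * w1 + i2 * w2 + b1 * 1        # 0.3775
net_h2 = i1 * w3 + i2 * w4 + b1 * 1        # 0.3925
out_h1, out_h2 = logistic(net_h1), logistic(net_h2)   # ≈ 0.593, ≈ 0.597

# Output layer
net_o1 = out_h1 * w5 + out_h2 * w6 + b2 * 1   # ≈ 1.1059
net_o2 = out_h1 * w7 + out_h2 * w8 + b2 * 1   # ≈ 1.2249
out_o1, out_o2 = logistic(net_o1), logistic(net_o2)
print(round(out_o1, 4), round(out_o2, 4))     # ≈ 0.7514 0.7729
```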

Backpropagation Algorithm
 Step 3: Calculate the Total Error
 Each error can be calculated using the squared error function, E = ½ (target - output)².
 For example, the target output for o1 is 0.01 but the neural network outputs 0.75136507, therefore its error is:
E o1 = ½ (0.01 - 0.75136507)² ≈ 0.2748
 Similarly, the target output for o2 is 0.99 but the neural network outputs approximately 0.773, therefore its error is:
E o2 = ½ (0.99 - 0.773)² ≈ 0.0235

Backpropagation Algorithm
 Step 3: Calculate the Total Error
 All the errors are then added together to obtain the total error. For example:
E total = E o1 + E o2 ≈ 0.2748 + 0.0235 ≈ 0.298
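A quick check of this step in Python, carrying over the outputs from the forward pass (out o2 taken as approximately 0.773):

```python
# Squared-error loss for each output, using the outputs from the forward-pass sketch above.
target_o1, target_o2 = 0.01, 0.99
out_o1, out_o2 = 0.75136507, 0.773       # network outputs carried over from the forward pass

E_o1 = 0.5 * (target_o1 - out_o1) ** 2   # ≈ 0.2748
E_o2 = 0.5 * (target_o2 - out_o2) ** 2   # ≈ 0.0235
E_total = E_o1 + E_o2                    # ≈ 0.298
print(round(E_total, 4))
```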

Backpropagation Algorithm
 Step 4: Backward Pass
 In this step each weight is adjusted by propagating the error backward from the output nodes to the inner nodes.
 The aim of the adjustment is to bring the actual output closer to the target output, thereby minimizing the error for each output neuron and for the network as a whole.
 The chain rule is used to adjust the weights.

Step 4: Backward Pass
 Step a) Calculate the change in weights using a certain learning rule; for example, the chain rule can be applied.
 For y = f(u) with u = g(x), the chain rule states that the derivative of y with respect to x is dy/dx = (dy/du) * (du/dx).
 A derivative is a function which measures the slope; in neural networks it is used to calculate the change in weights.
 For example, the change in w5 can be calculated as follows:

Step 4: Backward Pass
 Step a) Calculate the change in weights using a certain learning rule.
 Instead of calculating the derivative of how a specific weight affects the cost directly, we can calculate these three factors:
dError/dOutput: the derivative of how a neuron's output affects the total error
dOutput/dWeightedInput: the derivative of how the net input of a neuron affects the neuron's output
dWeightedInput/dWeight: the derivative of how a weight affects the weighted input of a neuron
 Finally, the three derivatives are multiplied together to get the change in a weight, for example:
dError/dWeight5 = dError/dOutput * dOutput/dWeightedInput * dWeightedInput/dWeight5
Step 4: Backward Pass
 Step a) Calculate the change in weights.
(i) Calculate the derivative of the total error with respect to the output:
dEtotal/dOuto1 = -(target o1 - out o1)
(ii) Calculate the derivative of how the net input of the neuron affects the neuron's output:
dOuto1/dNeto1 = out o1 * (1 - out o1)
(iii) Calculate the derivative of how the weight affects the weighted input of the neuron:
dNeto1/dw5 = out h1
(iv) Finally, calculate the partial derivative of the total error with respect to the weight, e.g. w5:
dEtotal/dw5 = dEtotal/dOuto1 * dOuto1/dNeto1 * dNeto1/dw5
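Putting the three factors together for w5 with the numbers of the worked example gives roughly the following (a sketch; the exact last digit depends on how much the intermediate values are rounded):

```python
# Chain-rule gradient for w5, using the numbers from the worked example.
target_o1 = 0.01
out_o1 = 0.75136507   # output of o1 from the forward pass
out_h1 = 0.593        # output of hidden unit h1

dE_dout_o1 = -(target_o1 - out_o1)     # dError/dOutput            ≈ 0.7414
dout_dnet_o1 = out_o1 * (1 - out_o1)   # dOutput/dWeightedInput    ≈ 0.1868
dnet_dw5 = out_h1                      # dWeightedInput/dWeight5   = 0.593

dE_dw5 = dE_dout_o1 * dout_dnet_o1 * dnet_dw5
print(round(dE_dw5, 4))                # ≈ 0.0821
```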

Step 4: Backward Pass
 Step a) Calculate the change in weights.
 The delta rule can also be applied to calculate the change in weights:
Δw = n * δ * out_previous, where for an output unit δ = (target - out) * out * (1 - out), out_previous is the output of the unit feeding the weight, and n is the learning rate; the new weight is then the old weight + Δw (equivalently, the old weight - n * dEtotal/dw).

Step 4: Backward Pass
 b) Calculate the new weight by subtracting the change in weight from the previous weight.
 For example, the new w5 can be calculated as follows:
new w5 = w5 - n * dEtotal/dw5
 where:
 n = predefined learning rate, e.g. 0.5
 dEtotal/dw5 = partial derivative of the total error with respect to w5
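A short check of this update, assuming the learning rate n = 0.5 and the gradient for w5 of roughly 0.0821 obtained in the previous sketch:

```python
# Illustrative numbers from the worked example; grad_w5 is dEtotal/dw5 from the previous sketch.
learning_rate = 0.5
w5_old, grad_w5 = 0.40, 0.0821
w5_new = w5_old - learning_rate * grad_w5
print(round(w5_new, 3))   # ≈ 0.359
```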

Step 4: Backward Pass
 The backward pass can be repeated in the same way to get the new weights for w6, w7 and w8.
 The neural network is then updated with the new weights leading out of the hidden layer, so that they can be used when calculating the new weights of the preceding layer with the backpropagation algorithm.
[Figure: the updated network diagram showing the new output-layer weights.]

Step 4: Backward Pass
 Calculate the weights of the hidden layer by repeating the backward pass,
 but in a slightly different way, to account for the fact that the output of each hidden-layer neuron contributes to the output (and therefore to the error) of multiple output neurons.

Step 4: Backward Pass
 The derivative of the total error with respect to w1 can be calculated as follows:
dEtotal/dw1 = dEtotal/dOuth1 * dOuth1/dNeth1 * dNeth1/dw1, where dEtotal/dOuth1 = dEo1/dOuth1 + dEo2/dOuth1
Step 4: Backward Pass
Derivative of the total error with respect to w1

dEo1/dOuth1 = dEo1/dNetInputo1 * dNetInputo1/dOuth1

dEo1/dNetInputo1 = dEo1/dOuto1 * dOuto1/dNetInputo1
But:
dEo1/dOuto1 = -(Target - Outo1) = -(0.01 - 0.75136) = 0.74136
dOuto1/dNetInputo1 = Outo1(1 - Outo1) = 0.75136(1 - 0.75136) = 0.18681
Therefore,
dEo1/dNetInputo1 = 0.74136 * 0.18681 ≈ 0.1385

dNetInputo1/dOuth1 = w5 = 0.40

Hence, dEo1/dOuth1 ≈ 0.1385 * 0.40 ≈ 0.0554

Step 4: Backward Pass
Derivative of the total error with respect to w1

Similarly, dEo2/dOuth1 can be calculated as follows:

dEo2/dOuth1 = dEo2/dNetInputo2 * dNetInputo2/dOuth1

dEo2/dNetInputo2 = dEo2/dOuto2 * dOuto2/dNetInputo2
But:
dEo2/dOuto2 = -(Target - Outo2) = -(0.99 - 0.7732) = -0.2168
dOuto2/dNetInputo2 = Outo2(1 - Outo2) = 0.7732(1 - 0.7732) = 0.17536
Therefore,
dEo2/dNetInputo2 = -0.2168 * 0.17536 ≈ -0.03801

dNetInputo2/dOuth1 = w7 = 0.51

Hence, dEo2/dOuth1 = -0.03801 * 0.51 ≈ -0.0194

Step 4: Backward Pass
Derivative of the total error with respect to w1

Finally, the two values are put together as follows:

dEo1/dOuth1 ≈ 0.0554
dEo2/dOuth1 ≈ -0.0194

dEtotal/dOuth1 = dEo1/dOuth1 + dEo2/dOuth1
             ≈ 0.0554 - 0.0194
             ≈ 0.036
Therefore,
dEtotal/dw1 = dEtotal/dOuth1 * dOuth1/dNeth1 * dNeth1/dw1
where,
dEtotal/dOuth1 ≈ 0.036
dOuth1/dNeth1 = Outh1(1 - Outh1) = 0.59326(1 - 0.59326) = 0.24130
dNeth1/dw1 = i1 = 0.05

Hence,
dEtotal/dw1 ≈ 0.036 * 0.24130 * 0.05 ≈ 0.000434
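The whole chain for w1 can be reproduced in a few lines (a sketch using the values from the slides above, including w7 = 0.51 as used there; the last digits depend on rounding):

```python
# Hidden-layer gradient dEtotal/dw1, using the numbers from the slides above.
out_o1, out_o2 = 0.75136, 0.7732
target_o1, target_o2 = 0.01, 0.99
out_h1, i1 = 0.59326, 0.05
w5, w7 = 0.40, 0.51          # weights from h1 to o1 and o2, as used in the slides

# How h1's output affects each output error
dEo1_dout_h1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1) * w5    # ≈ 0.0554
dEo2_dout_h1 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2) * w7    # ≈ -0.0194
dEtotal_dout_h1 = dEo1_dout_h1 + dEo2_dout_h1                        # ≈ 0.036

# Chain through h1's own activation and the input that w1 multiplies
dEtotal_dw1 = dEtotal_dout_h1 * out_h1 * (1 - out_h1) * i1           # ≈ 0.000434
print(round(dEtotal_dw1, 6))
```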

Step 4: Backward Pass
 The new weight 1 (w1) is calculated as follows:
 new w1 = old w1 - n * dEtotal/dw1
 where:
 n = learning rate = 0.5

Therefore,
new w1 = 0.15 - 0.5 * 0.000434 ≈ 0.14978

Step 4: Backward Pass
 The backward pass can be repeated to obtain the remaining new weights of the hidden layer:
 w2 = 0.199956, w3 = 0.24975, w4 = 0.299950

The updated network is as follows:
[Figure: the updated network diagram showing the new weights.]
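Finally, the whole procedure can be collected into one compact, self-contained sketch for this 2-2-2 network. The loop structure, variable names and iteration count below are illustrative choices rather than anything prescribed by the slides, but the forward pass, error, deltas and update rule are the ones worked through above.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

i = [0.05, 0.10]                       # inputs
targets = [0.01, 0.99]                 # target outputs
w = {"w1": 0.15, "w2": 0.20, "w3": 0.25, "w4": 0.30,   # input -> hidden
     "w5": 0.40, "w6": 0.45, "w7": 0.50, "w8": 0.55}   # hidden -> output
b1, b2 = 0.35, 0.60                    # bias weights (bias units output 1)
lr = 0.5                               # learning rate

for _ in range(10000):
    # Forward pass
    out_h1 = logistic(i[0] * w["w1"] + i[1] * w["w2"] + b1)
    out_h2 = logistic(i[0] * w["w3"] + i[1] * w["w4"] + b1)
    out_o1 = logistic(out_h1 * w["w5"] + out_h2 * w["w6"] + b2)
    out_o2 = logistic(out_h1 * w["w7"] + out_h2 * w["w8"] + b2)

    # Output-layer deltas (dE/dnet for each output neuron)
    d_o1 = -(targets[0] - out_o1) * out_o1 * (1 - out_o1)
    d_o2 = -(targets[1] - out_o2) * out_o2 * (1 - out_o2)

    # Hidden-layer deltas, computed before any weight is changed
    d_h1 = (d_o1 * w["w5"] + d_o2 * w["w7"]) * out_h1 * (1 - out_h1)
    d_h2 = (d_o1 * w["w6"] + d_o2 * w["w8"]) * out_h2 * (1 - out_h2)

    # Weight updates: new w = old w - lr * dEtotal/dw
    w["w5"] -= lr * d_o1 * out_h1
    w["w6"] -= lr * d_o1 * out_h2
    w["w7"] -= lr * d_o2 * out_h1
    w["w8"] -= lr * d_o2 * out_h2
    w["w1"] -= lr * d_h1 * i[0]
    w["w2"] -= lr * d_h1 * i[1]
    w["w3"] -= lr * d_h2 * i[0]
    w["w4"] -= lr * d_h2 * i[1]

# After many iterations the outputs approach the targets and the total error becomes very small
E_total = 0.5 * (targets[0] - out_o1) ** 2 + 0.5 * (targets[1] - out_o2) ** 2
print(round(E_total, 6))
```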

References

 https://medium.com/towards-artificial-intelligence/understanding-back-propagation-in-an-easier-way-you-never-before-42fe26d44a47
 https://stevenmiller888.github.io/mind-how-to-build-a-neural-network/
 https://hmkcode.com/ai/backpropagation-step-by-step/

