ON
SUBMITTED BY:
"CHHAVI SHARMA"
ASSOCIATE PROFESSOR
DEPARTMENT OF MATHEMATICS
CONTENTS:
• Introduction
• Activation functions
• Network Architecture
• MATLAB Code
• Conclusion
• References
1 Introduction
A network consists of nodes which are interconnected with each other to form a large circuit. In our brain we have a biological neural network. The basic processing element of the brain is the neuron; around 10 billion neurons are interconnected with each other to form the biological neural network.
In an analogy to the brain, an entity made up of interconnected neurons, neural networks are made
up of interconnected processing elements called units, which respond in parallel to a set of input signals
given to each. The unit is the equivalent of its brain counterpart, the neuron.
A neural network consists of four main parts:
1. Processing units uj , where each uj has a certain activation level aj (t) at any point in time.
2. Weighted interconnections between the various processing units, which determine how the activation of one unit leads to input for another unit.
3. An activation rule which acts on the set of input signals at a unit to produce a new output signal,
or activation.
4. Optionally, a learning rule that specifies how to adjust the weights for a given input/output pair.
• A biological neuron has different components: dendrites, soma, nucleus, axon and terminal buttons.
• Dendrites: accept the input signals and feed these signals to the nucleus inside the soma.
• Axon: accepts the input signals from the nucleus and transmits them to the other interconnected neurons through terminal buttons. The axon behaves as a transmission line carrying signals from one neuron to another through the terminal buttons on the axon.
3 Artificial Neural Network:
In 1943, McCulloch and Pitts gave a mathematical model based on the activity of a biological neuron. This model is known as the artificial neuron, and when a large number of artificial neurons are interconnected with each other they form an artificial neural network. The idea of the artificial neural network was taken from the activity and structure of the biological neural network. A biological neural network may be modelled artificially to perform computation, and the model is then termed an artificial neuron. Hence, we can say that,
• Artificial Neural Networks are the programs designed to solve any problem by trying to mimic the
structure and the function of our nervous system.
• Neural Networks are based on simulated neurons, which are joined together in a variety of ways
to form networks.
• Neural Network resembles the human brain in the following two ways:
- A neural network acquires knowledge through learning.
- A neural network’s knowledge is stored within interconnection weights known as synaptic
weights.
Figure 2: An artificial neural network in which each node represents a neuron and there is a hidden layer between the input layer and output layer. Arrows represent that the output of the input layer is sent as input to the hidden layer, and the output of the hidden layer is sent as input to the output layer neurons.
• Artificial Neuron: It is the basic processing element of an artificial neural network. A neuron accepts inputs from one or more neurons and produces only one output. This output is transmitted to the other neurons through synapses (junctions used to send signals from one neuron to another). The output is obtained when the weighted sum of the input signals, multiplied by the corresponding synaptic weights, is passed through a function known as the activation function or threshold function.
The connection weights or synaptic weights indicate the strength of the signals.
4 Mathematical form of the Artificial Neural Network:
O_j = f( Σ_{i=1}^{n} w_ji x_i + w_j0 )   such that
Σ_{i=1}^{n} w_ji x_i + w_j0 ≥ θ
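The formula above can be checked with a small sketch (Python rather than the MATLAB used later in these notes; the weights, bias w_j0 and threshold θ below are illustrative values, not taken from the text):

```python
def neuron_output(x, w, w0, theta=0.0):
    # O_j = f(sum_i w_ji*x_i + w_j0), with a hard-threshold f that
    # fires only when the weighted sum reaches the threshold theta
    s = sum(wi * xi for wi, xi in zip(w, x)) + w0
    return 1.0 if s >= theta else 0.0

# Two inputs with illustrative weights: 0.4*0.5 + 0.6*0.2 - 0.1 = 0.22 >= 0.2
print(neuron_output([0.5, 0.2], [0.4, 0.6], w0=-0.1, theta=0.2))  # 1.0
```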
5 Activation Functions:
The activation function of a neuron is a nonlinear mathematical function. Examples include the sigmoid function, hard limiter function, step function, tanh function, Gaussian activation function, etc. The chosen activation function depends upon the type of problem we are solving.
• Sigmoid function: It is an S-shaped nonlinear mathematical function, also known as the squashing or logistic function. Mathematically,
σ(x) = 1 / (1 + e^{−x}),  0 < σ(x) < 1 and −∞ < x < ∞
Figure 6: Sigmoid function
• Hard limiter function:
h(x) = 0 if x ≤ 0; 1 otherwise
• Gaussian activation function:
f(x) = (1 / (√(2π) σ)) e^{−(x−m)² / (2σ²)}, where m and σ are the centre (peak) and width respectively.
Different activation functions are used depending upon the type of problem under consideration. Of the activation functions explained above, the sigmoid function and the hard limiter function are the most commonly used. The sigmoid function is popular because it is a non-linear, monotonic and bounded function and has a simple derivative f'(s) = k f(s)(1 − f(s)). The hard limiter function is also monotonic and bounded, but piecewise constant. The lower and upper bounds on the chosen activation function depend on the user and the area of application.
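These activation functions, and the sigmoid's simple derivative (with k = 1), can be sketched and checked numerically in a few lines of Python (all values below are illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))          # bounded in (0, 1)

def hard_limiter(x):
    return 0.0 if x <= 0 else 1.0              # step / threshold unit

def gaussian(x, m=0.0, s=1.0):                 # m = centre, s = width
    return math.exp(-(x - m) ** 2 / (2 * s * s)) / (math.sqrt(2 * math.pi) * s)

# Check the sigmoid's derivative f'(x) = f(x)(1 - f(x))  (k = 1)
x, h = 0.7, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
analytic = sigmoid(x) * (1 - sigmoid(x))
print(abs(numeric - analytic) < 1e-8)  # True
```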
6 Network architecture:
• It gives the structure of a network, i.e. how different nodes are interconnected with each other. In the case of a neural network, it defines the structure of the network and refers to the way different neurons are connected; it is an important part of network functioning and learning.
• There are some network architectures in the literature, called neural network topologies, like the instar topology, outstar topology, group of instars, etc.
• In the instar topology, a neuron of the present layer accepts the output of each neuron from the previous layer.
• In the outstar topology, a neuron of the present layer accepts the output of a single neuron from the previous layer.
7 Learning:
• Learning is a process to acquire knowledge from past observations and examples.
• All the knowledge of a neural network is stored in the synapses as the connection weights or synaptic weights. Between two consecutive neuron layers there is a matrix called the weight matrix. Once this knowledge is acquired through learning, presenting a pattern as input to the network will produce the correct output.
• Therefore, learning is a process in which a neural network updates its parameters and weights in response to input so that the actual output converges to the desired output. Once the actual output is closest to the desired output, the learning phase is complete and knowledge has been acquired.
1. Supervised Learning: In this process we compare the actual output with the target output, and if there is some error signal then a teaching signal is sent to the neural network to update the connection weights. These teaching signals are also known as teaching samples.
2. Unsupervised Learning: This learning algorithm is unlike supervised learning. Here no teaching signal is fed to the network, but some guidelines are sent to it to adjust the parameters so that the error signal is minimized.
3. Reinforced Learning: This learning requires one or more neurons at the output layer. In this learning there is a teacher (training sample) which, unlike in supervised learning, only indicates whether the actual output is close to the target output. The error signal generated by the teacher is only binary, like pass/fail, true/false, 0/1, −1/1, etc. If the error signal generated by the teacher is "fail", the weight parameters must be re-adjusted until the result from the teacher is "pass".
4. Competitive Learning: This learning method requires one or more neurons at the output layer. Different output layer neurons compete with each other to give the output closest to the desired output. The network output corresponding to the applied input signal becomes the dominant one, and the remaining neurons cease producing an output signal for that input signal.
• In this learning rule, a neuron learns by shifting its weights from inactive connections to active ones.
• Only the winning neuron and its neighbourhood are allowed to learn.
• If a neuron does not respond to a given input pattern, then learning cannot occur in that particular neuron.
Mathematically, the winning rule is defined as:
y_k = 1 if v_k > v_j for all j, j ≠ k; 0 otherwise,
subject to Σ_{j=1}^{n} W_kj = 1, ∀ k.
The neuron for which y_k = 0 is a loser, and the neuron for which y_k = 1 is the winner.
NOTE: There is another method to find the winning neuron which is described below:
• The overall effect of the competitive learning rule is to move the synaptic weight vector W_kj of the winning neuron k towards the input vector X.
• The matching criterion is equivalent to the minimum Euclidean distance between the vectors.
• The Euclidean distance between a pair of n × 1 vectors X and W_kj is defined by:
d_k = || X − W_kj || = [ Σ_{j=1}^{n} (x_j − w_kj)² ]^{1/2}
where x_j and w_kj are the jth elements of vector X and vector W_kj, and n is the number of input neurons to the kth neuron.
• To identify the winning neuron k_X that best matches the input vector X, we may apply the following condition:
k_X = min_k || X − W_kj ||, k = 1, 2, ....., m
The competitive learning rule defines the change ΔW_kj applied to the synaptic weight W_kj as
ΔW_kj = η (x_j − W_kj) if neuron k wins; 0 if neuron k loses.
x = [x_1, x_2, ......, x_m]
v_1 = 0.2376
v_2 = 0.3024
v_3 = x_1 × W_31 + x_2 × W_32 = 0.2488
Suppose the activation function is the logistic function:
y_1 = f_1(v_1) = 1 / (1 + e^{−0.2376}) = 0.559122
y_2 = f_2(v_2) = 1 / (1 + e^{−0.3024}) = 0.57503
y_3 = f_3(v_3) = 1 / (1 + e^{−0.2488}) = 0.56188
Now calculate the distance between the input vectors and corresponding weight vectors.
d_1 = √((x_1 − W_11)² + (x_2 − W_12)²) = √((0.52 − 0.27)² + (0.12 − 0.81)²) = 0.73389
Similarly, d_2 = √((x_1 − W_21)² + (x_2 − W_22)²) = √((0.52 − 0.42)² + (0.12 − 0.70)²) = 0.58856
d_3 = √((x_1 − W_31)² + (x_2 − W_32)²) = √((0.52 − 0.43)² + (0.12 − 0.21)²) = 0.1273
So, out of d_1, d_2 and d_3, d_3 is minimum. So neuron 3 is the winner, and by the competitive learning equation the weight changes corresponding to connections 1 → 3 and 2 → 3 are applied, while the connection weights corresponding to the other connections remain unchanged, because the weight change ΔW_kj corresponding to loser neurons is zero. The new synaptic weight matrix is given by:
[ 0.27  0.42  0.43 + 0.09 ]   =   [ 0.27  0.42  0.52 ]
[ 0.81  0.70  0.21 − 0.01 ]       [ 0.81  0.70  0.20 ]
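The winner-selection and update steps above can be sketched in Python. The input x = (0.52, 0.12) and the weight columns come from the worked example; the learning rate η = 0.1 is an assumption, since the text does not state the value it used:

```python
import math

def winner(x, W):
    # Index of the neuron whose weight vector is closest (Euclidean) to x
    d = [math.dist(x, w) for w in W]
    return d.index(min(d))

def competitive_update(x, W, eta=0.1):
    k = winner(x, W)
    # Move only the winner's weights toward the input vector
    W[k] = [wk + eta * (xj - wk) for wk, xj in zip(W[k], x)]
    return k, W

x = [0.52, 0.12]
W = [[0.27, 0.81], [0.42, 0.70], [0.43, 0.21]]  # one row per output neuron
k, W = competitive_update(x, W)
print(k)  # 2, i.e. neuron 3 wins, as in the worked example
```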
5. Hebbian Learning: This learning method involves the updating of connection weights. Hebbian synapses have the following characteristics:
(a) Time dependent: when we have two neurons, a pre-synaptic neuron and a post-synaptic neuron, the activations of these neurons occur simultaneously.
(b) Local: a Hebbian synapse involves only the pre-synaptic and post-synaptic neurons and does not depend on the other neurons.
Hebb's hypothesis is given by
ΔW_kj(n) = η y_k(n) x_j(n)
Keeping η and x_j constant, this is the equation of a straight line passing through the origin with slope η x_j, so the weight grows without bound.
• To resolve this problem, we might impose a limit on the growth of synaptic weights. It can be done by introducing a non-linear forgetting factor φ into Hebb's law:
ΔW_kj(n) = η y_k(n) x_j(n) − φ y_k(n) W_kj(n)
• The forgetting factor usually falls in the interval between 0 and 1, typically between 0.01 and 0.1, to allow only a little 'forgetting' while limiting the weight growth.
Step 1: Initialisation.
Set initial synaptic weights and thresholds to small random values, say in the interval [0, 1].
Step 2: Activation.
Compute the neuron activation at iteration n:
y_k = f( Σ_{j=1}^{n} W_kj x_j + w_0 ), where n is the number of input neurons.
Step 3: Learning.
ΔW_kj(n) = η y_k(n) x_j(n) − φ y_k(n) W_kj(n)
Step 4:Iteration.
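Hebb's rule with the forgetting factor can be sketched as follows. η = 0.1 and φ = 0.05 are illustrative values; the forgetting term gives the weight a finite fixed point η/φ instead of unbounded growth:

```python
def hebbian_update(w, x, y, eta=0.1, phi=0.05):
    # Generalised Hebb rule: dW = eta*y*x - phi*y*W (forgetting factor phi)
    return w + eta * y * x - phi * y * w

w = 0.3
for _ in range(400):          # repeated co-activation of the two neurons
    w = hebbian_update(w, x=1.0, y=1.0)
print(round(w, 4))            # settles at eta/phi = 2.0 instead of growing
```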
1. The backpropagation network has three layers: an input layer, a hidden layer, and an output layer.
2. There is no connection between neurons within a layer, but two consecutive layers are fully connected with each other.
3. There are two synaptic weight matrices: one between the input layer and the hidden layer, and the other between the hidden layer and the output layer.
4. Backpropagation is a mathematical tool widely used as the learning algorithm in feed-forward multilayer neural networks.
5. It is difficult to find the errors of the hidden layer neurons; there is no direct way to find them.
6. Before starting the backpropagation learning procedure, we should know the following things:
• The set of normalised training patterns (samples or data), both inputs {X_k} and corresponding targets {T_k}.
• A value of the learning rate η.
• A criterion for the termination of the algorithm.
• A methodology for updating the weights, i.e. a rule for how to update the connection weights.
• An activation function; usually a nonlinear sigmoid activation function is preferred to compute the activations.
• An initial synaptic weight matrix whose entries are just random numbers between −0.5 and 0.5.
3. Input to the input layer causes a response in the next layer, i.e. the input to the present layer is the output obtained from the previous layer neurons.
4. Calculate the outputs from all the layers between the input layer and the output layer.
5. Calculate the actual output of the output layer neurons and compare it with the target output, and check whether there is an error signal or not. If there is some error, then calculate the rate of change of error with respect to the connection weights in the forward direction of the network flow, i.e. ∂e/∂W_ij, where W_ij is a connection weight.
6. Now update the weights between the output layer and the last hidden layer.
7. Again calculate the rate of change of error with respect to the connection weights between the last two hidden layers. Then update the weights between the last two hidden layers.
9. Repeat from step 2 to step 8 for the whole input data set, i.e. repeat the above steps until the end of the input data set.
10. Now check the training status of the network by calculating the total error corresponding to all training patterns (i.e. the input data set). If the training status is not satisfactory, then adjust the training parameters like the learning rate, momentum factor, etc.
11. If the status is still not satisfactory after changing the training parameters, then repeat the whole procedure with a new initial weight matrix.
The error of the jth output neuron at iteration n is
e_j(n) = t_j(n) − y_j(n)
(1)
where t_j and y_j are the target output and actual output of the jth neuron respectively at the nth iteration.
E(n) = (1/2) Σ_{j∈C} e_j²(n)
(2)
The average error energy is given by
E_av = (1/N) Σ_{n=1}^{N} E(n)
(3)
v_j(n) = Σ_{i=0}^{m} W_ji(n) y_i(n)
(4)
y_j(n) = φ(v_j(n))
(5)
where φ(·) is the activation function.
Now the rate of change of the error with respect to the connection weight W_ji is given by ∂E(n)/∂W_ji. The weight update ΔW_ji is directly proportional to this rate of change of error, i.e.
ΔW_ji = −η ∂E(n)/∂W_ji
where η is the learning rate and the minus sign is present because we update the weights in the direction of decreasing error. By the chain rule,
∂E(n)/∂W_ji(n) = (∂E(n)/∂e_j(n)) · (∂e_j(n)/∂y_j(n)) · (∂y_j(n)/∂v_j(n)) · (∂v_j(n)/∂W_ji(n))
where
∂E(n)/∂e_j(n) = e_j(n),
∂e_j(n)/∂y_j(n) = −1,
∂y_j(n)/∂v_j(n) = φ'(v_j(n)),
∂v_j(n)/∂W_ji(n) = y_i(n).
So,
∂E(n)/∂W_ji = −e_j(n) φ'(v_j(n)) y_i(n)
and therefore ΔW_ji = η e_j(n) φ'(v_j(n)) y_i(n).
Let δ_j(n) = e_j(n) φ'(v_j(n)). Then
δ_j(n) = −∂E(n)/∂v_j(n) = −(∂E(n)/∂e_j(n)) · (∂e_j(n)/∂y_j(n)) · (∂y_j(n)/∂v_j(n)) = e_j(n) φ'(v_j(n))
(6)
For a hidden neuron j, the error signal is defined as
δ_j(n) = −∂E(n)/∂v_j(n) = −(∂E(n)/∂y_j(n)) φ'(v_j(n))
(8)
E(n) = (1/2) Σ_{k∈C} e_k²(n)
(9)
C is the set of output layer neurons, whose cardinality is equal to the number of neurons in the output layer.
∂E(n)/∂y_j(n) = Σ_k e_k(n) · ∂e_k(n)/∂y_j(n)
(10)
∂e_k(n)/∂v_k(n) = −φ'(v_k(n))
(12)
For neuron k, v_k(n) = Σ_{j=0}^{m} W_kj(n) y_j(n), so
∂v_k(n)/∂y_j(n) = W_kj(n)
(13)
Combining these,
δ_j(n) = φ'(v_j(n)) · Σ_k δ_k(n) W_kj(n)
(14)
9.4 Practical considerations in the Backpropagation algorithm:
(a) Since δ_j(n) depends on φ'_j(v_j), the activation function should be differentiable.
Let the activation function φ(·) be the logistic function. Then
y_j(n) = φ_j(v_j(n)) = 1 / (1 + exp(−a v_j(n))), a > 0, where 'a' gives the slope of the function and −∞ < v_j(n) < ∞.
φ'_j(v_j(n)) = a exp(−a v_j(n)) / [1 + exp(−a v_j(n))]² = a y_j(n)[1 − y_j(n)]
(15)
δ_j(n) = a y_j(n)[1 − y_j(n)] Σ_k δ_k(n) W_kj(n)
(16)
The weight update including a momentum term is
ΔW_ji(n) = α ΔW_ji(n − 1) + η δ_j(n) y_i(n)
(17)
where α is a positive number called the momentum. It is given the name momentum in the sense that whenever η is small, learning will still occur; we are pushing this phenomenon to occur quickly.
Solving the recursion,
ΔW_ji(n) = η Σ_{t=0}^{n} α^{n−t} δ_j(t) y_i(t), because αⁿ ΔW_ji(0) = 0 (we do not change the weights at the 0th iteration),
= −η Σ_{t=0}^{n} α^{n−t} ∂E(t)/∂W_ji(t).
We require 0 ≤ |α| < 1, because if α > 1 then this time series does not have a limit; all terms become very large, which we do not want.
1. If ∂E(t)/∂W_ji(t) has the same sign for all t, then |ΔW_ji| grows in magnitude, so the descent is accelerated.
2. If ∂E(t)/∂W_ji(t) alternates its sign in every iteration, then |ΔW_ji| is small in magnitude.
who = a weight matrix [who]_{n_h × n_o} from the hidden to the output layer
F(x) = 1 / (1 + e^{−x}), the activation function
o = F(h · who), where o_k = F(b_k) = 1 / (1 + e^{−b_k}) and b_k = Σ_{l=1}^{n_h} w_kl h_l for k = 1, 2, ........, n_o
Step 3: Compute the output layer error from equation (15): δ_j(n) = a[t_j(n) − y_j(n)] y_j(n)[1 − y_j(n)]. Since δ_j(n) is directly proportional to the error [t_j(n) − y_j(n)], we will try to minimize δ_j(n), so let
d_k = a[t_k − o_k] o_k [1 − o_k], where d_k is the error of output layer neuron k and t_k is the target output of the kth neuron in the output layer.
Step 4:
Compute the hidden layer error from equation (16),
e_k = h_k(1 − h_k) Σ_{i=1}^{n_o} who_{ik} d_i for k = 1, 2, ......., n_h
e = (e_1, e_2, ......., e_{n_h})
Step 5: Now update the weights of the connections between the hidden and output layers by the formula:
W_ji(n + 1) = W_ji(n) + ΔW_ji(n)
where ΔW_ji(n) is given by equation (17), ΔW_ji(n) = α ΔW_ji(n − 1) + η δ_j(n) y_i(n), and δ_j(n) is calculated from equation (15).
Step 6: Now update the weights of the connections between the input layer and the hidden layer by the same expression as above, except that δ_j(n) is calculated from equation (16).
n_i = 2, n_h = 2, n_o = 1, η = 0.6.
The initial weight matrices corresponding to the input-hidden layer and hidden-output layer are given by
wih(0) = [ wih_11(0)  wih_21(0) ; wih_12(0)  wih_22(0) ] = [ 0.1  −0.3 ; 0.3  0.4 ]
and who(0) = [ who_11(0) ; who_12(0) ] = [ 0.4 ; 0.5 ].
Note: the initial wih(0) and who(0) are randomly chosen between −0.5 and 0.5.
Step 1: Calculate the hidden layer activations as follows:
h_1(0) = F(a_1) = 1 / (1 + e^{−0.2}) = 0.5498
h_2(0) = F(a_2) = 1 / (1 + e^{−0.18}) = 0.54488
o_1(0) = F(b_1) = 1 / (1 + e^{−0.49236}) = 0.6206
d_1 = o_1(0)(1 − o_1(0))(t_1 − o_1) = 0.6206(1 − 0.6206)(0.7 − 0.6206) = 0.0187
Step 5: Δw_ji = η δ_j(n) y_i(n), where δ_j(n) = d(0) for output layer neuron j.
Therefore, Δwho_11 = 0.9 × 0.0187 × 0.5498 = 0.0092531
Δwho_12 = 0.9 × 0.0187 × 0.54488 = 0.0091703
Therefore, who_11(1) = who_11(0) + Δwho_11(0) = 0.4 + 0.0092531 = 0.4092531
who_12(1) = who_12(0) + Δwho_12(0) = 0.5 + 0.0091703 = 0.50917
Δw_ji(0) = η δ_j(0) y_i(0), where δ_j(0) = e_j(0) for hidden layer neuron j.
Δwih_11(0) = 0.9 × 0.00185 × 0.2 = 0.00033
Δwih_12(0) = 0.9 × 0.00185 × 0.6 = 0.001
Δwih_21(0) = 0.9 × 0.00232 × 0.2 = 0.00042
Δwih_22(0) = 0.9 × 0.00232 × 0.6 = 0.00125
i.e. Wih(1) = [ 0.10033  −0.29958 ; 0.301  0.40125 ]
So, we have completed one iteration; now repeat the whole procedure, i.e. repeat from step 1 to step 6, with the new connection weight matrices. As the iterations increase, the error between the computed output and the target reduces, i.e. o(j) → t as j → ∞.
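One forward/backward pass of this worked example can be reproduced in a short Python sketch. The text quotes η = 0.6 but the arithmetic above uses 0.9, so 0.9 is used here; a = 1 and no momentum term are assumed:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Inputs, target and rate taken from the worked numbers above
x, t, eta = [0.2, 0.6], 0.7, 0.9
wih = [[0.1, 0.3], [-0.3, 0.4]]  # row j holds the weights into hidden neuron j
who = [0.4, 0.5]                 # hidden-to-output weights

# Forward pass
h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in wih]
o = sigmoid(sum(w * hj for w, hj in zip(who, h)))

# Output delta (eq. (15), a = 1) and hidden errors (eq. (16))
d = o * (1 - o) * (t - o)
e = [hj * (1 - hj) * who[j] * d for j, hj in enumerate(h)]

# Weight updates without momentum
who = [w + eta * d * hj for w, hj in zip(who, h)]
wih = [[w + eta * e[j] * xi for w, xi in zip(row, x)]
       for j, row in enumerate(wih)]

print(round(o, 4), round(d, 4))  # close to the 0.6206 and 0.0187 in the text
```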
that can be used in any subsequent calculation, while the solutions obtained by other techniques like Runge-Kutta are discrete or of limited differentiability.
• The neural network based method for solving differential equations provides solutions with good generalisation.
• The complexity of solving ODEs using the neural network method does not increase quickly as the number of sample points increases, while in numerical methods the complexity increases very rapidly with the number of sampling points in the interval.
• This method is general and can be applied to orthogonal box boundaries or to bodies having irregular boundaries.
• This method tackles difficult differential equations arising in many engineering problems which are difficult to solve even using numerical methods. Neural networks tackle many difficult problems arising in real life.
Neural networks can solve both ordinary and partial differential equations. The method for solving ODEs employs a feed-forward neural network as the basic element; an error (approximation) function is formed, and it is minimized by updating the parameters (i.e. training the neural network).
We consider a trial solution of the given differential equation which is the sum of two terms: one is independent of the parameters and the other depends on the parameters. The parameter-dependent term employs a feed-forward neural network, which is trained to get the approximate solution of the given differential equation. We will study all these things later in detail.
G(x, ψ(x), ∇ψ(x), ∇²ψ(x)) = 0, x ∈ D
(18)
is the general form of the differential equation, where x = (x_1, x_2, ....., x_n) ∈ R^n is the input vector, D ⊂ R^n is the domain of definition, ψ is the dependent variable, and
∇ = Σ_{i=1}^{n} ∂/∂x_i,
∇² = Σ_{i=1}^{n} ∂²/∂x_i².
Assumption: the domain of definition is an orthogonal box. It means that no products of two x_i's appear together.
11.2.1 Transformation
In this section we transform the above differential equation (18) into another one whose domain of definition is a discrete set.
So, discretize the domain of definition D and its boundary S into discrete sets D̂ and Ŝ. Then equation (18) can be transformed into
G(x_t, ψ(x_t), ∇ψ(x_t), ∇²ψ(x_t)) = 0, ∀ x_t ∈ D̂
(19)
subject to some constraints, which are just the initial conditions given to us in the question.
Here constraints means the restrictions which are imposed on the solution of the given differential equation, i.e. we have to find a solution of the given ODE which satisfies the initial conditions, known as constraints.
Let ψ_t(x, p) be a trial solution of the transformed equation (19), where p is the set of adjustable parameters. This gives a feed-forward neural network in which the parameters p are adjusted so that the term G(x_t, ψ(x_t), ∇ψ(x_t), ∇²ψ(x_t)) is minimized:
min_p Σ_{x_t ∈ D̂} ( G(x_t, ψ(x_t), ∇ψ(x_t), ∇²ψ(x_t)) )²
(20)
• The trial function is the sum of two terms: one independent of the parameters and one with adjustable parameters.
ψ_t(x) = A(x) + F(x, N(x, p))
(21)
where,
ψ_t(x) = the trial function,
A(x) = a term which is independent of the network parameters, i.e. it does not contain any adjustable parameter, and satisfies the boundary conditions,
F(x, N(x, p)) = a term with adjustable parameters p which does not contribute to the boundary conditions,
N(x, p) = the output of a feed-forward neural network with parameters p, fed with the n-dimensional input vector x,
x = the input vector of dimension n.
• An optimization problem is a problem which contains an objective function subject to some constraints, known as restrictions on the participating variables.
• Our goal is to find the values of the independent variables which optimize the objective function (maximize or minimize, depending on the requirement in the question) while satisfying the given constraints.
• The trial function defined in equation (21) must satisfy the initial conditions by construction. The other term F(x, N(x, p)) employs a feed-forward neural network in which the various weight parameters and biases are adjusted to minimize the error defined in equation (20).
• So, originally the problem defined in equations (18) and (19) was a constrained optimization problem, as the problem is defined subject to some initial conditions known as constraints.
• But here the problem is reduced to equation (20), which contains a trial function that already satisfies the initial conditions by construction. So it becomes an unconstrained optimization problem, due to our choice of a trial function which satisfies the initial conditions.
• Minimization of equation (20) can be seen as the training process of the neural network.
• The network parameter update formula involves the gradient of the network error with respect to the weight parameters. So we have to calculate all these things to get the error and the weight update formula.
Consider a Multi Layer Perceptron (MLP) neural network with the following parameters. Here we calculate the gradient with respect to the input variables for a single hidden layer; the extension to more than one hidden layer can be obtained accordingly.
v_i = weight of the connection from hidden layer neuron i to the output layer,
w_ij = weight of the connection from the jth input node to the ith hidden node,
u_i = bias of the ith hidden node,
where z_i = Σ_{j=1}^{n} w_ij x_j + u_i is the input to the ith hidden node.
For a given input vector x = (x_1, x_2, ......., x_n), the network output is given by
N = Σ_{i=1}^{h} v_i σ(z_i), where σ(z_i) = 1 / (1 + exp(−z_i)).
∂N/∂x_j = Σ_{i=1}^{h} v_i w_ij σ'(z_i)
∂^k N/∂x_j^k = Σ_{i=1}^{h} v_i w_ij^k σ^{(k)}(z_i)
(22)
∂^{m_1+m_2+.......+m_n} N / (∂x_1^{m_1} ∂x_2^{m_2} ...... ∂x_n^{m_n}) = Σ_{i=1}^{h} v_i ( Π_{k=1}^{n} w_ik^{m_k} ) σ^{(m_1+m_2+.......+m_n)}(z_i)
(23)
So, network derivatives of any order with respect to the input variables can be calculated from equations (22) and (23). The weight parameters w_ij, v_i, the biases u_i and the sigmoid activation function σ(z_i) are known to us.
known to us.
• Network architecture for the computation of derivatives is same as the architecture of original
MLP neural network except the parameters and activation function are different.
• Activation function for MLP neural network for the computation of derivatives is just the derivative
of activation function (derivative of sigmoid function).
• And parameters for new MLP neural network (for derivative computaion) can be calculated from
the parameters of the old MLP (i.e MLP for function).Of course this is true only when the training
of MLP is over and for fixed network architecture.
Fixed network architecture means if the architecture of MLP for the function is n × h × 1 then
the architecture of MLP for the derivative of function with respect to any variables and order is
also n × h × 1.
The parameters of the new MLP (for derivative computation) can be calculated as follows from the old MLP. Suppose the network derivative is
N_g = ∂^{m_1+.......+m_n} N / (∂x_1^{m_1} ...... ∂x_n^{m_n}) = Σ_{i=1}^{h} v_i ( Π_{k=1}^{n} w_ik^{m_k} ) σ^{(m_1+.......+m_n)}(z_i).
Now the gradient of N_g with respect to the parameters v_i, w_ij and u_i is given by:
∂N_g/∂v_i = ( Π_{k=1}^{n} w_ik^{m_k} ) σ^{(m_1+......+m_n)}(z_i)
∂N_g/∂u_i = v_i ( Π_{k=1}^{n} w_ik^{m_k} ) σ^{(m_1+......+m_n+1)}(z_i)
∂N_g/∂w_ij = x_j v_i ( Π_{k=1}^{n} w_ik^{m_k} ) σ^{(m_1+......+m_n+1)}(z_i) + v_i m_j w_ij^{m_j−1} ( Π_{k=1,k≠j}^{n} w_ik^{m_k} ) σ^{(m_1+......+m_n)}(z_i)
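Equation (22) can be checked against a finite-difference derivative for the single-input, first-order case (k = 1). The h = 3 parameter vectors below are illustrative test values only:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1 - s)

# Illustrative parameters: h = 3 hidden units, one input
v = [-0.434, 0.45, 0.372]
w = [0.23, -0.45, 0.41]
u = [-0.03, 0.21, 0.3]

def N(x):
    return sum(vi * sigmoid(wi * x + ui) for vi, wi, ui in zip(v, w, u))

def dN_dx(x):
    # Equation (22) with k = 1: dN/dx = sum_i v_i * w_i * sigma'(z_i)
    return sum(vi * wi * d_sigmoid(wi * x + ui) for vi, wi, ui in zip(v, w, u))

x = 0.37
numeric = (N(x + 1e-6) - N(x - 1e-6)) / 2e-6
print(abs(dN_dx(x) - numeric) < 1e-8)  # True
```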
Network Parameter Update:
Once we have the network derivatives with respect to the network parameters, the parameter update formulas can be written as:
v_i(t + 1) = v_i(t) + α ∂N_g/∂v_i,
u_i(t + 1) = u_i(t) + β ∂N_g/∂u_i,
w_ij(t + 1) = w_ij(t) + γ ∂N_g/∂w_ij.
Consider now the first order ordinary differential equation
dψ(x)/dx = f(x, ψ)
(24)
with x ∈ [0,1] and the initial condition ψ(0) = A. The trial solution is
ψ_t(x) = A + x N(x, p),
(25)
where N(x, p) is the output of a feed-forward neural network with input x and weight parameters p:
N(x, p) = Σ_{i=1}^{h} v_i σ(z_i), where z_i = w_i x + u_i.
Since ψ_t(x) is the solution of the above ODE (24), it should satisfy the initial condition, which it does by construction.
dψ_t(x)/dx = N(x, p) + x dN(x, p)/dx
(26)
Here x is not a vector because this is a one-dimensional problem. So from equation (22) the value of dN(x, p)/dx is given by
dN(x, p)/dx = Σ_{i=1}^{h} v_i w_i σ'(z_i)   (k = 1, and there is no j-suffix because x has only one dimension).
Therefore,
dψ_t(x)/dx = Σ_{i=1}^{h} v_i σ(z_i) + x Σ_{i=1}^{h} v_i w_i σ'(z_i)
(27)
Now from equation (24), dψ_t(x)/dx = f(x, ψ_t(x)), i.e.
dψ_t(x)/dx − f(x, ψ_t(x)) = 0.
Now we have to choose the network parameters for the set of input points x_1, x_2, ......, x_n, where x_i ∈ [0,1] ∀ i = 1, 2, ....., n, such that the term dψ_t(x)/dx − f(x, ψ_t(x)) is as close to zero as possible:
min_p E(p) = Σ_{j=1}^{n} ( dψ_t(x)/dx |_{x=x_j} − f(x_j, ψ_t(x_j)) )²
So here we have one objective function and no constraints, because the trial function ψ_t(x) automatically satisfies the initial condition by construction. So we have an optimization problem without constraints, known as an unconstrained optimization problem. Let us call the term E(p) the error of the MLP feed-forward neural network; we want to choose the parameters which give the minimum value of the error E(p).
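A quick numerical check that the trial form ψ_t(x) = A + x N(x, p) satisfies ψ_t(0) = A for any parameters, and that equation (27) matches a finite-difference derivative. The parameters below are arbitrary, untrained values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Arbitrary (untrained) parameters; h = 2 hidden units
v, w, u, A = [0.3, -0.2], [0.5, 0.1], [0.0, 0.4], 1.0

def N(x):
    return sum(vi * sigmoid(wi * x + ui) for vi, wi, ui in zip(v, w, u))

def psi_t(x):
    return A + x * N(x)            # trial solution, eq. (25)

def dpsi_t(x):
    # eq. (27): dpsi_t/dx = N(x) + x * sum_i v_i w_i sigma'(z_i)
    dN = sum(vi * wi * sigmoid(wi * x + ui) * (1 - sigmoid(wi * x + ui))
             for vi, wi, ui in zip(v, w, u))
    return N(x) + x * dN

print(psi_t(0.0) == A)   # True: the initial condition holds by construction
x = 0.6
numeric = (psi_t(x + 1e-6) - psi_t(x - 1e-6)) / 2e-6
print(abs(dpsi_t(x) - numeric) < 1e-8)  # True
```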
Substituting the value of dψ_t(x)/dx from eq. (27) and the value of ψ_t(x) from eq. (25) into the above minimization problem,
min_p E(p) = Σ_{j=1}^{n} ( Σ_{i=1}^{h} v_i σ(z_i) + x_j Σ_{i=1}^{h} v_i w_i σ'(z_i) − f(x_j, A + x_j Σ_{i=1}^{h} v_i σ(z_i)) )²
where
σ(z_i) = 1 / (1 + e^{−z_i}),
σ'(z_i) = e^{−z_i} / (1 + e^{−z_i})² = σ(z_i)(1 − σ(z_i)),
σ''(z_i) = −e^{−z_i}(1 − e^{−z_i}) / (1 + e^{−z_i})³.
E(p) = Σ_k ( dψ_t(x_k)/dx − f(x_k, ψ_t(x_k)) )²
E(p) = Σ_k ( Σ_{i=1}^{h} v_i σ(z_i) + x_k Σ_{i=1}^{h} v_i w_i σ'(z_i) − f(x_k, ψ_t(x_k)) )².
(28)
Minimization of E(p) gives the optimized values of the weight parameters v, w and the bias u; these optimized values give the minimum error.
e_k = ( Σ_{i=1}^{h} v_i σ(z_i) + x_k Σ_{i=1}^{h} v_i w_i σ'(z_i) − f(x_k, ψ_t(x_k)) )²
(29)
∂e_k/∂v_i = 2{σ(z_i) + x_k w_i σ'(z_i)} √e_k
∂e_k/∂u_i = 2 v_i {σ'(z_i) + x_k w_i σ''(z_i)} √e_k
for k = 1, 2, ......, n,
where √e_k = { Σ_{i=1}^{h} v_i σ(z_i) + x_k Σ_{i=1}^{h} v_i w_i σ'(z_i) − f(x_k, ψ_t(x_k)) }.
So now we have calculated the derivatives of the error with respect to the network parameters. Now we can follow any minimization technique, like the backpropagation algorithm, the delta rule, or quasi-Newton BFGS (Broyden, Fletcher, Goldfarb and Shanno), etc. to reduce the error. Here we employ the simplest but quite effective minimization technique, called the delta rule, which gives
Δv_i = −η ∂e_k/∂v_i,
Δw_i = −η ∂e_k/∂w_i,
Δu_i = −η ∂e_k/∂u_i,
for i = 1, 2, ......., h, and
v_i(t + 1) = v_i(t) + Δv_i,
w_i(t + 1) = w_i(t) + Δw_i,
u_i(t + 1) = u_i(t) + Δu_i.
Thus we have an iterative process which continues until the derivatives of the error with respect to the network parameters, and the error function E(p), come down to a minimum value.
The initial values of the weight parameters and bias, i.e. v_i(0), w_i(0), u_i(0), are randomly chosen between −0.5 and 0.5.
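The whole delta-rule training loop can be sketched on a toy problem (dψ/dx = −ψ, ψ(0) = 1, not the text's example). For brevity the gradient is taken by finite differences instead of the analytic ∂e_k/∂v_i, ∂e_k/∂w_i, ∂e_k/∂u_i above; η, h and the initial parameters are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy problem: dψ/dx = -ψ, ψ(0) = 1 on [0,1]; trial ψt(x) = 1 + x·N(x, p)
h, A = 5, 1.0
xs = [i / 10 for i in range(11)]                   # training points in [0,1]
p = [0.1 * math.sin(i + 1) for i in range(3 * h)]  # flat params: v | w | u

def residual(x, p):
    # dψt/dx - f(x, ψt) at one training point (driven toward 0)
    v, w, u = p[:h], p[h:2 * h], p[2 * h:]
    s = [sigmoid(wi * x + ui) for wi, ui in zip(w, u)]
    N = sum(vi * si for vi, si in zip(v, s))
    dN = sum(vi * wi * si * (1 - si) for vi, wi, si in zip(v, w, s))
    return (N + x * dN) - (-(A + x * N))           # here f(x, ψ) = -ψ

def error(p):
    return sum(residual(x, p) ** 2 for x in xs)

e0 = error(p)
eta = 1e-3
for _ in range(500):     # delta rule with finite-difference gradients
    e_p = error(p)
    grad = []
    for i in range(len(p)):
        q = list(p); q[i] += 1e-6
        grad.append((error(q) - e_p) / 1e-6)
    p = [pi - eta * gi for pi, gi in zip(p, grad)]

print(error(p) < e0)  # True: the squared residual decreases during training
```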
11.5 Neural Networks for First-Order ODEs:
Two types of neural networks are used to solve a first order ordinary differential equation. One of them is the training network, which is used to find the optimized values of the network parameters. Once the network parameters are frozen, the solution network is used to obtain the solution of the first order ODE.
The training network consists of five layers: one input layer, three hidden layers and one output layer. It basically comprises two MLPs combined in parallel between the input and the second hidden layer. The activation function of the upper half of the network is just the sigmoid function, and the activation function of the lower half of the network is the derivative of the sigmoid function.
The working principle of the training network is described in the algorithm given below.
Input: a randomly selected number between 0 and 1, and an error limit ε. ε is the error limit that we want to achieve, such that our calculated error is less than it, i.e. we repeat the algorithm until we get e_k < ε.
Choose initial values of the weight parameters w_i, v_i and u_i between −0.5 and 0.5, for i = 1, 2, ...., h,
and compute net1_j = 1 / (1 + e^{−z_j}) and net2_j = w_j e^{−z_j} / (1 + e^{−z_j})² for the lower and upper halves of the network respectively, for j = 1, 2, ....., h.
Compute N = Σ_{j=1}^{h} v_j net1_j and dN/dx = Σ_{j=1}^{h} v_j net2_j.
Compute ψ = A + x_k N and dψ/dx = N + x_k dN/dx.
Step 7: Output Layer. Compute e_k = | dψ/dx − f(x_k, ψ) |.
Step 8: Preparation.
z_i = w_i x_k + u_i
σ_i = 1 / (1 + e^{−z_i})
σ'_i = e^{−z_i} / (1 + e^{−z_i})²
√e_k = { Σ_{i=1}^{h} v_i σ_i + x_k Σ_{i=1}^{h} v_i w_i σ'_i − f(x_k, ψ(x_k)) }
∂e_k/∂v_i = 2{σ_i + x_k w_i σ'_i} √e_k
Step 9: Update.
v_i(t + 1) = v_i(t) − η ∂e_k/∂v_i
w_i(t + 1) = w_i(t) − η ∂e_k/∂w_i
u_i(t + 1) = u_i(t) − η ∂e_k/∂u_i
Termination: Stop
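The forward pass of the training network (the steps above) can be sketched in Python as follows. The right-hand side f is a placeholder (f(x, ψ) = −ψ, i.e. the ODE ψ' = −ψ) chosen only for illustration; everything else mirrors the formulas above:

```python
import numpy as np

rng = np.random.default_rng(1)
h = 3                                 # number of hidden nodes
A = 1.0                               # initial condition psi(0) = A
w, u, v = (rng.uniform(-0.5, 0.5, h) for _ in range(3))

def forward(x):
    """Trial solution psi(x) = A + x*N(x) and its derivative dpsi/dx."""
    z = w * x + u
    net1 = 1.0 / (1.0 + np.exp(-z))                  # upper half: sigmoid
    net2 = w * np.exp(-z) / (1.0 + np.exp(-z))**2    # lower half: d(sigmoid)/dx
    N = np.dot(v, net1)
    dN = np.dot(v, net2)
    return A + x * N, N + x * dN

# error at one training point, with the placeholder f(x, psi) = -psi
xk = 0.3
psi, dpsi = forward(xk)
ek = abs(dpsi - (-psi))
print(ek)
```

A finite-difference check of dψ/dx against ψ confirms the identity dψ/dx = N + x dN/dx used in the algorithm.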
Solution Network: It consists of four layers: one input layer, two hidden layers and one output layer. The activation function of the neurons in this MLP is just the sigmoid function. It carries out the solution-computing process for the first order ordinary differential equation, using the network parameters taken from the trained neural network.
The working principle of the solution network is given below:
Algorithm
Input: A randomly generated x ∈ [0, 1] and an error limit ε.
for j = 1:h
z_j = w_j x_k + u_j;
net1_j = 1/(1 + e^{-z_j});
end
Step 4: Second Hidden Layer. Compute N = Σ_{j=1}^{h} v_j net1_j.
Step 5: Output Layer. Compute ψ = A + x_k N.
Step 6: Print ψ.
Step 7: Termination. Stop.
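A sketch of the solution network in Python, using the initial parameter values listed in the example below as stand-ins for trained (frozen) parameters:

```python
import numpy as np

# frozen network parameters; here the example's initial values stand in
# for the result of an actual training run
w = np.array([0.23, -0.45, 0.41])
u = np.array([-0.03, 0.21, 0.30])
v = np.array([-0.434, 0.45, 0.372])
A = 1.0                               # initial condition psi(0) = A

def solution(x):
    z = w * x + u                     # first hidden layer
    net1 = 1.0 / (1.0 + np.exp(-z))   # sigmoid activations
    N = np.dot(v, net1)               # second hidden layer
    return A + x * N                  # output layer: trial solution psi

print(solution(0.5))
```

Note that the trial form A + x*N satisfies ψ(0) = A by construction, whatever the parameters are.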
11.6 Example:
Consider a first order ordinary differential equation,
dψ/dx + ( x + (1 + 3x^2)/(1 + x + x^3) ) ψ = x^3 + 2x + x^2 (1 + 3x^2)/(1 + x + x^3),   x ∈ [0, 1],
with ψ(0) = 1 and initial weight parameters
w = [0.23; -0.45; 0.41];
u = [-0.03; 0.21; 0.3];
v = [-0.434; 0.45; 0.372];
such that there is one input node, three hidden nodes and one output node.
After running the above algorithms for the training network and the solution network in MATLAB, we get the following result.
Table 1: Experimental Result.
Test points Estimated Solution Actual solution Error Error
xk ψt (xk ) ψa (xk ) ek ∆ψ(x)
0.1104 0.9238 0.9008 0.000045 0.0230
0.2390 0.8942 0.8111 0.000093 0.0830
0.1550 0.9078 0.8666 0.000042 0.0413
0.2503 0.8938 0.8046 0.000033 0.0892
0.2637 0.8939 0.7971 0.0000021 0.0968
0.0479 0.9594 0.9543 0.000096 0.0051
0.0186 0.9826 0.9818 0.000066 0.0008
0.2137 0.8962 0.8265 0.000048 0.0697
0.1905 0.8995 0.8416 0.000012 0.0579
0.0609 0.9505 0.9426 0.0000083 0.0079
0.1355 0.9140 0.8811 0.0000096 0.0329
0.0214 0.9801 0.9790 0.000070 0.0011
0.1381 0.9131 0.8791 0.0000055 0.0340
0.0565 0.9534 0.9465 0.000073 0.0069
0.1668 0.9046 0.8580 0.000014 0.0466
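The text does not state the closed form of the actual solution ψ_a. For this particular ODE with ψ(0) = 1 the closed form ψ(x) = x^2 + e^{−x^2/2}/(1 + x + x^3) satisfies the equation, which the following Python snippet verifies numerically:

```python
import numpy as np

g = lambda x: (1 + 3*x**2) / (1 + x + x**3)
p = lambda x: x + g(x)                     # coefficient of psi in the ODE
q = lambda x: x**3 + 2*x + x**2 * g(x)     # right-hand side of the ODE
psi = lambda x: x**2 + np.exp(-x**2 / 2) / (1 + x + x**3)

hh = 1e-5
for x in np.linspace(0.05, 0.95, 10):
    dpsi = (psi(x + hh) - psi(x - hh)) / (2 * hh)   # central difference
    assert abs(dpsi + p(x) * psi(x) - q(x)) < 1e-6  # ODE residual ~ 0
print(psi(0.0))   # 1.0, the initial condition
```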
% MATLAB code for the training network
l = 1;                        % number of input nodes
n = 1;                        % number of output nodes
h = 3;                        % number of hidden nodes
k = 1;
w = [0.23;-0.45;0.41];        % initial weights between input and hidden layer
u = [-0.03;0.21;0.3];         % initial biases
v = [-0.434;0.45;0.372];      % initial weights between hidden and output layer
A = 1;                        % initial condition psi(0) = A
eta = 0.0001;                 % learning rate
epsilon = 1e-4;               % error limit
iterations = 0;
e1 = 5;                       % initial error, set above epsilon
x1 = rand;                    % training point, randomly chosen in [0,1]
while iterations < 100000     % training loop (capped, see the note below)
% Input layer
z = zeros(h,1);
net1 = zeros(h,1);
net2 = zeros(h,1);
for j = 1:h
    z(j) = w(j)*x1 + u(j);
    net1(j) = 1/(1 + exp(-z(j)));                   % sigmoid
    net2(j) = w(j)*exp(-z(j))/(1 + exp(-z(j)))^2;   % w_j * derivative of sigmoid
end
% Second hidden layer:
% N is the input to the output layer in the upper half of the network,
% DN = dN/dx is the input to the output layer in the lower half
N = 0;
DN = 0;
for j = 1:h
N = N + v(j)*net1(j);
DN = DN + v(j)*net2(j);
end
si = A + x1*N;     % trial solution psi
Dsi = N + x1*DN;   % d(psi)/dx
e1 = abs(Dsi - myNeuralFun(x1,si));   % error e_k = |dpsi/dx - f(x, psi)|
if e1 < epsilon
    break;
else
    % Preparation: sigmoid and its derivatives at z(j)
    for j = 1:h
        net1(j) = 1/(1 + exp(-z(j)));
        Dnet1(j) = exp(-z(j))/(1 + exp(-z(j)))^2;   % derivative of sigmoid
    end
    DV = zeros(h,1);
    DW = zeros(h,1);
    DU = zeros(h,1);
    for j = 1:h
        D2net1(j) = Dnet1(j)*(1 - 2*net1(j));       % second derivative of sigmoid
        DV(j) = 2*(net1(j) + x1*w(j)*Dnet1(j))*sqrt(e1);
        DW(j) = 2*x1*v(j)*(2*Dnet1(j) + x1*w(j)*D2net1(j))*sqrt(e1);
        DU(j) = 2*v(j)*(Dnet1(j) + x1*w(j)*D2net1(j))*sqrt(e1);
    end
    % Parameter updation
    v = v - eta*DV;
    w = w - eta*DW;
    u = u - eta*DU;
    iterations = iterations + 1;
end
end   % end of training loop
Solution Network: this computes the solution of the first order ODE using the network parameters from the trained neural network.
for i = 1:h
z(i) = w(i)*x1 + u(i);
net1(i) = 1/(1 + exp(-z(i)));
end
N1 = 0;
for j = 1:h
N1 = N1 + v(j)*net1(j);
end
% si = A + x1*N1 is the required solution of the given ODE at x1
si = A + x1*N1;
disp(si);
e2 = abs(trueVal - si);   % trueVal: actual solution at x1
Function File (reconstructed from the example ODE above, with f(x, psi) = dpsi/dx):
function f = myNeuralFun(x,si)
g = (1 + 3*x^2)/(1 + x + x^3);
f = x^3 + 2*x + x^2*g - (x + g)*si;
end
NOTE: The above code runs for at most 100000 iterations, because without this restriction the loop may become infinite. So it is possible, while running the code, that the error stays greater than epsilon; in this case the iteration count will be exactly 100000. Run the code again and note the result for a run whose number of iterations is less than 100000.
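The safeguard described in the note, a loop that terminates either on convergence or at a fixed iteration cap, can be sketched as follows (Python, with a placeholder error function rather than the ODE residual):

```python
import numpy as np

rng = np.random.default_rng(2)
eta, epsilon, max_iters = 0.1, 1e-4, 100000

# placeholder error: squared norm of the parameters; the real code
# uses the squared ODE residual e_k instead
def error(v):
    return float(np.dot(v, v))

v = rng.uniform(-0.5, 0.5, 3)
iterations = 0
while error(v) >= epsilon and iterations < max_iters:
    v = v - eta * 2.0 * v        # gradient step on the placeholder error
    iterations += 1

print(iterations, error(v) < epsilon)
```

Here the loop converges long before the cap; in the MATLAB code a run that exits with iterations equal to the cap signals failure, and should be repeated.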
12 Conclusion
So far, we have learnt to solve ordinary differential equations using an Artificial Neural Network (ANN). The solution of an ODE obtained by an ANN is a smooth curve (the trial function) which can be differentiated continuously on a smooth domain. This is in contrast with the discrete or non-smooth solutions obtained by traditional schemes such as the finite element and finite difference methods.
The numerical solution of ordinary and partial differential equations plays a crucial role in engineering. Traditional methods such as the finite element, finite difference and finite volume methods are based on discretizing the domain and weakly solving the ODE over this discretization. These methods are adequate for solving differential equations in engineering applications, but they have the limitation that they provide a discrete solution with limited differentiability. The artificial neural network approach explained above avoids this problem: the solution obtained is a smooth curve which can be differentiated continuously on a smooth domain of definition.
13 References
• http://nptel.ac.in/courses/117105084/
• http://slideplayer.com/slide/7484664/
• Neha Yadav, Anupam Yadav and Manoj Kumar, "An Introduction to Neural Network Methods for Differential Equations", 1st ed., Springer Dordrecht Heidelberg New York London, 2015.
• Srimanta Pal, "Numerical Methods: Principles, Analysis and Algorithms", Oxford University Press India, 2009.