
SUMMER INTERNSHIP REPORT

(15th May - 30th June, 2016)

ON

”NEURAL NETWORK AND ITS APPLICATIONS”

SUBMITTED BY:

”CHHAVI SHARMA”

M.Sc + Ph.D DUAL DEGREE IN ”INDUSTRIAL ENGINEERING AND OPERATIONS RESEARCH”

INDIAN INSTITUTE OF TECHNOLOGY, BOMBAY

UNDER THE SUPERVISION OF:

”DR. MANOJ KUMAR”

ASSOCIATE PROFESSOR

DEPARTMENT OF MATHEMATICS

MOTILAL NEHRU NATIONAL INSTITUTE OF TECHNOLOGY, ALLAHABAD


Abstract
Differential equations play a vital role in various fields of engineering and science. An exact
solution of a differential equation is not always possible, so various well-known numerical methods
such as Euler, Runge-Kutta, predictor-corrector, finite element and finite difference methods are
used to solve these equations. Although these numerical methods provide good approximations to
the solution, they may become challenging for higher-dimensional problems. In recent years many
researchers have looked for new methods of solving differential equations. Here, Artificial Neural
Network (ANN) based models are used to solve ordinary differential equations with initial conditions.

CONTENTS:
• Introduction

• Biological Neural Network

• Artificial Neural Network

• Mathematical Model of Artificial Neural Network

• Activation functions

• Network Architecture

• Learning Algorithms and their different types

• Multi Layer Perceptron (MLP)

• Neural Network for Differential equations

• MLP for Ordinary Differential Equations

• Gradient Computation using MLP

• Neural Networks for first order ODE

• MATLAB Code

• Conclusion

• References

1 Introduction
A network consists of nodes which are interconnected with each other to form a large circuit. In our
brain we have a biological neural network. The basic processing element of the brain is the neuron;
around 10 billion neurons are interconnected with each other to form a biological neural network.

In analogy to the brain, an entity made up of interconnected neurons, neural networks are made
up of interconnected processing elements called units, which respond in parallel to a set of input signals
given to each. The unit is the equivalent of its brain counterpart, the neuron.
A neural network consists of four main parts:

1. Processing units uj , where each uj has a certain activation level aj (t) at any point in time.

2. Weighted interconnections between the various processing units, which determine how the activation of one unit leads to input for another unit.

3. An activation rule which acts on the set of input signals at a unit to produce a new output signal,
or activation.

4. Optionally, a learning rule that specifies how to adjust the weights for a given input/output pair.

2 Biological Neural Network:


• A biological neural network is a network of a large number of biological neurons interconnected
with each other.

• A biological neuron has different components such as dendrites, soma, nucleus, axon and terminal
buttons.

• Dendrites: They accept the input signals and feed them to the nucleus inside the soma.

• Axon: It accepts signals from the nucleus and transmits them to the other interconnected
neurons through the terminal buttons. The axon behaves as a transmission line that carries signals
from one neuron to another through the terminal buttons at its end.

2.1 Working of Biological Neuron:


Suppose we have three neurons N1, N2 and N3. These three neurons are interconnected such that
the output of neuron N1 is fed as input to neuron N2 through terminal buttons. Similarly, the output
of neuron N2 is fed as input to neuron N3. This process continues as we move on, if we have a large
network of neurons.

Figure 1: Biological Neuron

3 Artificial Neural Network:
In 1943, McCulloch and Pitts gave a mathematical model based on the activity of a biological neuron.
This model is known as the artificial neuron, and when a large number of artificial neurons are
interconnected with each other they form an artificial neural network. The idea of the artificial
neural network was taken from the activity and structure of the biological neural network. A biological
neural network may be modelled artificially to perform computation, and the model is then termed
an artificial neuron. Hence, we can say that,

• Artificial Neural Networks are the programs designed to solve any problem by trying to mimic the
structure and the function of our nervous system.

• Neural Networks are based on simulated neurons, which are joined together in a variety of ways
to form networks.

• Neural Network resembles the human brain in the following two ways:
- A neural network acquires knowledge through learning.
- A neural network’s knowledge is stored within interconnection weights known as synaptic
weights.

Figure 2: An artificial neural network in which each node represents a neuron and there is a hidden layer
between the input layer and the output layer. Arrows represent that the output of the input layer is sent as
input to the hidden layer, and the output of the hidden layer is sent as input to the output layer neurons.

• Artificial Neuron: It is the basic processing element of an artificial neural network. A neuron accepts
inputs from one or more neurons and produces only one output. This output is transmitted to other
neurons through synapses (junctions used to send signals from one neuron to another). The output is
obtained when the weighted sum of the input signals, multiplied by the corresponding synaptic weights,
is passed through a function known as the activation function or threshold function.
The connection weights or synaptic weights indicate the strength of the signals.

Figure 3: Design of artificial neural network

4 Mathematical form of Artificial Neural Network:

Figure 4: Detailed mathematical structure

• We have a set of n input signals S = {x_i : i = 1, 2, ..., n}.

• There is a connection weight w_ji corresponding to the jth neuron; w_ji is the weight on the input
transmitted from the ith neuron to the jth neuron.

• There is a threshold value θ which has to be reached to get an output signal.

• There is also a bias term w_j0.

• There is a function f(), called the activation function, which acts on the weighted sum of the signals
to give the desired output.

Weighted sum = Σ_{i=1}^{n} w_ji x_i + w_j0

The output from the jth neuron is

O_j = f(Σ_{i=1}^{n} w_ji x_i + w_j0)   such that   Σ_{i=1}^{n} w_ji x_i + w_j0 ≥ θ

Figure 5: Working principle of artificial neuron
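To make the working principle concrete, here is a minimal MATLAB sketch of a single artificial neuron; the input values, weights, bias and threshold below are assumed purely for illustration.

x   = [0.5; 0.8; 0.1];        % input signals x_i (assumed values)
wj  = [0.2; -0.4; 0.7];       % connection weights w_ji (assumed values)
wj0 = 0.05;                   % bias term w_j0
theta = 0;                    % threshold
s = wj'*x + wj0;              % weighted sum of the inputs
if s >= theta
    Oj = 1/(1 + exp(-s));     % activation function f() applied to the weighted sum
else
    Oj = 0;                   % threshold not reached: no output signal
end
disp(Oj);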

5 Activation Functions:
The activation function of a neuron is a nonlinear mathematical function. Common examples are the
sigmoid function, hard-limiter function, step function, tanh function and Gaussian activation function.
The chosen activation function depends upon the type of problem we are solving.

• Sigmoid function: It is an s-shaped nonlinear mathematical function, also known as the squashing or
logistic function. Mathematically,

σ(x) = 1 / (1 + e^{-x}),   0 < σ(x) < 1,   −∞ < x < ∞

Figure 6: Sigmoid function

• Hard-limiter function: It is a step function defined as:

h(x) = 0 if x ≤ 0;   h(x) = 1 otherwise

Figure 7: Hardlimiter function

• Gaussian activation function: It is defined as

f(x) = (1 / (√(2π) σ)) e^{−(x−m)² / (2σ²)},   where m and σ are the centre (peak location) and the width respectively.

Figure 8: Gaussian activation function

Different activation functions are used depending upon the type of problem being considered. Of the
activation functions explained above, the sigmoid and hard-limiter functions are the most commonly
used. The sigmoid function is popular because it is nonlinear, monotonic and bounded and has a simple
derivative, f'(s) = k f(s)(1 − f(s)). The hard-limiter function is also monotonic and bounded, but it is
discontinuous and not differentiable, which limits its use with gradient-based learning. The lower and
upper bounds on the chosen activation function depend on the user and the area of application.
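The three activation functions above can be compared numerically. The following MATLAB sketch (the sample point and the Gaussian parameters m and σ are assumed for illustration) evaluates the sigmoid, hard-limiter and Gaussian functions at one point.

x = 0.5;                                     % sample point (assumed)
sigmoid  = @(x) 1./(1 + exp(-x));            % logistic (squashing) function
hardlim  = @(x) double(x > 0);               % hard-limiter: 0 for x <= 0, 1 otherwise
m = 0; s = 1;                                % Gaussian centre and width (assumed)
gaussact = @(x) exp(-(x - m).^2/(2*s^2))/(sqrt(2*pi)*s);
fprintf('sigmoid = %.4f, hard-limiter = %g, gaussian = %.4f\n', sigmoid(x), hardlim(x), gaussact(x));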

6 Network architecture:
• It gives the structure of a network, i.e. how the different nodes are interconnected with each other.
In the case of a neural network it defines the structure of the network and refers to the way the different
neurons are connected; it is an important part of network functioning and learning.
• There are several network architectures in the literature, called neural network topologies, such as
the instar topology, the outstar topology, groups of instars, etc.

• In the instar topology a neuron of the present layer accepts the outputs of all neurons from the
previous layer.
• In the outstar topology a neuron of the present layer accepts the output of a single neuron from the
previous layer.

7 Learning:
• Learning is a process to acquire knowledge from the past observations and examples.

• All the knowledge of a neural network is stored in the synapses as connection weights or synaptic
weights. Between two consecutive layers of neurons there is a matrix called the weight matrix. Once
this knowledge is acquired through learning, presenting an input pattern to the network will produce
the correct output.

• Therefore, learning is a process in which a neural network updates its parameters and weights in
response to the input so that the actual output converges to the desired output. Once the actual output
is closest to the desired output, the learning phase is complete and the knowledge has been acquired.

Also, the weight adjustment scheme is known as the learning law.


Learning rules are described by mathematical expressions called learning equations. Some standard
learning algorithms are described below:

1. Supervised Learning: In this process we compare the actual output with the target output, and if
there is an error signal then a teaching signal is sent to the neural network to update the connection
weights. These teaching signals are also known as teaching samples.
2. Unsupervised Learning: This learning algorithm is unlike supervised learning. Here no teaching
signal is fed to the network, but some guidelines are sent to it to adjust the parameters so that the
error signal is minimized.
3. Reinforced Learning: This learning requires one or more neurons at the output layer. There is a
teacher (training sample), but unlike in supervised learning the teacher only indicates how close the
actual output is to the target output. The error signal generated by the teacher is only binary, e.g.
pass/fail, true/false, 0/1, -1/1. If the signal generated by the teacher is 'fail', the weight parameters
must be re-adjusted until the teacher's result is 'pass'.
4. Competitive Learning: This learning method requires one or more neurons at the output layer.
Here different output layer neurons compete with each other to give the output closest to the desired
output. The neuron whose output best matches the applied input signal becomes the dominant one,
and the remaining neurons cease producing an output signal for that input signal.

• In this learning rule, a neuron learns by shifting its weights from inactive connections to active ones.

• Only the winning neuron and its neighbourhood are allowed to learn.

• If a neuron does not respond to a given input pattern, then learning cannot occur in that particular
neuron.

Mathematically, the winning rule is defined as

y_k = 1 if v_k > v_j for all j, j ≠ k;   y_k = 0 otherwise,

subject to Σ_{j=1}^{n} W_kj = 1 for all k.

The neuron for which y_k = 0 is a loser, and the neuron for which y_k = 1 is the winner.

NOTE: There is another method to find the winning neuron which is described below:

• The overall effect of the competitive learning rule is to move the synaptic weight vector W_k of the
winning neuron k towards the input vector X.

• The matching criterion is equivalent to the minimum Euclidean distance between the vectors.

• The Euclidean distance between a pair of n-by-1 vectors X and W_k is defined by

d_k = ||X − W_k|| = [Σ_{j=1}^{n} (x_j − w_kj)²]^{1/2},

where x_j and w_kj are the jth elements of the vectors X and W_k, and n is the number of inputs to
the kth neuron.

• To identify the winning neuron k_X that best matches the input vector X, we may apply the following
condition:

k_X = min_k ||X − W_k||,   k = 1, 2, ..., m

Competitive Learning Rule:

The competitive learning rule defines the change ∆W_kj applied to the synaptic weight W_kj as

∆W_kj = η (x_j − W_kj)   if neuron k wins the competition,
∆W_kj = 0                if neuron k loses,

where η is the learning rate, W_k = [W_k1, W_k2, ..., W_kn] and X = [x_1, x_2, ..., x_n].

Moving W_k towards the input pattern X provides quick learning.

Example: Consider a single layer neural network.

We have a two dimensional input vector X = [0.52, 0.12], which is fed into an output layer consisting
of three neurons. So we have 6 connections:

1 → 1, 1 → 2, 1 → 3, 2 → 1, 2 → 2, 2 → 3,

with the initial synaptic weight matrix (column k holds the weights into output neuron k) given by

0.27 0.42 0.43
0.81 0.70 0.21

The input to output layer neuron 1 is v_1 = x_1 × W_11 + x_2 × W_12:

v_1 = 0.52 × 0.27 + 0.12 × 0.81 = 0.2376

v_2 = x_1 × W_21 + x_2 × W_22 = 0.52 × 0.42 + 0.12 × 0.70 = 0.3024

v_3 = x_1 × W_31 + x_2 × W_32 = 0.52 × 0.43 + 0.12 × 0.21 = 0.2488

Suppose the activation function is the logistic function. Then

y_1 = f_1(v_1) = 1 / (1 + e^{−0.2376}) = 0.559122

y_2 = f_2(v_2) = 1 / (1 + e^{−0.3024}) = 0.57503

y_3 = f_3(v_3) = 1 / (1 + e^{−0.2488}) = 0.56188

Now calculate the distance between the input vector and the corresponding weight vectors:

d_1 = sqrt((x_1 − W_11)² + (x_2 − W_12)²) = sqrt((0.52 − 0.27)² + (0.12 − 0.81)²) = 0.73389

d_2 = sqrt((x_1 − W_21)² + (x_2 − W_22)²) = sqrt((0.52 − 0.42)² + (0.12 − 0.70)²) = 0.58856

d_3 = sqrt((x_1 − W_31)² + (x_2 − W_32)²) = sqrt((0.52 − 0.43)² + (0.12 − 0.21)²) = 0.12728

Out of d_1, d_2 and d_3, d_3 is the minimum, so neuron 3 is the winner. By the competitive learning
equation ∆W_kj = η(x_j − W_kj), with η = 0.1, the weight changes for the connections 1 → 3 and
2 → 3 are

∆W_31 = 0.1(0.52 − 0.43) = 0.009

∆W_32 = 0.1(0.12 − 0.21) = −0.009

The connection weights of the other connections remain unchanged, because the weight change ∆W_kj
for the losing neurons is zero. The new synaptic weight matrix is therefore

0.27 0.42 0.43 + 0.009
0.81 0.70 0.21 − 0.009

i.e.

0.27 0.42 0.439
0.81 0.70 0.201
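The worked example above can be reproduced with a few lines of MATLAB. This is only a sketch of the competitive update with η = 0.1 as above; it is not part of the original report.

X   = [0.52; 0.12];                          % input vector
W   = [0.27 0.42 0.43; 0.81 0.70 0.21];      % column k = weights into output neuron k
eta = 0.1;                                   % learning rate
d = sqrt(sum((W - repmat(X,1,3)).^2, 1));    % Euclidean distances d_1, d_2, d_3
[~, k] = min(d);                             % winning neuron (here k = 3)
W(:,k) = W(:,k) + eta*(X - W(:,k));          % competitive update; losers are unchanged
disp(d); disp(W);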

5. Hebbian Learning: This learning method involves updating the connection weights. Hebbian
synapses have the following characteristics:

(a) Time dependent: the modification of a Hebbian synapse depends on the pre-synaptic and the
post-synaptic neurons being activated at the same time.

(b) Local: a Hebbian synapse involves only the pre-synaptic and post-synaptic neurons and does not
depend on the other neurons.

(c) Strongly interactive:
- positive correlation: synaptic strengthening
- negative correlation: synaptic weakening
- uncorrelated

Classification of synaptic modification:

1. Hebbian: the synapse increases its strength with positive correlation.

2. Anti-Hebbian: the synapse increases its strength with negative correlation.

3. Non-Hebbian: it does not involve Hebbian mechanisms of either kind.

8 Mathematical Model of Hebbian Mechanisms:

Using Hebb's law we can express the adjustment applied to the weight W_kj at iteration n in the
following form:

∆W_kj(n) = F[y_k(n), x_j(n)]

Hebb's hypothesis is given by

∆W_kj(n) = η y_k(n) x_j(n),

where η is known as the learning rate parameter. This is known as the activity product rule.
Keeping η and x_j constant, it is the equation of a straight line passing through the origin with slope η x_j.

8.1 Forgetting factor:


• Hebbian learning implies that weights can only increase.

• To resolve this problem, we might impose a limit on the growth of the synaptic weights. It can be
done by introducing a non-linear forgetting factor into Hebb's law:

∆W_kj(n) = η y_k(n) x_j(n) − ϕ y_k(n) W_kj(n),

where ϕ is called the forgetting factor.

• The forgetting factor usually falls in the interval between 0 and 1, typically between 0.01 and 0.1, to
allow only a little 'forgetting' while limiting the weight growth.

8.2 Hebbian Learning Algorithm:


Step 1: Initialisation.

• Set the initial synaptic weights and thresholds to small random values, say in the interval [0, 1].

Step 2: Activation.

Compute the neuron activation at iteration n:

y_k = f(Σ_{j=1}^{n} W_kj x_j + w_0),   where n is the number of input neurons.

Step 3: Learning.

• Update the weights in the network:

W_kj(n + 1) = W_kj(n) + ∆W_kj(n),

where ∆W_kj(n) is the weight correction at iteration n. The weight correction is obtained from the
equation

∆W_kj(n) = η y_k(n) x_j(n) − ϕ y_k(n) W_kj(n) = y_k(n)[η x_j(n) − ϕ W_kj(n)].

Step 4: Iteration.

Increase the iteration n by one and go back to Step 2.
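As an illustration of Steps 2 and 3, the following MATLAB sketch performs one Hebbian update with a forgetting factor; all numerical values are assumed purely for the example.

eta = 0.1;  phi = 0.05;              % learning rate and forgetting factor (assumed)
x   = [0.6; 0.2; 0.9];               % presynaptic activities x_j (assumed)
Wk  = [0.12; 0.40; 0.33];            % synaptic weights W_kj into neuron k (assumed)
yk  = 1/(1 + exp(-(Wk'*x)));         % Step 2: activation y_k = f(sum of W_kj x_j)
Wk  = Wk + eta*yk*x - phi*yk*Wk;     % Step 3: activity product rule with forgetting
disp(Wk);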

9 Multi Layer Perceptron:


A Multi Layer Perceptron neural network consists of many simple perceptrons in a hierarchical structure,
forming a feed-forward topology with one or more hidden layers between the input layer and the output
layer. It has been shown that a neural network with a single hidden layer can approximate the same
class of functions as networks with more hidden layers (the universal approximation property).
An MLP with a single layer forms half-plane decision regions, with two layers it forms convex decision
regions, and with three or more layers it forms very complex decision regions.
The learning algorithms used in an MLP are the delta rule and the backpropagation algorithm.

9.1 Backpropagation Algorithm:


A backpropagation algorithm for ANNs was first proposed by Werbos in 1974. Later, Rumelhart, Hinton
and Williams exploited backpropagation in their work on simulating cognitive processes. Since then the
backpropagation algorithm has been used to solve many problems that are difficult to handle even with
conventional computer science techniques.

1. A backpropagation network has three layers: an input layer, a hidden layer and an output layer.

2. There are no connections between neurons within a layer, but two consecutive layers are fully
connected with each other.

3. There are two synaptic weight matrices: one between the input layer and the hidden layer, and the
other between the hidden layer and the output layer.

4. Backpropagation is a mathematical tool widely used as the learning algorithm in feed-forward multi-
layer neural networks.

5. It is difficult to find the errors of the hidden layer neurons; there is no direct way to compute them.

6. Before starting the backpropagation learning procedure, we should know the following things:

• The set of normalised training patterns (samples or data), both the inputs {X_k} and the
corresponding targets {T_k}.
• A value of the learning rate η.
• A criterion for the termination of the algorithm.
• A methodology for updating the weights, i.e. a rule for how the connection weights are to be
updated.
• An activation function; usually a nonlinear sigmoid activation function is preferred for the
activation of signals.
• An initial synaptic weight matrix whose entries are random numbers between -0.5 and 0.5.

9.2 Learning Procedure:


1. Collect the set of training patterns (input, output) (X_k, T_k).

2. Apply X_k to the input layer of the MLP neural network.

3. The input to the input layer causes a response in the next layer, i.e. the input to the present layer
is the output obtained from the previous layer neurons.

4. Calculate the outputs of all the layers between the input layer and the output layer.

5. Calculate the actual output of the output layer neurons and compare it with the target output, and
check whether there is an error signal or not. If there is some error, then calculate the rate of change
of the error with respect to the connection weights in the forward direction of the network flow, i.e.
∂e/∂W_ij, where W_ij is a connection weight.

6. Now update the weights between the output layer and the last hidden layer.

7. Again calculate the rate of change of the error with respect to the connection weights between the
last two hidden layers, then update the weights between the last two hidden layers.

8. Repeat the above process until we reach the input layer.

9. Repeat steps 2 to 8 for the whole input data set, i.e. repeat the above steps until the end of the
input data set.

10. Now check the training status of the network by calculating the total error over all training
patterns (i.e. the input data set). If the training status is not satisfactory, then adjust the training
parameters such as the learning rate, momentum factor, etc.

11. If the status is still not satisfactory after changing the training parameters, then repeat the whole
procedure with a new initial weight matrix.

NOTE: In the backpropagation algorithm, signals are transmitted in the forward direction and weight
updating takes place in the backward direction; that is why this learning procedure is known as
backpropagation.

9.3 Calculation of the rate of change of error with respect to the connection weights:

Consider the jth neuron of a layer of an MLP neural network and the following formulae:

e_j(n) = t_j(n) − y_j(n)    (1)

where t_j and y_j are the target output and the actual output of the jth neuron, respectively, at the
nth iteration.

The error energy at the nth iteration is given by

E(n) = (1/2) Σ_{j∈C} e_j²(n)    (2)

The average error energy is given by

E_av = (1/N) Σ_{n=1}^{N} E(n)    (3)

v_j(n) = Σ_{i=0}^{m} W_ji(n) y_i(n)    (4)

y_j(n) = φ(v_j(n))    (5)

where φ() is the activation function.
Now the rate of change of the error with respect to the connection weight W_ji is given by
∂E(n)/∂W_ji. The weight update ∆W_ji is directly proportional to this rate of change, i.e.

∆W_ji = −η ∂E(n)/∂W_ji,

where η is the learning rate and the minus sign is present because we move the weights in the direction
that decreases the error (gradient descent).

By the chain rule,

∂E(n)/∂W_ji(n) = [∂E(n)/∂e_j(n)] [∂e_j(n)/∂y_j(n)] [∂y_j(n)/∂v_j(n)] [∂v_j(n)/∂W_ji(n)],

where

∂E(n)/∂e_j(n) = e_j(n),

∂e_j(n)/∂y_j(n) = −1,

∂y_j(n)/∂v_j(n) = φ'(v_j(n)),

∂v_j(n)/∂W_ji(n) = y_i(n).

Therefore

∂E(n)/∂W_ji = −e_j(n) φ'(v_j(n)) y_i(n),

so ∆W_ji = η e_j(n) φ'(v_j(n)) y_i(n).

Let δ_j(n) = e_j(n) φ'(v_j(n)). Then

δ_j(n) = −∂E(n)/∂v_j(n) = −[∂E(n)/∂e_j(n)] [∂e_j(n)/∂y_j(n)] [∂y_j(n)/∂v_j(n)] = e_j(n) φ'(v_j(n))    (6)

δ_j(n) is known as the local gradient, and

∆W_ji = η δ_j(n) y_i(n)    (7)

where η is the learning rate.

If the jth neuron belongs to the output layer, then it is easy to calculate ∆W_ji, because the terms
e_j, φ'(v_j(n)) and y_i(n) are all directly available. If the jth neuron belongs to a hidden layer, it is
difficult to calculate the corresponding error, so a different procedure is used to obtain the weight
update formula for hidden layer neurons.

Figure 9: Signal flow graph

Consider j as the hidden layer neuron and k as the output neuron

δ_j(n) = −∂E(n)/∂v_j(n) = −[∂E(n)/∂y_j(n)] φ'(v_j(n))    (8)

E(n) = (1/2) Σ_{k∈C} e_k²(n)    (9)

where C is the set of output layer neurons, whose cardinality equals the number of neurons in the
output layer.

∂E(n)/∂y_j(n) = Σ_k e_k(n) ∂e_k(n)/∂y_j(n) = Σ_k e_k(n) [∂e_k(n)/∂v_k(n)] [∂v_k(n)/∂y_j(n)]    (10)

e_k(n) = t_k(n) − y_k(n)    (11)

∂e_k(n)/∂v_k(n) = −φ'(v_k(n))    (12)

For neuron k, v_k(n) = Σ_{j=0}^{m} W_kj(n) y_j(n), so

∂v_k(n)/∂y_j(n) = W_kj(n)    (13)

Equation (10) may therefore be rewritten as

∂E(n)/∂y_j(n) = −Σ_k e_k(n) φ'(v_k(n)) W_kj(n) = −Σ_k δ_k(n) W_kj(n),   with δ_k(n) = e_k(n) φ'(v_k(n)).

Applying this in equation (8), we get

δ_j(n) = φ'(v_j(n)) Σ_k δ_k(n) W_kj(n)    (14)

where the sum runs over the m_L neurons of the output layer.

Hence, equation (14) can be used to find the local gradient δ_j(n) when the jth neuron belongs to a
hidden layer, and by putting this gradient into equation (7) we can find the weight change.

9.4 Practical considerations in the backpropagation algorithm:

(a) Since δ_j(n) depends on φ'_j(v_j), the activation function should be differentiable.
Let the activation function φ() be the logistic function. Then

y_j(n) = φ_j(v_j(n)) = 1 / (1 + exp(−a v_j(n))),   a > 0,   −∞ < v_j(n) < ∞,

where 'a' gives the slope of the function, and

φ'_j(v_j(n)) = a exp(−a v_j(n)) / [1 + exp(−a v_j(n))]² = a y_j(n)[1 − y_j(n)].

If neuron j is in the output layer, then y_j(n) = o_j(n) and

δ_j(n) = e_j(n) φ'(v_j(n)) = e_j(n) a y_j(n)[1 − y_j(n)],

i.e.

δ_j(n) = a[t_j(n) − y_j(n)] y_j(n)[1 − y_j(n)]    (15)

If j is a hidden layer neuron, then

δ_j(n) = φ'(v_j(n)) Σ_k δ_k(n) W_kj(n)

δ_j(n) = a y_j(n)[1 − y_j(n)] Σ_k δ_k(n) W_kj(n)    (16)

∆W_ji(n) = α ∆W_ji(n − 1) + η δ_j(n) y_i(n)    (17)

where α is a positive number called the momentum. It is given the name momentum in the sense that
whenever η is small, learning will still occur; the momentum term pushes the update so that this
happens more quickly. η should be small according to stability considerations.

So ∆W_ji(n) − α ∆W_ji(n − 1) = η δ_j(n) y_i(n), and similarly
∆W_ji(n − 1) − α ∆W_ji(n − 2) = η δ_j(n − 1) y_i(n − 1).

Therefore ∆W_ji(n) = α² ∆W_ji(n − 2) + η α δ_j(n − 1) y_i(n − 1) + η δ_j(n) y_i(n), and in general

∆W_ji(n) = η Σ_{t=0}^{n} α^{n−t} δ_j(t) y_i(t),

because αⁿ ∆W_ji(0) = 0 (we do not change the weights at the 0th iteration). The length of this time
series is n + 1.

Using δ_j(t) = −∂E(t)/∂v_j(t) and y_i(t) = ∂v_j(t)/∂W_ji(t), this can be written as

∆W_ji(n) = −η Σ_{t=0}^{n} α^{n−t} ∂E(t)/∂W_ji(t).

We require 0 ≤ |α| < 1, because if α > 1 the time series does not converge: all the terms become very
large, which we do not want.

1. If ∂E(t)/∂W_ji(t) has the same sign for all t, |∆W_ji| grows in magnitude, so the descent is
accelerated.

2. If ∂E(t)/∂W_ji(t) alternates its sign in every iteration, then |∆W_ji| is small in magnitude.
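A small MATLAB sketch of the momentum update of equation (17); all the numbers below are assumed purely for illustration.

eta = 0.1;  alpha = 0.8;              % learning rate and momentum constant (assumed)
dW_prev = 0.002;                      % Delta W_ji(n-1) from the previous iteration (assumed)
delta_j = 0.05;  y_i = 0.6;           % local gradient and input signal (assumed)
dW = alpha*dW_prev + eta*delta_j*y_i; % Delta W_ji(n) from eq. (17)
W  = 0.3 + dW;                        % W_ji(n+1) = W_ji(n) + Delta W_ji(n), with W_ji(n) = 0.3 assumed
fprintf('dW = %.4f, new weight = %.4f\n', dW, W);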

10 MLP: Multi Layer Perceptron

The input and output are described below. Let

n_i = number of neurons in the input layer
n_h = number of neurons in the hidden layer
n_o = number of neurons in the output layer
h = a vector (h_1, h_2, ..., h_{n_h}) of hidden layer outputs
i = a vector (i_1, i_2, ..., i_{n_i}) of input layer values
o = a vector (o_1, o_2, ..., o_{n_o}) of output layer values
w^ih = a weight matrix [w^ih]_{n_i × n_h} for the input to hidden layer connections
w^ho = a weight matrix [w^ho]_{n_h × n_o} for the hidden to output layer connections
F(x) = 1 / (1 + e^{-x}), the activation function

In the forward pass, execute the following steps (Step 1 and Step 2):

Step 1: Compute the hidden layer neuron activations as follows:

h = F(i w^ih),   where h_k = F(a_k) = 1 / (1 + e^{−a_k}) and a_k = Σ_{l=1}^{n_i} w_kl i_l for k = 1, 2, ..., n_h.

Step 2: Compute the output layer neuron activations as follows:

o = F(h w^ho),   where o_k = F(b_k) = 1 / (1 + e^{−b_k}) and b_k = Σ_{l=1}^{n_h} w_kl h_l for k = 1, 2, ..., n_o.

In the backward pass, execute the following steps:

Step 3: Compute the output layer error from equation (15), δ_j(n) = a[t_j(n) − y_j(n)] y_j(n)[1 − y_j(n)].

Since δ_j(n) is directly proportional to the error [t_j(n) − y_j(n)], it is used as the error measure of the
output layer; let

d_k = a[t_k − o_k] o_k[1 − o_k]   (with a = 1 in the example below),

where d_k is the error of output layer neuron k and t_k is the target output of the kth output neuron.

Step 4: Compute the hidden layer error from equation (16):

e_k = h_k(1 − h_k) Σ_{i=1}^{n_o} w^ho_ik d_i   for k = 1, 2, ..., n_h,

where e_k is the error of hidden layer neuron k, and

e = (e_1, e_2, ..., e_{n_h}).

Step 5: Now update the weights of the connections between the hidden and output layers by the formula

W_ji(n + 1) = W_ji(n) + ∆W_ji(n),

where ∆W_ji(n) is given by equation (17), ∆W_ji(n) = α ∆W_ji(n − 1) + η δ_j(n) y_i(n), and δ_j(n) is
calculated from equation (15).

Step 6: Now update the weights of the connections between the input layer and the hidden layer by the
same expression as above, except that δ_j(n) is calculated from equation (16).

Repeat Step 1 to Step 6 for all input pairs.

10.1 Numerical Computation of an MLP:


Consider a neural network with three layers: an input layer, a hidden layer and an output layer.
The input and hidden layers have two neurons each and the output layer consists of a single neuron.
The training pair used in the calculations below is the input vector i = (0.2, 0.6) with target output
t_1 = 0.7, and

n_i = 2,
n_h = 2,
n_o = 1,
η = 0.9.

The initial weight matrices for the input-hidden and hidden-output layers are

w^ih(0) =
w^ih_11(0)  w^ih_21(0)
w^ih_12(0)  w^ih_22(0)
=
0.1  −0.3
0.3   0.4

and w^ho(0) =
w^ho_11(0)
w^ho_12(0)
=
0.4
0.5

Note that the initial w^ih(0) and w^ho(0) are chosen randomly between -0.5 and 0.5.
Step 1: Calculate the hidden layer activations as follows:

a_1(0) = i_1 w_11(0) + i_2 w_12(0) = 0.2×0.1 + 0.6×0.3 = 0.2

a_2(0) = i_1 w_21(0) + i_2 w_22(0) = 0.2×(−0.3) + 0.6×0.4 = 0.18

h_1(0) = F(a_1) = 1 / (1 + e^{−0.2}) = 0.5498

h_2(0) = F(a_2) = 1 / (1 + e^{−0.18}) = 0.54488

h(0) = (h_1(0), h_2(0)) = (0.5498, 0.54488)

Step 2: Calculate the output layer activation:

b_1(0) = h_1 w_11(0) + h_2 w_12(0) = 0.5498×0.4 + 0.54488×0.5 = 0.49236

o_1(0) = F(b_1) = 1 / (1 + e^{−0.49236}) = 0.6206

o(0) = (o_1(0)) = (0.6206)

Step 3: Calculate the error of the output layer:

d_1 = o_1(0)(1 − o_1(0))(t_1 − o_1) = 0.6206(1 − 0.6206)(0.7 − 0.6206) = 0.0187

d(0) = (d_1(0)) = (0.0187)

Step 4: Calculate the errors of the hidden layer:

e_1(0) = h_1(0)(1 − h_1(0)) d_1(0) w_11(0) = 0.5498(1 − 0.5498) × 0.0187 × 0.4 = 0.00185

e_2(0) = h_2(0)(1 − h_2(0)) d_1(0) w_12(0) = 0.54488(1 − 0.54488) × 0.0187 × 0.5 = 0.00232

e(0) = (e_1(0), e_2(0)) = (0.00185, 0.00232)

Step 5: ∆w_ji = η δ_j(n) y_i(n), where δ_j(n) = d(0) for the output layer neuron j. Therefore

∆w^ho_11 = 0.9 × 0.0187 × 0.5498 = 0.0092531

∆w^ho_12 = 0.9 × 0.0187 × 0.54488 = 0.0091703

w^ho_11(1) = w^ho_11(0) + ∆w^ho_11(0) = 0.4 + 0.0092531 = 0.4092531

w^ho_12(1) = w^ho_12(0) + ∆w^ho_12(0) = 0.5 + 0.0091703 = 0.50917

Therefore, the new connection weight matrix w^ho(1) is

0.4092531
0.50917
Step 6: Update the weights between the input layer and the hidden layer:

∆w_ji(0) = η δ_j(0) y_i(0), where δ_j(0) = e_j(0) for hidden layer neuron j.

∆w^ih_11(0) = 0.9 × 0.00185 × 0.2 = 0.00033

∆w^ih_12(0) = 0.9 × 0.00185 × 0.6 = 0.001

∆w^ih_21(0) = 0.9 × 0.00232 × 0.2 = 0.00042

∆w^ih_22(0) = 0.9 × 0.00232 × 0.6 = 0.00125

Therefore, the new connection weight matrix is

0.1 + 0.00033    −0.3 + 0.00042
0.3 + 0.001       0.4 + 0.00125

i.e. w^ih(1) =

0.10033    −0.29958
0.301       0.40125
So, we have completed one iteration; now repeat the whole procedure, i.e. repeat Step 1 to Step 6
with the new connection weight matrices. As the iterations increase, the error between the computed
output and the target decreases, i.e. o(j) → t as j → ∞.
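The iteration just computed can be checked with the short MATLAB sketch below. The matrix layout is an assumption of this sketch: row k of Wih holds the weights into hidden neuron k, and Who(k) is the weight from hidden neuron k to the single output neuron.

i   = [0.2; 0.6];             % input pattern (from the values used above)
t   = 0.7;                    % target output (from Step 3)
eta = 0.9;                    % learning rate
Wih = [0.1 0.3; -0.3 0.4];    % input-to-hidden weights
Who = [0.4; 0.5];             % hidden-to-output weights
F   = @(a) 1./(1 + exp(-a));  % logistic activation
h   = F(Wih*i);               % Step 1: hidden activations  (0.5498, 0.5449)
o   = F(Who'*h);              % Step 2: output activation   (0.6206)
d   = o*(1 - o)*(t - o);      % Step 3: output-layer error  (0.0187)
e   = h.*(1 - h).*(Who*d);    % Step 4: hidden-layer errors (0.00185, 0.00232)
Who = Who + eta*d*h;          % Step 5: update hidden-to-output weights
Wih = Wih + eta*e*i';         % Step 6: update input-to-hidden weights
disp(Who); disp(Wih);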
11 Neural Network For Differential Equations:


In this section we describe a neural network based model to solve ordinary differential equations.
Before we go further, let us see why this neural network based model is important for solving ordinary
differential equations.

11.1 Advantages of Neural Network for solving Differential Equations:


• The solution of a given differential equation obtained by the neural network method is differentiable
and hence continuous, which implies that the solution obtained is a smooth curve in closed analytic
form that can be used in any subsequent calculation. In contrast, the solution obtained by other
techniques such as Runge-Kutta is discrete or of limited differentiability.

• The neural network based method for solving differential equations provides a solution with good
generalisation properties.

• The complexity of solving ODEs using the neural network method does not increase quickly as the
number of sample points increases, whereas in numerical methods the complexity increases very rapidly
as the number of sampling points in the interval increases.

• The method is general and can be applied to orthogonal box boundaries or to a body having irregular
boundaries.

• The method tackles difficult differential equations arising in many engineering problems which are
hard to solve even using numerical methods; neural networks handle many difficult problems arising
in real life.

Neural networks can solve both ordinary and partial differential equations. The method for solving
ODEs employs a feed-forward neural network as the basic element; an error function is formed from the
equation and is minimized by updating the network parameters (i.e. by training the neural network).
We consider a trial solution of the given differential equation which is the sum of two terms: one
independent of the parameters and one dependent on the parameters. The parameter dependent term
employs a feed-forward neural network, and the network is trained to get the approximate solution of
the given differential equation. We will study all of this in detail later.

11.2 MLP for Ordinary Differential Equations:

An ordinary differential equation can be solved using an MLP neural network. Let the given differential
equation be of the general form

G(x, ψ(x), ∇ψ(x), ∇²ψ(x)) = 0,   x ∈ D,    (18)

where x = (x_1, x_2, ..., x_n) ∈ R^n is the input vector, D ⊂ R^n is the domain of definition, ψ is the
dependent variable, and

∇ = Σ_{i=1}^{n} ∂/∂x_i,   ∇² = Σ_{i=1}^{n} ∂²/∂x_i².

Assumption: the domain of definition is an orthogonal box; it means that no product of two x_i's
appears together.

Objective: compute the solution ψ of the above ODE (18).

11.2.1 Transformation

In this section we transform the above differential equation (18) into another one whose domain of
definition is a discrete set.

So, discretize the domain of definition D and its boundary S into discrete sets D̂ and Ŝ. Then equation
(18) can be transformed into

G(x_t, ψ(x_t), ∇ψ(x_t), ∇²ψ(x_t)) = 0,   ∀ x_t ∈ D̂,    (19)

subject to some constraints, which are just the initial conditions given to us in the problem.

Here 'constraints' means the restrictions imposed on the solution of the given differential equation,
i.e. we have to find a solution of the given ODE that satisfies the initial conditions, known as
constraints.

Let ψ_t(x, p) be a trial solution of the transformed equation (19), where p is the set of adjustable
parameters. This gives a feed-forward neural network in which the parameters p are adjusted so that
the term G(x_t, ψ(x_t), ∇ψ(x_t), ∇²ψ(x_t)) is as close to zero as possible.

Mathematically, this can be represented as

min_p  Σ_{x_t ∈ D̂} (G(x_t, ψ(x_t), ∇ψ(x_t), ∇²ψ(x_t)))²    (20)

The trial solution above is a function of the parameters p, which are adjusted to minimize this term;
hence the trial solution employs a feed-forward neural network, and p corresponds to the weight
parameters and biases of the network architecture.

11.2.2 Trial function:

We can construct the trial function by following these two requirements:

• The trial function must satisfy the boundary (initial) conditions.

• The trial function is the sum of two terms: one independent of the parameters, and one with
adjustable parameters.

Suppose the trial function is

ψ_t(x) = A(x) + F(x, N(x, p))    (21)

where

ψ_t(x) = trial function,

A(x) = term which is independent of the network parameters, i.e. it does not contain any adjustable
parameter, and it satisfies the boundary conditions,

F(x, N(x, p)) = term with adjustable parameters p which does not contribute to the boundary
conditions,

N(x, p) = output of the feed-forward neural network with parameters p, fed with the n-dimensional
input vector x,

x = input vector of dimension n.

11.2.3 Unconstrained Optimization Problem:

• An optimization problem is a problem which contains an objective function subject to some
constraints, i.e. restrictions on the participating variables.

• Our goal is to find those values of the independent variables which optimize the objective function
(maximize or minimize, depending on the requirement) while satisfying the given constraints.

• The trial function defined in equation (21) satisfies the initial conditions by construction. The other
term, F(x, N(x, p)), employs a feed-forward neural network whose weight parameters and biases are
adjusted to minimize the error defined in equation (20).

• So, originally the problem defined in equations (18) and (19) was a constrained optimization problem,
since it is defined subject to some initial conditions known as constraints.

• Here, however, the problem is reduced to equation (20), which contains a trial function that already
satisfies the initial conditions by construction; it is therefore an unconstrained optimization problem,
thanks to our choice of a trial function which satisfies the initial conditions.

• The unconstrained model can be described as follows:

• Minimization of equation (20) can be seen as the training process of the neural network.

• The training error is G(x) or G(x, p).

• The training process involves updating the network parameters p so that the term defined in
equation (20) is minimized.

• To calculate the training error G(x) or G(x, p) we need to calculate the network output and the
derivatives of the network output with respect to the input variables.

• The network parameter update formula involves the gradient of the network error with respect to
the weight parameters, so all of these quantities have to be calculated to obtain the error and the
weight update formula.

11.3 Gradient Computation Using MLP:

In this section we calculate the gradients of the network output with respect to the input variables,
of different orders, using an MLP.

11.3.1 Gradient Computation with respect to Network Inputs:

Consider a Multi Layer Perceptron neural network with the following parameters:

n = number of neurons in the input layer,
h = number of neurons in the (single) hidden layer,
l = number of neurons in the output layer, with l = 1 (say).

Here we calculate the gradient with respect to the input variables for a single hidden layer; the
extension to the case of more than one hidden layer can be obtained accordingly.

v_i = weight of the connection from hidden neuron i to the output layer,

w_ij = weight of the connection from the jth input node to the ith hidden node,

u_i = bias of the ith hidden node,

σ(z) = sigmoid activation function, also known as the transfer function,

z_i = Σ_{j=1}^{n} w_ij x_j + u_i, the input to the ith hidden node.

For a given input vector x = (x_1, x_2, ..., x_n), the network output is given by

N = Σ_{i=1}^{h} v_i σ(z_i),   where σ(z_i) = 1 / (1 + exp(−z_i)).

The derivative of the network output N with respect to x_j is

∂N/∂x_j = Σ_{i=1}^{h} v_i w_ij σ^(1)(z_i).

For a higher order derivative, say of order k,

∂^k N / ∂x_j^k = Σ_{i=1}^{h} v_i w_ij^k σ^(k)(z_i),    (22)

where σ^(k)(z_i) denotes the kth derivative of σ(z_i). More generally, we can write

∂^{m_1+m_2+...+m_n} N / (∂x_1^{m_1} ∂x_2^{m_2} ... ∂x_n^{m_n}) = Σ_{i=1}^{h} v_i (Π_{k=1}^{n} w_ik^{m_k}) σ^(m_1+m_2+...+m_n)(z_i).    (23)

So, network derivatives of any order with respect to the input variables can be calculated from
equations (22) and (23); the weight parameters w_ij, v_i, the biases u_i and the sigmoid activation
function σ(z_i) are all known to us.

• The network architecture for the computation of derivatives is the same as the architecture of the
original MLP neural network; only the parameters and the activation function differ.

• The activation function of the MLP used for the computation of derivatives is simply the
corresponding derivative of the original activation function (the derivative of the sigmoid).

• The parameters of the new MLP (for derivative computation) can be calculated from the parameters
of the old MLP (the MLP for the function). Of course this holds only once the training of the MLP is
over, and for a fixed network architecture. 'Fixed network architecture' means that if the architecture
of the MLP for the function is n × h × 1, then the architecture of the MLP for the derivative of the
function, with respect to any variables and of any order, is also n × h × 1.
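Equation (22) can be checked numerically. The MATLAB sketch below (using the same three-hidden-node parameters as the example of Section 11.6; the evaluation point and step size are assumed) compares the analytic derivative dN/dx with a central finite difference.

v = [-0.434; 0.45; 0.372];    % hidden-to-output weights (from the example in Sec. 11.6)
w = [0.23; -0.45; 0.41];      % input-to-hidden weights
u = [-0.03; 0.21; 0.3];       % hidden biases
sig  = @(z) 1./(1 + exp(-z));
sig1 = @(z) sig(z).*(1 - sig(z));            % first derivative of the sigmoid
N    = @(x) sum(v .* sig(w*x + u));          % network output N(x)
dN   = @(x) sum(v .* w .* sig1(w*x + u));    % analytic dN/dx from eq. (22)
x = 0.3;  hstep = 1e-6;                      % evaluation point and step (assumed)
fprintf('analytic %.6f  numeric %.6f\n', dN(x), (N(x + hstep) - N(x - hstep))/(2*hstep));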

11.3.2 Gradient Computation with respect to Network Parameters:

The parameters of the new MLP (for derivative computation) can be calculated from the old MLP as
follows. Suppose the network derivative is

N_g = ∂^{m_1+m_2+...+m_n} N / (∂x_1^{m_1} ∂x_2^{m_2} ... ∂x_n^{m_n}) = Σ_{i=1}^{h} v_i (Π_{k=1}^{n} w_ik^{m_k}) σ^(m_1+m_2+...+m_n)(z_i).

Then the gradient of N_g with respect to the parameters v_i, w_ij and u_i is given by

∂N_g/∂v_i = (Π_{k=1}^{n} w_ik^{m_k}) σ^(m_1+m_2+...+m_n)(z_i),

∂N_g/∂u_i = v_i (Π_{k=1}^{n} w_ik^{m_k}) σ^(m_1+m_2+...+m_n+1)(z_i),

∂N_g/∂w_ij = x_j v_i (Π_{k=1}^{n} w_ik^{m_k}) σ^(m_1+m_2+...+m_n+1)(z_i) + v_i m_j w_ij^{m_j−1} (Π_{k=1,k≠j}^{n} w_ik^{m_k}) σ^(m_1+m_2+...+m_n)(z_i).
Network Parameter Update:

Once we have the network derivatives with respect to the network parameters, the parameter update
formulae can be written as

v_i(t + 1) = v_i(t) + α ∂N_g/∂v_i,

u_i(t + 1) = u_i(t) + β ∂N_g/∂u_i,

w_ij(t + 1) = w_ij(t) + γ ∂N_g/∂w_ij,

where α, β and γ are learning rates, i = 1, 2, ..., h and j = 1, 2, ..., n.

11.4 First-Order ODE:

In this section we describe the MLP neural network for a first order ODE. Consider a first order ODE
of the form

dψ(x)/dx = f(x, ψ),    (24)

with x ∈ [0, 1] and the initial condition ψ(0) = A.

Suppose the trial solution is given by

ψ_t(x) = A + x N(x, p),    (25)

where N(x, p) is the output of a feed-forward neural network with x as input and weight parameters
p, and

N(x, p) = Σ_{i=1}^{h} v_i σ(z_i),   with z_i = w_i x + u_i,

z_i = input to the ith hidden node,

σ(z_i) = output of the ith hidden node,

v_i = weight from the ith hidden node to the output layer neuron.

Since ψ_t(x) is the trial solution of the ODE (24), it satisfies the initial condition by construction:
ψ_t(0) = A + 0 · N(0, p) = A.

Differentiating equation (25), we get

dψ_t(x)/dx = N(x, p) + x dN(x, p)/dx.    (26)

Here x is not a vector, because this is a one dimensional problem. So, from equation (22), the value of
dN(x, p)/dx is

dN(x, p)/dx = Σ_{i=1}^{h} v_i w_i σ^(1)(z_i)

(k = 1, and there is no j suffix because x has only one dimension). Putting this value into equation
(26), we obtain

dψ_t(x)/dx = Σ_{i=1}^{h} v_i σ(z_i) + x Σ_{i=1}^{h} v_i w_i σ^(1)(z_i).    (27)

Now, from equation (24), dψ_t(x)/dx = f(x, ψ_t(x)), i.e.

dψ_t(x)/dx − f(x, ψ_t(x)) = 0.

We have to choose the network parameters for a set of input points x_1, x_2, ..., x_n, with x_i ∈ [0, 1]
for all i = 1, 2, ..., n, such that the term dψ_t(x)/dx − f(x, ψ_t(x)) is as close to zero as possible.

In general, our objective is

min_p E(p) = Σ_{j=1}^{n} ( dψ_t(x)/dx |_{x=x_j} − f(x_j, ψ_t(x_j)) )².

So here we have one objective function and no constraints, because the trial function ψ_t(x)
automatically satisfies the initial condition by construction; we therefore have an optimization problem
without constraints, known as an unconstrained optimization problem. Let us call the term E(p) the
error of the MLP feed-forward neural network; we want to choose the parameters which give the
minimum value of the error E(p).

Substituting dψ_t(x)/dx from equation (27) and ψ_t(x) from equation (25) into the above minimization
problem,

min_p E(p) = Σ_{j=1}^{n} ( Σ_{i=1}^{h} v_i σ(z_i) + x_j Σ_{i=1}^{h} v_i w_i σ^(1)(z_i) − f(x_j, A + x_j Σ_{i=1}^{h} v_i σ(z_i)) )²,

where

σ(z_i) = 1 / (1 + e^{−z_i}),

σ^(1)(z_i) = e^{−z_i} / (1 + e^{−z_i})² = σ(z_i)(1 − σ(z_i)),

σ^(2)(z_i) = −e^{−z_i}(1 − e^{−z_i}) / (1 + e^{−z_i})³.

Thus the error quantity to be minimized is

E(p) = Σ_k ( dψ_t(x_k)/dx − f(x_k, ψ_t(x_k)) )²
     = Σ_k ( Σ_{i=1}^{h} v_i σ(z_i) + x_k Σ_{i=1}^{h} v_i w_i σ^(1)(z_i) − f(x_k, ψ_t(x_k)) )²,    (28)

where the x_k are points in [0, 1].
Minimization of E(p) gives the optimized values of the weight parameters v, w and the biases u; these
optimized values make the error minimal.

Let us consider one term of E(p), say e_k, corresponding to the input x_k:

e_k = ( Σ_{i=1}^{h} v_i σ(z_i) + x_k Σ_{i=1}^{h} v_i w_i σ^(1)(z_i) − f(x_k, ψ_t(x_k)) )².    (29)

We are interested in minimizing the term e_k rather than E(p). We know that the parameter update
formulae depend on the gradient of the error with respect to the parameters, so we must calculate the
derivatives of the error with respect to the network parameters v, w and u:

∂e_k/∂v_i = 2{σ_i + x_k w_i σ_i^(1)} √e_k,

∂e_k/∂w_i = 2 x_k v_i {2σ_i^(1) + x_k w_i σ_i^(2)} √e_k,

∂e_k/∂u_i = 2 v_i {σ_i^(1) + x_k w_i σ_i^(2)} √e_k,

for k = 1, 2, ..., n, where

√e_k = { Σ_{i=1}^{h} v_i σ(z_i) + x_k Σ_{i=1}^{h} v_i w_i σ^(1)(z_i) − f(x_k, ψ_t(x_k)) }

(the quantity in braces is the residual whose square is e_k).

So, now we have the derivatives of the error with respect to the network parameters. We can follow
any minimization technique such as the backpropagation algorithm, the delta rule, or quasi-Newton
BFGS (Broyden, Fletcher, Goldfarb and Shanno) to reduce the error. Here we employ the simplest but
quite effective minimization technique, called the delta rule, which gives

∆v_i = −η ∂e_k/∂v_i,

∆w_i = −η ∂e_k/∂w_i,

∆u_i = −η ∂e_k/∂u_i,

for i = 1, 2, ..., h, where η is the learning parameter.

The adjusted weights are given by

v_i(t + 1) = v_i(t) + ∆v_i,

w_i(t + 1) = w_i(t) + ∆w_i,

u_i(t + 1) = u_i(t) + ∆u_i.

Thus we have an iterative process which continues until the derivatives of the error with respect to
the network parameters and the error function E(p) come down to their minimum values.
The initial values of the weight parameters and biases, i.e. v_i(0), w_i(0), u_i(0), are chosen randomly
between -0.5 and 0.5.

11.5 Neural Networks for First-Order ODE:
Two neural networks are used to solve a first order ordinary differential equation. One of them is the
training network, which is used to train the network, i.e. to find the optimized values of the network
parameters. Once the network parameters are frozen, the solution network is then used to obtain the
solution of the first order ODE.

11.5.1 Training Network:

The training network consists of five layers: one input layer, three hidden layers and one output layer.
It basically comprises two MLPs which are combined in parallel at the input part of the second hidden
layer. The activation function of the upper half of the network is the sigmoid function, and the
activation function of the lower half of the network is the derivative of the sigmoid function.

The working principle of the training network is described in the algorithm given below.

Algorithm for Training of the Neural Network:

ODE: training process for a first order ordinary differential equation.

Input: randomly selected numbers between 0 and 1, and an error limit ε. ε is the error limit that we
want to achieve, i.e. we repeat the algorithm until the calculated error satisfies e_k < ε.

Output: neural network parameters v_1, v_2, ..., v_h, w_1, w_2, ..., w_h and u_1, u_2, ..., u_h.

Step 1: Weight initialisation.

Choose initial values of the weight parameters w_i, v_i and u_i between -0.5 and 0.5, for i = 1, 2, ..., h.

Step 2: Set the constants A, η and ε.

Step 3: Input layer.

Get a random input x_k between 0 and 1.

Step 4: First hidden layer.

Compute z_j = x_k w_j + u_j for j = 1, 2, ..., h, and compute

net1_j = 1 / (1 + e^{−z_j}),   net2_j = w_j e^{−z_j} / (1 + e^{−z_j})²

for the upper and lower halves of the network respectively, for j = 1, 2, ..., h.

Step 5: Second hidden layer.

Compute N = Σ_{j=1}^{h} v_j net1_j and dN/dx = Σ_{j=1}^{h} v_j net2_j.

Step 6: Third hidden layer.

Compute ψ = A + x_k N and dψ/dx = N + x_k dN/dx.

Step 7: Output layer.

Compute e_k = | dψ/dx − f(x_k, ψ) |.

Step 8: Preparation.

z_i = w_i x_k + u_i,

σ_i = 1 / (1 + e^{−z_i}),

σ_i^(1) = e^{−z_i} / (1 + e^{−z_i})²,

σ_i^(2) = −e^{−z_i}(1 − e^{−z_i}) / (1 + e^{−z_i})³,

√e_k = { Σ_{i=1}^{h} v_i σ(z_i) + x_k Σ_{i=1}^{h} v_i w_i σ_i^(1)(z_i) − f(x_k, ψ(x_k)) },

∂e_k/∂v_i = 2{σ_i + x_k w_i σ_i^(1)} √e_k,

∂e_k/∂w_i = 2 x_k v_i {2σ_i^(1) + x_k w_i σ_i^(2)} √e_k,

∂e_k/∂u_i = 2 v_i {σ_i^(1) + x_k w_i σ_i^(2)} √e_k.

Step 9: Update.

Update the network parameters for i = 1, 2, ..., h:

v_i(t + 1) = v_i(t) − η ∂e_k/∂v_i,

w_i(t + 1) = w_i(t) − η ∂e_k/∂w_i,

u_i(t + 1) = u_i(t) − η ∂e_k/∂u_i.

Step 10: Loop.

Repeat from Step 3 until e_k < ε.

Termination: Stop.

11.5.2 Solution Network:

The solution network consists of four layers: one input layer, two hidden layers and one output layer.
The activation function of the neurons in this MLP is simply the sigmoid function.

It implements the solution computing process of the first order ordinary differential equation, using the
network parameters from the trained neural network. The working principle of the solution network
is given below.

Algorithm

Input: a randomly generated x ∈ [0, 1] and an error limit ε.

Output: the solution ψ of the given first order ordinary differential equation.

Step 1: Network setup. Set up the network with the trained network parameters v_i, w_i, u_i for
i = 1, 2, ..., h, and the initial condition A of the ODE.

Step 2: Input layer. Get an input x_k in the normalised domain.

Step 3: First hidden layer.

for j = 1:h
    z_j = w_j x_k + u_j;
    net1_j = 1 / (1 + e^{−z_j});
end

Step 4: Second hidden layer. Compute N = Σ_{j=1}^{h} v_j net1_j.

Step 5: Output layer. Compute ψ = A + x_k N.

Step 6: Print ψ.

Step 7: Termination. Stop.

11.6 Example:
Consider the first order ordinary differential equation

dψ/dx + (x + (1 + 3x²)/(1 + x + x³)) ψ = x³ + 2x + x² (1 + 3x²)/(1 + x + x³),

with ψ(0) = 1 and x ∈ [0, 1].

The analytic solution is

ψ_a(x) = e^{−x²/2} / (1 + x + x³) + x².

According to equation (25), the trial solution is ψ_t(x) = 1 + x N(x, p), where A = 1. The MLP neural
network is trained using the above algorithm with the initial parameters

w = [0.23; -0.45; 0.41];

u = [-0.03; 0.21; 0.3];

v = [-0.434; 0.45; 0.372];

so that there is one input node, three hidden nodes and one output node.

After running the above algorithm for the training network and the solution network in MATLAB, we
get the following results. The error e_k is reduced below 10^{-4} (see Table 1).

Table 1: Experimental results.
Test point x_k    Estimated solution ψ_t(x_k)    Actual solution ψ_a(x_k)    Error e_k    Error ∆ψ(x_k)
0.1104 0.9238 0.9008 0.000045 0.0230
0.2390 0.8942 0.8111 0.000093 0.0830
0.1550 0.9078 0.8666 0.000042 0.0413
0.2503 0.8938 0.8046 0.000033 0.0892
0.2637 0.8939 0.7971 0.0000021 0.0968
0.0479 0.9594 0.9543 0.000096 0.0051
0.0186 0.9826 0.9818 0.000066 0.0008
0.2137 0.8962 0.8265 0.000048 0.0697
0.1905 0.8995 0.8416 0.000012 0.0579
0.0609 0.9505 0.9426 0.0000083 0.0079
0.1355 0.9140 0.8811 0.0000096 0.0329
0.0214 0.9801 0.9790 0.000070 0.0011
0.1381 0.9131 0.8791 0.0000055 0.0340
0.0565 0.9534 0.9465 0.000073 0.0069
0.1668 0.9046 0.8580 0.000014 0.0466

11.7 MATLAB Code:


% ------------------------------------------------------------------
% Training network for the first order ODE of Section 11.6
% ------------------------------------------------------------------
% Weight initialisation
% l = number of output neurons, n = number of input nodes,
% h = number of hidden layer neurons
l = 1;
n = 1;
h = 3;
k = 1;

% w : weights between input layer and hidden layer
% v : weights between hidden layer and output layer
% u : biases of the hidden units
w = [0.23; -0.45; 0.41];
u = [-0.03; 0.21; 0.3];
v = [-0.434; 0.45; 0.372];

% Initial condition A, learning rate eta and error limit epsilon
A = 1;
eta = 0.0001;
epsilon = 1e-4;
iterations = 0;
e1 = 5;                               % start with a large error so the loop is entered

while e1 > epsilon
    % Input layer: a random point x1 in [0,1]
    x1 = rand(1,1);
    if iterations < 100000
        % First hidden layer:
        % net1(j) = sigmoid output of the jth hidden neuron (upper half)
        % net2(j) = w(j)*sigma'(z(j)), its derivative w.r.t. x (lower half)
        z    = zeros(h,1);
        net1 = zeros(h,1);
        net2 = zeros(h,1);
        for j = 1:h
            z(j)    = w(j)*x1 + u(j);
            net1(j) = 1/(1 + exp(-z(j)));
            net2(j) = w(j)*exp(-z(j))/(1 + exp(-z(j)))^2;
        end
        % Second hidden layer: N (upper half) and DN = dN/dx (lower half)
        N  = 0;
        DN = 0;
        for j = 1:h
            N  = N  + v(j)*net1(j);
            DN = DN + v(j)*net2(j);
        end
        % Third hidden layer: trial solution si and its derivative Dsi
        si  = A + x1*N;
        Dsi = N + x1*DN;
        % Output layer: residual and error of the trial solution at x1
        res = Dsi - myNeuralFun(x1,si);     % signed residual (the sqrt(e_k) of Step 8)
        e1  = abs(res);
        if e1 < epsilon
            break;
        else
            % Preparation for the weight update:
            % Dnet1 = sigma'(z), D2net1 = sigma''(z) = sigma'(z)*(1 - 2*sigma(z))
            Dnet1  = zeros(h,1);
            D2net1 = zeros(h,1);
            DV = zeros(h,1);                % partial derivatives of the error w.r.t. v, w, u
            DW = zeros(h,1);
            DU = zeros(h,1);
            for j = 1:h
                Dnet1(j)  = exp(-z(j))/(1 + exp(-z(j)))^2;
                D2net1(j) = Dnet1(j)*(1 - 2*net1(j));
                DV(j) = 2*(net1(j) + x1*w(j)*Dnet1(j))*res;
                DW(j) = 2*x1*v(j)*(2*Dnet1(j) + x1*w(j)*D2net1(j))*res;
                DU(j) = 2*v(j)*(Dnet1(j) + x1*w(j)*D2net1(j))*res;
            end
        end
        % Parameter update (delta rule)
        v = v - eta*DV;
        w = w - eta*DW;
        u = u - eta*DU;
        iterations = iterations + 1;
    else
        break;
    end
end

% ------------------------------------------------------------------
% Solution network: compute the solution of the ODE at the point x1
% using the trained parameters v, w, u
% ------------------------------------------------------------------
disp(v);
disp(w);
disp(u);
disp(x1);
trueVal = exp(-x1^2/2)/(1 + x1 + x1^3) + x1^2;   % analytic solution, for comparison

for i = 1:h
    z(i)    = w(i)*x1 + u(i);
    net1(i) = 1/(1 + exp(-z(i)));
end
N1 = 0;
for j = 1:h
    N1 = N1 + v(j)*net1(j);
end
% si = A + x1*N1 is the required (approximate) solution of the ODE
si = A + x1*N1;
disp(si);
e2 = abs(trueVal - si);

% Function file: save the following as myNeuralFun.m.
% It returns the right hand side f(x,psi) of dpsi/dx = f(x,psi) for the ODE of Section 11.6.
function fVal = myNeuralFun(x,si)
    fVal = x^3 + 2*x + x^2*((1 + 3*x^2)/(1 + x + x^3)) - (x + (1 + 3*x^2)/(1 + x + x^3))*si;
end

NOTE: The above code runs for at most 100000 iterations, because without restricting the number of
iterations the loop might never terminate. So it is possible, while running the above code, that the
error is still greater than epsilon; in that case the iteration count will be exactly 100000. Run the code
again and note the result for a run with fewer than 100000 iterations.

12 Conclusion
So far, we have learnt how to solve ordinary differential equations using an Artificial Neural Network
(ANN). The solution of an ODE obtained by an ANN is a smooth curve (the trial function) and can
be differentiated continuously on a smooth domain. This is in contrast with the discrete or non-smooth
solutions obtained by traditional schemes such as finite element and finite difference methods.
The numerical solution of ordinary and partial differential equations plays a crucial role in engineering.
Traditional methods such as finite element, finite difference and finite volume methods are based on
discretizing the domain and weakly solving the ODE over this discretization. These methods are
adequate for obtaining solutions of differential equations in engineering applications, but they have the
limitation that they provide a discrete solution with limited differentiability. To avoid this problem we
have explained above how an artificial neural network can be used to solve differential equations; the
solution obtained is a smooth curve which can be differentiated continuously on a smooth domain of
definition.

13 References
• http://nptel.ac.in/courses/117105084/

• http://slideplayer.com/slide/7484664/

• Neha Yadav, Anupam Yadav and Manoj Kumar, "An Introduction to Neural Network Methods for
Differential Equations", 1st ed., Springer Dordrecht Heidelberg New York London, 2015.

• Srimanta Pal, "Numerical Methods: Principles, Analysis and Algorithms", Oxford University Press
India, 2009.

