Definitions of Certain Key Terms

Neuron: The basic nerve cell or computing unit for
biological information processing.
Action potential: The pulse of electric potential generated
across the membrane of a neuron following the application
of a stimulus greater than the threshold value.
Axon: The output node of a neuron that carries the action
potential to other neurons in the network.
Axon hillock: The starting point of the axon.
Dendrite: The input part of the neuron that carries a
temporal summation of action potentials to the soma.
Soma: The cell body of the neuron (that processes the
inputs from dendrites).
Somatic gain: The parameter that changes the slope of the
non-linear activation function used in the architecture of the
neuron.
Synapse: The junction point between the axon of a pre-
synaptic neuron and the dendrite of a post-synaptic neuron.
It is the axon-dendrite contact organ.
Synaptic and somatic learning: Synaptic learning is the
component of the learning that determines the optimum
synaptic weights based on the minimization of a certain
performance index of the error. Somatic learning consists of
the adaptation of the optimum value of the slope of the
non-linear activation function.
1. Neuro Computing
A human brain consists of approximately $10^{11}$ computing
elements called neurons. They communicate through a
connection network of axons and synapses, having a
density of approximately $10^{4}$ synapses per neuron. The
human brain is thus a densely connected electrical
switching network, conditioned largely by biochemical
processes. The neuron is thus the fundamental building
block of a biological neural network and operates in a
chemical environment. A typical neuron cell has three
major regions: the soma (cell body), the axon and the
dendrites. The dendrites form a dendritic tree, which is a
very fine bush of thin fibers around the neuron body.
Dendrites receive information from other neurons through
axons (long fibers that serve as transmission lines). An
axon is a long cylindrical connection that carries impulses
from the neuron. The end part of the axon splits into fine
branches, each of which terminates in a small end bulb
almost touching the dendrites of the neighboring neurons.
This axon-dendrite contact is termed a synapse. The
synapse is where the neuron introduces its signal (in the
form of electrical impulses) to the neighboring neuron.
Furthermore, the neuron is covered by a thin membrane.
A neuron responds to the total of its inputs aggregated
over a short time interval (the period of latent summation).
The neuron responds if the total potential of its
membrane reaches a certain level. A neuron generates a
pulse response and sends it along its axon only when
certain conditions are satisfied. The incoming impulses
may be excitatory, if they cause firing, or inhibitory, if
they hinder firing. The precise condition for firing is that
the excitation should exceed the inhibition by an amount
called the threshold of the neuron (a typical value for the
threshold is 40 mV).
The incoming impulses to a neuron can only be generated by
the neighboring neurons or by the neuron itself (by
feedback). Usually a certain number of impulses is
required for a neuron to fire. Impulses that are closely
spaced in time and arrive synchronously are more likely to
cause the neuron to fire. Observations show that biological
neural networks perform temporal integration and
summation of electrical signals. The resulting spatio-
temporal processing performed by biological neural
networks is a complex process and is less structured than
digital computation. Furthermore, the electrical impulses
are not synchronized in time, as opposed to the synchronous
discipline of digital computation. One important
characteristic of the biological neuron is that the
magnitude of the generated signal does not differ
significantly: the signal in the nerve fiber is either absent
or has its maximum value. This means that the information
is transmitted between the nerve cells in the form of binary
signals.
After carrying a pulse, an axon fiber undergoes a state of
complete inactivity for a certain time called the refractory
period. During this time interval the nerve does not conduct
any signals, regardless of the intensity of excitation. The
refractory period is not uniform over the cells. The time
units for modeling biological neurons may be of the order
of milliseconds. Also, there are different types of neurons
and different ways in which they are connected.
Note that we are dealing with a dense network
of interconnected neurons that release asynchronous
signals, which are not only fed forward to neighboring
neurons but also fed back to the generating neuron itself.
The picture of the real phenomena in a biological neural
network thus becomes quite involved.
The brain is a highly nonlinear, complex, and parallel
information-processing system. The human brain has the ability
to arrange its structural constituents (neurons) to perform
certain operations such as pattern recognition, perception and
motor control, many times faster than the fastest computer
available today. In what follows, an example of such an
operation by the human brain is explained.
Consider human vision, which is an information
processing task. The visual system continuously gives a
representation of the environment around us and supplies the
information needed to react to it. The human brain
routinely accomplishes these perceptual recognition tasks
in approximately 100-200 msec. A digital computer would
take days to perform a much less complex task. Consider,
for example, the sonar of a bat, which is an active echo
recognition system. The bat sonar gives information such as
how far away the target is, the relative velocity of the
target, the size of the target, the size of the various features
of the target, and the azimuth and elevation of the target.
The vestibulo-ocular reflex (VOR) system is a part of the
vision operations performed by the human eye and the
brain. The function of the VOR is to maintain the stability
of the retinal image by making eye rotations opposite to
the head rotations. There are pre-motor neurons and motor
neurons which carry out any muscle movement. The pre-
motor neurons in the vestibular nuclei receive and process
head-rotation signals (inputs) and send the results to the
eye-muscle motor neurons responsible for eye rotations.
Since the above input and output signals are well defined, it
is possible to model such a vestibulo-ocular reflex (VOR)
system.
In what follows, two questions are asked.
1.1 Why Are Neurons Very Slow?
1. The axon is a long insulated conductor. It is a few
microns in diameter and is filled with a much poorer
conductor than copper, so even a few millimeters of it
have a high resistance.
2. No insulation is perfect. Some current will leak
through the membrane.
3. A cell membrane is an insulating sheet tens of
Ångstroms thick with conductors on both sides. The
membrane material has a high dielectric constant, so
we should expect a large membrane capacitance (a typical
value would be 1 µF per cm²).
Now, the time constant, which is proportional to the product
of the resistance and the capacitance, is also high.
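As a rough sanity check, the sketch below (in Python) computes the membrane time constant $\tau = RC$ from these numbers. The capacitance of 1 µF/cm² comes from the text; the specific membrane resistance of roughly 10 kΩ·cm² is an assumed typical order of magnitude, not a value from these notes.

```python
# Back-of-the-envelope estimate of the membrane time constant tau = R * C.
# C_m = 1 uF/cm^2 is quoted in the text; R_m ~ 10 kOhm*cm^2 is an assumed
# typical order of magnitude for the specific membrane resistance.

C_m = 1e-6        # membrane capacitance per unit area, F/cm^2
R_m = 10e3        # specific membrane resistance, Ohm*cm^2 (assumed)

tau = R_m * C_m   # time constant in seconds; the cm^2 factors cancel
print(f"tau = {tau * 1e3:.1f} ms")   # ~10 ms, i.e. millisecond-scale dynamics
```

This is consistent with the remark above that the time units for modeling biological neurons are of the order of milliseconds.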
1.2 Why Is the Action Potential All-or-None?
A neuron responds to the total of its inputs aggregated
over a short time interval (the period of latent summation).
The neuron responds if the total potential of its
membrane reaches a certain level. A neuron generates a
pulse response and sends it along its axon only when
certain conditions are satisfied. The incoming impulses
may be excitatory, if they cause firing, or inhibitory, if
they hinder firing. The precise condition for firing is that
the excitation should exceed the inhibition by an amount
called the threshold of the neuron (a typical value for the
threshold is 40 mV).
1.3 Computation by the Human Brain
We may have complete knowledge of the neural
architecture and arrangement, yet the characterisation of
the high-level computation of the human mind remains a
mystery. This is because the electrochemical transmission
of signals and the adjustment of the synaptic (connection)
weights are involved and complex. This paradoxical
situation of the human mind can be roughly explained as
follows:
Imagine connecting a logic analyzer to a working CPU with
a completely known and well-documented architecture. Let
all the signal flow from the logic analyzer to the CPU and
from the CPU to the logic analyzer be known,
documented and analyzed. This knowledge of the activity
at the micro level is still insufficient to explain the computation
taking place at the macro level.
Note, however, that the primary purpose, application, and
objective of the human brain is survival. The time-evolved
performance of human intelligence reflects an attempt to
optimize this objective. This distinguishing characteristic
does not, however, reduce our interest in biological
computation, since:
1. The brain integrates and stores experiences, which
could be previous classifications or associations of
input data. In this sense it self-organizes experience.
2. The brain considers new experiences in the context of
stored experiences.
3. The brain is able to make accurate predictions about
new situations on the basis of previously self-
organized experiences.
4. The brain does not require perfect information. It is
tolerant of deformations of input patterns or
perturbations in input data.
5. The brain seems to have available, perhaps unused,
neurons ready for use.
6. The brain does not provide, through microscopic or
macroscopic examination of its activity, much useful
information about its operation at a high level.
7. The brain tends to cause behavior that is homeostatic,
meaning in a state of equilibrium (stable) or tending
towards such a state. This is an interesting feature
found in some recurrent neural networks such as the
Hopfield and Grossberg networks.
1.4 The Artificial Neural Network
The idea of an artificial neural network has been motivated
by the recognition that the human brain computes in an
entirely different way from the conventional digital
computer. Such a neural network is defined as follows:
A neural network is a massively parallel distributed
processor made up of simple processing units, which has a
natural propensity for storing experiential knowledge and
making it available for use. It resembles the brain in two
respects:
(1) Knowledge is acquired by the network from its
environment through a process of learning.
(2) Interneuron connection strengths, called synaptic
weights, are used to store the acquired knowledge.
1.5 Representation of Knowledge
Knowledge refers to stored information or models used by
a person or machine to interpret, predict, and appropriately
respond to the outside world. The neural network will thus
learn the environment in which it is embedded. The
knowledge learned is of two kinds:
1. The known world state, or the facts about what is and
what has been known. This kind of knowledge is
referred to as prior information.
2. Measurements (observations) obtained by using
sensors designed to probe the environment. This
information provides the examples used to train the neural
network.
The examples may be labeled or unlabelled. In labeled
examples, each example representing an input signal is
paired with a target or desired response. Unlabelled
examples consist of different realisations of the input
signal by itself. The neural network then acquires
knowledge by training on these labeled or unlabelled
examples.
The knowledge representation inside the neural network is
rather complicated. In what follows, four rules are explained
which are common-sense in nature.
Rule 1. It is obvious that similar inputs from the same
class usually produce similar representations inside the
network, and therefore they should be classified as
belonging to the same category.
One commonly used measure of similarity is the Euclidean
distance. The Euclidean distance between a pair of
vectors $\mathbf{x}_i$ and $\mathbf{x}_j$ in the Euclidean space $\mathbb{R}^m$ is given by
$$d(\mathbf{x}_i, \mathbf{x}_j) = \left\| \mathbf{x}_i - \mathbf{x}_j \right\| = \left[ \sum_{k=1}^{m} \left( x_{ik} - x_{jk} \right)^2 \right]^{1/2}$$
The similarity between the two inputs is defined as the
reciprocal of the Euclidean distance between the two
vectors: the smaller the distance, the more similar the inputs.
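As a quick illustration of Rule 1, the Python sketch below computes the Euclidean distance and the reciprocal similarity for two input vectors; the vector values and the helper names are illustrative, not taken from the notes.

```python
import math

def euclidean_distance(x_i, x_j):
    # d(x_i, x_j) = sqrt( sum_k (x_ik - x_jk)^2 )
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))

def similarity(x_i, x_j):
    # Similarity as defined in the text: the reciprocal of the distance.
    return 1.0 / euclidean_distance(x_i, x_j)

# Two nearby inputs (hypothetical values): small distance, large similarity.
x1 = [1.0, 2.0, 3.0]
x2 = [1.1, 2.1, 2.9]
print(euclidean_distance(x1, x2))   # ~0.17
print(similarity(x1, x2))           # ~5.8
```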
Rule 2. The second rule is just the opposite of the first:
items to be separated into different classes should be given
widely different representations in the network.
Consequently, the greater the Euclidean distance, the more
dissimilar the inputs.
Rule 3. If a particular feature is important, then a
larger number of neurons should be used for the
representation of that event in the network.
Rule 4. Prior information and invariances should be
built into the network, so that they need not be learned;
this results in a simplification of the network
architecture. The number of free parameters to be adjusted is
reduced, which means fewer building blocks and lower
cost. Here we are talking about specialized networks.
Biological neural networks are indeed specialized.
There are no general rules for incorporating prior
information and invariances. It is possible to incorporate
prior information into the network architecture by
weight sharing and localized connections. Invariance
here means invariance to transformations. Invariance
to transformations can be achieved
(i) by structure
(ii) by training
1.6 Characteristics of Neural Networks
1. Generalization
A neural network derives its computing power from (i) its
massively parallel distributed structure and (ii) its ability to learn.
We train the network using some training examples;
the network will then give an appropriate response to an
example that was not included in the training set.
2. Nonlinearity
The basic model of a neural network is nonlinear if the
activation function is nonlinear (which is usually the case).
Nonlinearity is an important feature, since the underlying
physical mechanism is nonlinear. Furthermore, the
nonlinearity is distributed throughout the network.
3. Adaptation
• A neural network is inherently adaptive.
• When a neural network is doing a task, two features are
involved: space and time.
• The training of a neural network is usually done in a
stationary environment.
• But the environment will change continuously.
• So a spatiotemporal training is required. The synaptic
weights of the network (the weight space) will change
continuously.
• As a result, when the environment changes, the training
examples as well as the weight space change.
• This is a continuous process in all animals.
• Such a continuous change is also possible in an
artificial neural network.
• In other words, the training process in an artificial
neural network is continuous, and the free parameters
of the system should continuously adapt to the
environment.
• The question that arises is how often this adaptation
should take place? That depends on the application.
• Would unsupervised training be better than
supervised training, as is the case in the human brain?
1.7 Model of a Neuron
A neuron is an information-processing unit. A neural
network consists of a number of such units. The figure
below shows the model of a neuron. One can identify three basic
ingredients of such a neuron model:
(i) A set of connecting links, called synapses, between
the input signals $x_j,\ j = 1, 2, \ldots, m$ and neuron $k$. Such
synapses are characterised by their synaptic weights
$w_{kj},\ j = 1, 2, \ldots, m$. Note that the subscripts of $w$ are $kj$
and not $jk$; the reason for this ordering will become clear when
we deal with the back-propagation algorithm for
training the neuron.
(ii) An adder, which sums the input signals weighted
(multiplied) by their respective synaptic weights.
(iii) A means of limiting the amplitude of the output of
the neuron to some finite value. The amplitude of
the output of a neuron may be limited to the range
[0, 1] or [-1, 1]. This operation is carried out by a
squashing function called the nonlinear activation
function.
[Figure: Nonlinear model of neuron $k$: inputs $x_1, x_2, \ldots, x_m$ with synaptic weights $w_{k1}, w_{k2}, \ldots, w_{km}$, bias $b_k$, induced local field $v_k$ and output $y_k$.]
The above neuron model also includes an externally
applied bias term $b_k$. The effect of the bias term is to
increase or lower the net input of the activation function, as
shown in the figure below.
[Figure: The induced local field $v_k$ as a function of the linear combiner output $u_k$, for bias $b_k > 0$, $b_k = 0$ and $b_k < 0$.]
We describe neuron $k$ by the following set of equations:
$$u_k = \sum_{j=1}^{m} w_{kj}\, x_j$$
$$y_k = \varphi(u_k + b_k) = \varphi(v_k)$$
where $v_k = u_k + b_k$ is the induced local field.
To incorporate the bias term as an input, the neuron
model may be modified. Accordingly, the equations become
$$v_k = \sum_{j=0}^{m} w_{kj}\, x_j, \qquad y_k = \varphi(v_k)$$
where a new input $x_0 = +1$ has been added with synaptic weight $w_{k0} = b_k$.
[Figure: Modified model of neuron $k$, with the bias absorbed as the fixed input $x_0 = +1$ weighted by $w_{k0} = b_k$.]
1.8 Signal Flow Graph of a Neuron
The signal flow graph of a single neuron is shown in the
figure below. One can identify the source nodes, the
computation node and the communication links in the
figure.
[Figure: Signal flow graph of a neuron, with source nodes $x_0 = +1, x_1, x_2, \ldots, x_m$, the induced local field $v_k$, the activation function $\varphi(\cdot)$ and the output $y_k$.]
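As a minimal sketch, the Python snippet below evaluates the neuron equations above ($u_k$, $v_k$, $y_k$) for a single neuron; the input, weight and bias values are made up for illustration, and the logistic function introduced in the next section is used as the activation $\varphi$.

```python
import math

def logistic(v, a=1.0):
    # phi(v) = 1 / (1 + exp(-a*v)), one possible choice of activation function
    return 1.0 / (1.0 + math.exp(-a * v))

def neuron_output(x, w, b):
    # u_k = sum_j w_kj * x_j ;  v_k = u_k + b_k ;  y_k = phi(v_k)
    u = sum(w_j * x_j for w_j, x_j in zip(w, x))
    v = u + b
    return logistic(v)

# Illustrative values (not from the notes)
x = [0.5, -1.0, 2.0]     # inputs x_1 ... x_m
w = [0.8, 0.2, -0.5]     # synaptic weights w_k1 ... w_km
b = 0.1                  # bias b_k
print(neuron_output(x, w, b))   # approximately 0.33
```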
1.9 Types of Activation Functions
Three types of activation functions are explained below.
1. Threshold Function:
As shown in the figure, we have
$$\varphi(v_k) = \begin{cases} 0, & \text{if } v_k < 0 \\ 1, & \text{if } v_k \ge 0 \end{cases}$$
[Figure: Threshold (hard-limiting) activation function.]
A bipolar version of the threshold function is
$$\varphi(v_k) = \begin{cases} -1, & \text{if } v_k < 0 \\ 1, & \text{if } v_k \ge 0 \end{cases}$$
[Figure: Bipolar threshold activation function, switching between $-1$ and $+1$.]
This type of neuron model is known as the McCulloch-Pitts model.
2. Piecewise-Linear Function
$$\varphi(v_k) = \begin{cases} 1, & \text{if } v_k \ge +\tfrac{1}{2} \\ v_k, & \text{if } +\tfrac{1}{2} > v_k > -\tfrac{1}{2} \\ 0, & \text{if } v_k \le -\tfrac{1}{2} \end{cases}$$
[Figure: Piecewise-linear activation function, saturating at 0 and 1 beyond $\pm\tfrac{1}{2}$.]
3. Sigmoid Function (Logistic Function)
The S-shaped sigmoid function is the most commonly used
activation function:
$$\varphi(v_k) = \frac{1}{1 + e^{-a v_k}}$$
[Figure: Sigmoid (logistic) activation function.]
Note:
1. The sigmoid function is differentiable, whereas the
threshold function is not. Differentiability is an
important feature in neural network theory.
2. As $a \to \infty$, $\varphi(v_k)$ switches abruptly from 0 to 1, and the
sigmoid reduces to the threshold function.
3. The logistic function takes its name from the
transcendental law of logistic growth. Measured in
appropriate units, all growth processes are supposed to
be represented by the logistic distribution function
$$F(t) = \frac{1}{1 + e^{\alpha - \beta t}}$$
where $t$ represents time and $\alpha$, $\beta$ are constants.

#nother example of the odd sigmoid function which ranges
from -$ to I$ is the hyperbolic tangent function (the
sigmum function) given by the expression
$
$
*
$
$
*
tan ) (

+
·

+


·

,
_

¸
¸
·
−av
e
av
e
av
e
av
h v ϕ
$8
This is bipolar continuous activation function between J$
and $. 7ith
t∞ → a
, we have a bipolar hard limiting
activation function with output as either J$ or $.
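The sketch below implements these activation functions in Python so their behaviour can be compared numerically; the slope value and the test points are arbitrary.

```python
import math

def threshold(v):
    # Hard limiter: 0 for v < 0, 1 for v >= 0 (McCulloch-Pitts)
    return 0.0 if v < 0 else 1.0

def piecewise_linear(v):
    # 1 above +1/2, 0 below -1/2, linear (identity) in between
    if v >= 0.5:
        return 1.0
    if v <= -0.5:
        return 0.0
    return v

def logistic(v, a=2.0):
    # phi(v) = 1 / (1 + exp(-a*v)); larger a gives a steeper transition
    return 1.0 / (1.0 + math.exp(-a * v))

def tansigmoid(v, a=2.0):
    # phi(v) = tanh(a*v/2) = (1 - exp(-a*v)) / (1 + exp(-a*v)), range (-1, +1)
    return math.tanh(a * v / 2.0)

for v in (-2.0, -0.25, 0.0, 0.25, 2.0):
    print(v, threshold(v), piecewise_linear(v),
          round(logistic(v), 3), round(tansigmoid(v), 3))
```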
1.10 Exercises
1. Show that the derivative of the logistic function with respect to $v$ is
$$\varphi'(v) = a\, \varphi(v)\, [1 - \varphi(v)]$$
What is the value of this derivative at the origin?
$$\varphi(v) = \frac{1}{1 + e^{-av}}$$
$$\varphi'(v) = \frac{d\varphi(v)}{dv} = \frac{a\, e^{-av}}{\left(1 + e^{-av}\right)^2} = a \left( \frac{1}{1 + e^{-av}} \right) \left( \frac{e^{-av}}{1 + e^{-av}} \right) = a\, \varphi(v)\, [1 - \varphi(v)]$$
At $v = 0$, $\varphi(v) = \tfrac{1}{2}$. Therefore
$$\varphi'(0) = a \left( \tfrac{1}{2} \right) \left( \tfrac{1}{2} \right) = \frac{a}{4}$$
2. Show that the derivative of the tansigmoid function with respect to $v$ is
$$\varphi'(v) = \frac{a}{2} \left[ 1 - \varphi^2(v) \right]$$
What is the value of this derivative at the origin?
$$\varphi(v) = \tanh\!\left(\frac{av}{2}\right)$$
$$\varphi'(v) = \frac{a}{2}\, \mathrm{sech}^2\!\left(\frac{av}{2}\right) = \frac{a}{2} \left[ 1 - \tanh^2\!\left(\frac{av}{2}\right) \right] = \frac{a}{2} \left[ 1 - \varphi^2(v) \right]$$
$$\varphi'(0) = \frac{a}{2}$$
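A quick numerical check of both derivative formulas, using central finite differences, is sketched below; the slope value and test points are arbitrary.

```python
import math

a = 1.7          # arbitrary slope parameter
h = 1e-6         # finite-difference step

logistic = lambda v: 1.0 / (1.0 + math.exp(-a * v))
tansig   = lambda v: math.tanh(a * v / 2.0)

for v in (-1.0, 0.0, 0.8):
    # Central finite-difference approximations of the derivatives
    d_log = (logistic(v + h) - logistic(v - h)) / (2 * h)
    d_tan = (tansig(v + h) - tansig(v - h)) / (2 * h)
    # Compare with the closed-form expressions derived above
    print(abs(d_log - a * logistic(v) * (1 - logistic(v))) < 1e-6,
          abs(d_tan - (a / 2) * (1 - tansig(v) ** 2)) < 1e-6)
# At the origin: phi'(0) = a/4 for the logistic and a/2 for the tansigmoid.
```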
3. In the logistic activation function, the presence of the
constant $a$ has the same effect as multiplying all the inputs
by $a$:
$$\varphi(v) = \frac{1}{1 + e^{-av}} = \frac{1}{1 + \exp\!\left(-a \sum_i w_i x_i\right)} = \frac{1}{1 + \exp\!\left(-\sum_i w_i (a\, x_i)\right)}$$
4. Show that:
(i) A linear neuron may be approximated by a neuron
with a sigmoidal activation function and small synaptic
weights.
(Hint: for small values of $x$, $e^{-x} \approx 1 - x$.)
(ii) A McCulloch-Pitts model of a neuron may be
approximated by a neuron with a sigmoidal activation
function and large synaptic weights.
What can a single neuron do?
1.11 Logic Operations Performed by an ANN

Logical AND:
Consider the truth table illustrating an AND gate:

x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1

[Figure: A single neuron with inputs $x_1, x_2$, weights $w_1, w_2$, bias $b$ and a hard limiter producing the output $y$.]

The AND gate is realised with
$$\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad b = -1.5$$

Logical OR:
Consider the truth table illustrating the OR gate:

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   1

The OR gate is realised with
$$\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad b = -0.5$$

Note: The implementations of the AND and OR logic functions differ only by the value of the bias.

Complement:
The complement (NOT) of a single input $x$ is obtained with $w = -1$ and $b = 0.5$, again with a hard limiter at the output:

x   y
0   1
1   0
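A minimal sketch checking these weight and bias choices with a hard-limiting neuron is given below; the function names are illustrative.

```python
def hard_limiter(v):
    # Threshold activation: 1 if v >= 0, else 0
    return 1 if v >= 0 else 0

def neuron(inputs, weights, bias):
    # y = hard_limiter( sum_j w_j * x_j + b )
    v = sum(w * x for w, x in zip(weights, inputs)) + bias
    return hard_limiter(v)

for x1 in (0, 1):
    for x2 in (0, 1):
        and_y = neuron([x1, x2], [1, 1], -1.5)   # AND: fires only for (1, 1)
        or_y  = neuron([x1, x2], [1, 1], -0.5)   # OR: fires unless (0, 0)
        print(x1, x2, and_y, or_y)

print([neuron([x], [-1], 0.5) for x in (0, 1)])  # Complement: [1, 0]
```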

Exercises
1. Show the implementation of the NAND and NOR gates.
2. Try the implementation of an XOR gate.
1.12 Memory Cell
A single neuron with a single input, with both the weight and bias
values equal to unity, computes $y_{k+1} = x_k$. Such a simple network
thus behaves as a single register cell, able to retain the input
for one time period. As a consequence, once a feedback
loop is closed around the neuron as shown in the figure, we
obtain a memory cell. An excitatory input of 1 initializes
the firing in the memory cell, and an inhibitory input of
1 initializes a non-firing state. The output value, in the
absence of inputs, is then sustained indefinitely. This is
because an output of zero fed back to the input does not
cause firing at the next instant, while an output of 1 does.
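A small simulation of such a memory cell is sketched below; the particular weights, threshold offset and input sequence are one illustrative choice, not values taken from the notes.

```python
def hard_limiter(v):
    return 1 if v >= 0 else 0

def memory_cell(excite, inhibit, state):
    # Feedback neuron: the previous output is fed back with weight +1;
    # an excitatory input sets the cell, an inhibitory input resets it.
    v = state + excite - inhibit - 0.5   # fires when feedback or excitation dominates
    return hard_limiter(v)

state = 0
# (excitatory, inhibitory) input pairs: set, hold, hold, reset, hold
for excite, inhibit in [(1, 0), (0, 0), (0, 0), (0, 1), (0, 0)]:
    state = memory_cell(excite, inhibit, state)
    print(state)   # 1, 1, 1, 0, 0
```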
1.13 We Will Pause for an Identification Problem
Consider a dynamic system with $m$ inputs such that
$$\mathbf{x}(n) = [x_1(n), x_2(n), \ldots, x_m(n)]^T$$
Suppose we do not know anything about the system other than
that it produces a single output $d(n)$ when stimulated by the
input vector. Thus the external behavior of the system is
represented by the data set
$$T: \{\mathbf{x}(n), d(n);\; n = 1, 2, \ldots, p, \ldots\}$$
Now we pose the problem:
How to design a multiple-input single-output model of the
dynamic system using a single neuron (perceptron)?
If we assume the neuron is linear (i.e. it has a linear activation
function), the output $y(n)$ is the same as the induced local
field $v(n)$; i.e.
$$y(n) = v(n) = \sum_{k=0}^{m} w_k(n)\, x_k(n)$$
where the $w_k(n)$ are the synaptic weights measured at time
$n$. We then have the error $e(n) = d(n) - y(n)$, and the
adaptation of the synaptic weights is straightforward using
unconstrained optimization techniques like steepest
descent, Newton's method, the Gauss-Newton method, etc.
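As one concrete instance of such weight adaptation, the sketch below applies a steepest-descent (LMS-style) update $w_k \leftarrow w_k + \eta\, e(n)\, x_k(n)$ to synthetic data; the learning rate, the input data and the "true" system weights are all made-up illustrations.

```python
import random

random.seed(0)
m = 3
true_w = [0.5, -1.0, 2.0, 0.3]       # hypothetical system: bias weight plus 3 weights
w = [0.0] * (m + 1)                  # model weights; w[0] acts on x_0 = +1
eta = 0.05                           # learning rate (assumed)

for n in range(2000):
    x = [1.0] + [random.uniform(-1, 1) for _ in range(m)]   # x_0 = +1 plus inputs
    d = sum(tw * xi for tw, xi in zip(true_w, x))            # desired response d(n)
    y = sum(wk * xi for wk, xi in zip(w, x))                 # linear neuron output y(n)
    e = d - y                                                # error e(n) = d(n) - y(n)
    w = [wk + eta * e * xi for wk, xi in zip(w, x)]          # steepest-descent update

print([round(wk, 3) for wk in w])    # approaches the hypothetical true weights
```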
[Figure: Signal-flow graph of the adaptive model: inputs $x_0(n), x_1(n), \ldots, x_m(n)$ weighted by the synaptic weights $w_k(n)$ produce the output $y(n) = v(n)$, which is compared with the desired response $d(n)$ to form the error $e(n) = d(n) - y(n)$.]
1.14 Network Architectures
1. Single-layer feedforward network
[Figure: Input layer of source nodes projecting onto an output layer of neurons.]
2. Multilayer feedforward network
[Figure: Input layer of source nodes, a layer of hidden neurons and a layer of output neurons.]
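A minimal sketch contrasting the two architectures as successive layer computations is given below; the layer sizes, weight values and the use of the logistic activation are illustrative assumptions, not taken from the notes.

```python
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def layer(x, W, b):
    # One layer of neurons: y_k = phi( sum_j W[k][j] * x[j] + b[k] )
    return [logistic(sum(wkj * xj for wkj, xj in zip(row, x)) + bk)
            for row, bk in zip(W, b)]

x = [0.2, -0.7, 1.0]                       # source nodes (illustrative values)

# Single-layer feedforward network: source nodes -> output neurons
W_out = [[0.5, -0.3, 0.8], [1.0, 0.2, -0.6]]
b_out = [0.1, -0.2]
print(layer(x, W_out, b_out))

# Multilayer feedforward network: source nodes -> hidden neurons -> output neurons
W_hid = [[0.4, 0.9, -0.5], [-0.7, 0.3, 0.6], [0.1, -0.2, 0.8]]
b_hid = [0.0, 0.1, -0.1]
hidden = layer(x, W_hid, b_hid)
print(layer(hidden, W_out, b_out))         # output weights reused here purely for illustration
```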