NEURAL NETWORKS
Vedat Tavşanoğlu
What Is a Neural Network?

Work on artificial neural networks, commonly referred to as "neural networks," has been motivated right from its inception by the recognition that the brain computes in an entirely different way from the conventional digital computer.
The struggle to understand the brain owes much to the pioneering work of Ramon y Cajal (1911), who introduced the idea of neurons as structural constituents of the brain. Typically, neurons are five to six orders of magnitude slower than silicon logic gates; events in a silicon chip happen in the nanosecond (10⁻⁹ s) range, whereas neural events happen in the millisecond (10⁻³ s) range.
However, the brain makes up for the relatively slow rate of operation of a neuron by having a truly staggering number of neurons (nerve cells) with massive interconnections between them.
It is estimated that there must be on the order of 10 billion neurons in the human cortex, and 60 trillion synapses or connections (Shepherd and Koch, 1990). The net result is that the brain is an enormously efficient structure. Specifically, the energetic efficiency of the brain is approximately 10⁻¹⁶ joules (J) per operation per second. The corresponding value for the best computers in use today is about 10⁻⁶ joules per operation per second (Faggin, 1991).
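The two efficiency figures above can be put side by side with a one-line check of the gap (values taken directly from the text, units as stated there):

```python
# Energetic-efficiency figures quoted in the text.
brain_j_per_op = 1e-16     # brain: ~10^-16 joules per operation per second
computer_j_per_op = 1e-6   # best computers (Faggin, 1991): ~10^-6 J per op per s

# The brain comes out roughly ten orders of magnitude more efficient.
ratio = computer_j_per_op / brain_j_per_op
print(f"brain advantage: ~{ratio:.0e}x")
```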
The brain is a highly complex, nonlinear, and parallel computer (information-processing system). It has the capability of organizing neurons so as to perform certain computations (e.g., pattern recognition, perception, and motor control) many times faster than the fastest digital computer in existence today.
Consider, for example, human vision, which is an information-processing task (Churchland and Sejnowski, 1992; Levine, 1985; Marr, 1982). It is the function of the visual system to provide a representation of the environment around us and, more important, to supply the information we need to interact with the environment.
The brain routinely accomplishes perceptual recognition tasks (e.g., recognizing a familiar face embedded in an unfamiliar scene) in something of the order of 100–200 ms, whereas tasks of much lesser complexity will take hours on conventional computers.

For another example, consider the sonar of a bat. Sonar is an active echo-location system. In addition to providing information about how far away a target (e.g., a flying insect) is, a bat's sonar conveys information about the relative velocity of the target, the size of the target, the size of various features of the target, and the azimuth and elevation of the target (Suga, 1990a, b).
The complex neural computations needed to extract all this information from the target echo occur within a brain the size of a plum. Indeed, an echo-locating bat can pursue and capture its target with a facility and success rate that would be the envy of a radar or sonar engineer.
How, then, does a human brain or the brain of a bat do it? At birth, a brain has great structure and the ability to build up its own rules through what we usually refer to as "experience."
Indeed, experience is built up over the years, with the most dramatic development (i.e., hard-wiring) of the human brain taking place in the first two years from birth; but the development continues well beyond that stage. During this early stage of development, about 1 million synapses are formed per second.
Synapses are elementary structural and functional units that mediate the interactions between neurons. The most common kind of synapse is a chemical synapse, which operates as follows: a presynaptic process liberates a transmitter substance that diffuses across the synaptic junction between neurons and then acts on a postsynaptic process. Thus a synapse converts a presynaptic electrical signal into a chemical signal and then back into a postsynaptic electrical signal (Shepherd and Koch, 1990).
In electrical terminology, such an element is said to be a nonreciprocal two-port device. In traditional descriptions of neural organization, it is assumed that a synapse is a simple connection that can impose excitation or inhibition, but not both, on the receptive neuron.
A developing neuron is synonymous with a plastic brain: Plasticity ([Latin plasticus, from Greek plastikos, from plastos, molded, from plassein, to mold; see pelə-2 in Indo-European roots.]) permits the developing nervous system to adapt to its surrounding environment (Churchland and Sejnowski, 1992; Eggermont, 1990). In an adult brain, plasticity may be accounted for by two mechanisms: the creation of new synaptic connections between neurons, and the modification of existing synapses.
Axons, the transmission lines, and dendrites, the receptive zones, constitute two types of cell filaments that are distinguished on morphological grounds; an axon has a smoother surface, fewer branches, and greater length, whereas a dendrite (so called because of its resemblance to a tree) has an irregular surface and more branches (Freeman, 1975).
Neurons come in a wide variety of shapes and sizes in different parts of the brain. The figure illustrates the shape of a pyramidal cell, which is one of the most common types of cortical neurons. Like many other types of neurons, it receives most of its inputs through dendritic spines. The pyramidal cell can receive 10,000 or more synaptic contacts, and it can project onto thousands of target cells.
Just as plasticity appears to be essential to the functioning of neurons as information-processing units in the human brain, so it is with neural networks made up of artificial neurons.
In its most general form, a neural network is a machine that is designed to model the way in which the brain performs a particular task or function of interest; the network is usually implemented using electronic components or simulated in software on a digital computer.
In most cases the interest is confined largely to an important class of neural networks that perform useful computations through a process of learning.
To achieve good performance, neural networks employ a massive interconnection of simple computing cells referred to as "neurons" or "processing units." We may thus offer the following definition of a neural network viewed as an adaptive machine:
A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:
1. Knowledge is acquired by the network through a learning process.
2. Interneuron connection strengths known as synaptic weights are used to store the knowledge.
The procedure used to perform the learning process is called a learning algorithm, the function of which is to modify the synaptic weights of the network in an orderly fashion so as to attain a desired design objective.
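One of the simplest concrete instances of such a procedure is the classic perceptron rule, sketched minimally below; the function and data names are illustrative, not from the text.

```python
# Minimal sketch of a learning algorithm: the perceptron rule.
# Synaptic weights are modified in an orderly fashion until the
# design objective (separating two classes) is attained.
def train_perceptron(samples, epochs=20, lr=0.1):
    """samples: list of (input_vector, desired_response), responses in {-1, +1}."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, d in samples:
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
            for i in range(n):          # weight change proportional to the error
                w[i] += lr * (d - y) * x[i]
            b += lr * (d - y)
    return w, b

# Learn logical OR, encoded with +/-1 desired responses.
data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(data)
```

After training, the weights reproduce the desired response for every example; for linearly separable data the rule is guaranteed to converge.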
The modification of synaptic weights provides the traditional method for the design of neural networks. Such an approach is the closest to linear adaptive filter theory, which is already well established and successfully applied in such diverse fields as communications, control, radar, sonar, seismology, and biomedical engineering (Haykin, 1991; Widrow and Stearns, 1985).
However, it is also possible for a neural network to modify its own topology, which is motivated by the fact that neurons in the human brain can die and that new synaptic connections can grow.

Neural networks are also referred to in the literature as neurocomputers, connectionist networks, parallel distributed processors, etc.
Benefits of Neural Networks

From the above discussion, it is apparent that a neural network derives its computing power through:
1. its massively parallel distributed structure;
2. its ability to learn and therefore generalize; generalization refers to the neural network producing reasonable outputs for inputs not encountered during training (learning).
How does the following example help you to generalize?

confer: L. conferre (con-, together; ferre, to bring)
v.t. to give, to bestow (to place or put by), to talk or consult together

defer: L. differre (dis-, asunder (adv. apart, into parts, separately); ferre, to bear, to carry)
v.t. to put off to another time, to delay

defer: L. deferre (de-, down; ferre, to bear)
v.i. to yield (to the wishes or opinions of another, or to authority); v.t. to submit to or to lay before somebody

differ: L. differre (dif- (for dis-), apart; ferre, to bear)
v.i. to be unlike, distinct or various

infer: L. inferre (in-, into; ferre, to bring)
v.t. to bring on, to drive as a conclusion

prefer: L. praeferre (prae-, in front of; ferre, to bear)
v.t. to set in front, to put forward, offer, submit, present for acceptance or consideration, to promote
convene: L. convenire (con-, together; venire, to come)
v.i. to come together; v.t. to call together

convent: v.t. to convene

convention: the act of convening; an assembly, esp. of special delegates for some common object; an agreement (Geneva Convention)

invent: L. invenire, inventum (in-, upon; venire, to come)
v.t. to find, to devise or contrive

prevent: L. praevenire (prae-, in front of; venire, to come)
v.t. to precede; to be, go, or act earlier than; to preclude, to stop, keep, or hinder effectually; to keep from coming to pass
Synonym: 1432 (but rare before 18c.), from L. synonymum, from Gk. synonymon "word having the same sense as another," noun use of neut. of synonymos "having the same name as, synonymous," from syn- "together, same" + onyma, Aeolic dialectal form of onoma "name" (see name). Synonymous is attested from 1610.
Antonym: 1870, created to serve as opposite of synonym, from Gk. anti- "equal to, instead of, opposite" (see anti-) + -onym "name" (see name).

Anonymous: 1601, from Gk. anonymos "without a name," from an- "without" + onyma, Æolic dialectal form of onoma "name" (see name).
These two information-processing capabilities, i.e., (1) a massively parallel distributed structure and (2) the ability to generalize, make it possible for neural networks to solve complex (large-scale) problems that are currently intractable. In practice, however, neural networks cannot provide the solution by working alone; rather, they need to be integrated into a consistent system-engineering approach.
Specifically, a complex problem of interest is decomposed into a number of relatively simple tasks, and neural networks are assigned a subset of the tasks (e.g., pattern recognition, associative memory, control, etc.) that match their inherent capabilities. It is important to recognize, however, that we have a long way to go (if ever) before we can build a computer architecture that mimics the human brain.
Properties and Capabilities of Neural Networks

1. Nonlinearity
A neuron is basically a nonlinear device. Consequently, a neural network, made up of an interconnection of neurons, is itself nonlinear. Moreover, the nonlinearity is of a special kind in the sense that it is distributed throughout the network. Nonlinearity is a highly important property, particularly if the underlying physical mechanism responsible for the generation of an input signal (e.g., a speech signal) is inherently nonlinear.
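The nonlinearity of a single artificial neuron typically comes from a squashing activation function applied to a linear combination of the inputs; a minimal sketch follows (the logistic sigmoid is one common choice, used here for illustration):

```python
import math

# A single artificial neuron: a linear combiner followed by a
# nonlinear (sigmoid) activation function.
def neuron(x, w, b):
    v = sum(wi * xi for wi, xi in zip(w, x)) + b   # induced local field
    return 1.0 / (1.0 + math.exp(-v))              # nonlinear squashing

# Doubling the input does not double the output: the mapping is nonlinear.
y1 = neuron([1.0], [2.0], 0.0)   # sigmoid(2), about 0.881
y2 = neuron([2.0], [2.0], 0.0)   # sigmoid(4), about 0.982, well short of 2*y1
```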
2. Input-Output Mapping
A popular paradigm of learning called supervised learning involves the modification of the synaptic weights of a neural network by applying a set of labeled training samples or task examples. Each example consists of a unique input signal and the corresponding desired response.
The network is presented an example picked at random from the set, and the synaptic weights (free parameters) of the network are modified so as to minimize the difference between the desired response and the actual response of the network.
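The "minimize the difference" step can be made concrete with error-correction learning (the LMS or delta rule); a minimal sketch with illustrative names:

```python
# Error-correction learning (LMS / delta rule): each presentation
# nudges the free parameters against the error e = d - y.
def lms_step(w, x, d, lr=0.05):
    y = sum(wi * xi for wi, xi in zip(w, x))   # actual (linear) response
    e = d - y                                  # desired minus actual
    return [wi + lr * e * xi for wi, xi in zip(w, x)]

# Repeated presentations drive the weight toward the target mapping y = 3x.
w = [0.0]
for _ in range(200):
    for x in (0.5, 1.0, 1.5):
        w = lms_step(w, [x], 3 * x)
```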
The training of the network is repeated for many examples in the set until the network reaches a steady state, where there are no further significant changes in the synaptic weights. The previously applied training examples may be reapplied during the training session, but in a different order.
Thus the network learns from the examples by constructing an input-output mapping for the problem at hand.
Such an approach brings to mind the study of nonparametric statistical inference, which is a branch of statistics dealing with model-free estimation or, from a biological viewpoint, tabula rasa learning (Geman et al., 1992). (tabula rasa: a smoothed or blank tablet; a mind not yet influenced by outside impressions and experiences. [Medieval Latin tabula rāsa : Latin tabula, tablet + Latin rāsa, feminine of rāsus, erased.])
Consider, for example, a pattern classification task, where the requirement is to assign an input signal representing a physical object or event to one of several prespecified categories (classes). In a nonparametric approach to this problem, the requirement is to "estimate" arbitrary decision boundaries in the input signal space for the pattern-classification task using a set of examples, and to do so without invoking a probabilistic distribution model.
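A standard nonparametric classifier that works exactly this way is the nearest-neighbor rule: the decision boundary is implied directly by the stored examples, with no distribution model assumed. A minimal sketch (the data points and labels are invented for illustration):

```python
# 1-nearest-neighbor: classify by the label of the closest stored example.
def nn_classify(x, examples):
    """examples: list of (feature_vector, class_label) pairs."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(examples, key=lambda ex: dist2(x, ex[0]))[1]

train = [((0.0, 0.0), "A"), ((0.2, 0.1), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.2), "B")]
label = nn_classify((0.8, 0.9), train)   # nearest stored examples are class "B"
```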
A similar point of view is implicit in the supervised learning paradigm, which suggests a close analogy between the input-output mapping performed by a neural network and nonparametric statistical inference.
paradigm:
1. Grammar. a. a set of forms all of which contain a particular element, esp. the set of all inflected forms based on a single stem or theme. b. a display in fixed arrangement of such a set, as boy, boy's, boys, boys'.
2. an example serving as a model; pattern. [Origin: 1475–85; < LL paradīgma < Gk parádeigma, pattern (verbid of paradeiknýnai, to show side by side), equiv. to para- (para-1) + deik-, base of deiknýnai, to show (see deictic) + -ma, n. suffix]
analogy: 1550, from L. analogia, from Gk. analogia "proportion," from ana- "upon, according to" + logos "ratio," also "word, speech, reckoning." A mathematical term used in a wider sense by Plato.
3. Adaptivity
Neural networks have a built-in capability to adapt their synaptic weights to changes in the surrounding environment. In particular, a neural network trained to operate in a specific environment can be easily retrained to deal with minor changes in the operating environmental conditions.
Properties and Capabilities of Neural Networks

Moreover, when it is operating in a nonstationary environment (i.e., one whose statistics change with time), a neural network can be designed to change its synaptic weights in real time. The natural architecture of a neural network for pattern classification, signal processing, and control applications, coupled with the adaptive capability of the network, makes it an ideal tool for use in adaptive pattern classification, adaptive signal processing, and adaptive control.
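As a rough illustration of this real-time weight adaptation, the sketch below (plain Python; the single-weight "network," the learning rate, and the drifting target are all invented for illustration, not taken from the text) uses an LMS-style update to track a slowly changing input-output relation:

```python
def lms_track(signal, targets, lr=0.1):
    """Adapt a single synaptic weight online with an LMS-style update."""
    w = 0.0
    errors = []
    for x, d in zip(signal, targets):
        y = w * x              # linear combiner output
        e = d - y              # instantaneous error
        w += lr * e * x        # adapt the weight toward the target
        errors.append(abs(e))
    return w, errors

# Nonstationary environment: the "true" weight drifts from 1.0 toward 2.0.
xs = [1.0] * 200
ds = [1.0 + i / 200.0 for i in range(200)]
w, errs = lms_track(xs, ds)
```

Because the environment keeps drifting, the weight never settles exactly; it follows the target with a small steady-state lag, which is exactly the tracking behavior the text describes.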
Properties and Capabilities of Neural Networks

As a general rule, it may be said that the more adaptive we make a system in a properly designed fashion, assuming the adaptive system is stable, the more robust its performance will likely be when the system is required to operate in a nonstationary environment.
Properties and Capabilities of Neural Networks

It should be emphasized, however, that adaptivity does not always lead to robustness; indeed, it may do the very opposite. For example, an adaptive system with short time constants may change rapidly and therefore tend to respond to spurious disturbances, causing a drastic degradation in system performance.
Properties and Capabilities of Neural Networks

To realize the full benefits of adaptivity, the principal time constants of the system should be long enough for the system to ignore spurious (L. spurius, false) disturbances and yet short enough to respond to meaningful changes in the environment; the problem described here is referred to as the stability-plasticity dilemma (Grossberg, 1988). Adaptivity (or "in situ" (L., in the original situation) training, as it is sometimes referred to) is an open research topic.
Properties and Capabilities of Neural Networks

4. Evidential Response. In the context of pattern classification, a neural network can be designed to provide information not only about which particular pattern to select, but also about the confidence in the decision made. This latter information may be used to reject ambiguous patterns, should they arise, and thereby improve the classification performance of the network.
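One common way to realize such an evidential response is to treat normalized output scores as a confidence measure and reject low-confidence patterns. The sketch below (plain Python; the softmax normalization, the example scores, and the 0.8 threshold are illustrative assumptions, not from the text) shows the idea:

```python
import math

def softmax(scores):
    """Convert raw class scores into a probability-like confidence vector."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_with_reject(scores, threshold=0.8):
    """Return (class_index, confidence), or (None, confidence) when ambiguous."""
    probs = softmax(scores)
    conf = max(probs)
    label = probs.index(conf)
    return (label, conf) if conf >= threshold else (None, conf)

# A clear-cut pattern is accepted; an ambiguous one is rejected.
label1, c1 = classify_with_reject([4.0, 0.5, 0.2])
label2, c2 = classify_with_reject([1.1, 1.0, 0.9])
```

Rejected patterns can then be passed to a human operator or a second-stage classifier, which is how the rejection option improves overall classification performance.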
Properties and Capabilities of Neural Networks

5. Contextual Information (L. contextus, from contexere: con- + texere, textum, to weave). Knowledge is represented by the very structure and activation state of a neural network. Every neuron in the network is potentially affected by the global activity of all other neurons in the network. Consequently, contextual information is dealt with naturally by a neural network.
Properties and Capabilities of Neural Networks

6. Fault Tolerance. A neural network, implemented in hardware form, has the potential to be inherently fault tolerant in the sense that its performance is degraded gracefully under adverse operating conditions (Bolt, 1992). For example, if a neuron or its connecting links are damaged, recall of a stored pattern is impaired in quality. However, owing to the distributed nature of information in the network, the damage has to be extensive before the overall response of the network is degraded seriously. Thus, in principle, a neural network exhibits a graceful degradation in performance rather than catastrophic failure.
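The distributed-information argument can be checked with a toy experiment. In the sketch below (plain Python; the 200-synapse model neuron, the stored pattern, and the 10% damage level are all invented for illustration), zeroing a random tenth of the weights removes only about a tenth of the neuron's response rather than destroying it:

```python
import random

def combiner(weights, inputs):
    """Linear combiner u = sum_j w_j x_j of a single model neuron."""
    return sum(w * x for w, x in zip(weights, inputs))

random.seed(0)
n = 200
weights = [random.uniform(-1.0, 1.0) for _ in range(n)]
pattern = [1.0 if w >= 0 else -1.0 for w in weights]  # pattern the neuron responds to

u_clean = combiner(weights, pattern)

# Sever 10% of the connecting links by zeroing their weights.
damaged = list(weights)
for i in random.sample(range(n), n // 10):
    damaged[i] = 0.0
u_damaged = combiner(damaged, pattern)

degradation = 1.0 - u_damaged / u_clean  # fraction of the response lost
```

The response shrinks roughly in proportion to the damage, i.e., gracefully, because no single synapse carries the stored information by itself.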
Properties and Capabilities of Neural Networks

7. VLSI Implementability. The massively parallel nature of a neural network makes it potentially fast for the computation of certain tasks. This same feature makes a neural network ideally suited for implementation using very-large-scale-integrated (VLSI) technology. The particular virtue of VLSI is that it provides a means of capturing truly complex behavior in a highly hierarchical fashion (Mead and Conway, 1980), which makes it possible to use a neural network as a tool for real-time applications involving pattern recognition, signal processing, and control.
Properties and Capabilities of Neural Networks

8. Uniformity of Analysis and Design. Basically, neural networks enjoy universality as information processors. We say this in the sense that the same notation is used in all the domains involving the application of neural networks. This feature manifests itself in different ways:

Neurons, in one form or another, represent an ingredient common to all neural networks.

This commonality makes it possible to share theories and learning algorithms in different applications of neural networks.

Modular networks can be built through a seamless integration of modules.
Properties and Capabilities of Neural Networks

analysis: [Medieval Latin, from Greek analusis, a dissolving, from analūein, to undo: ana-, throughout + lūein, to loosen; see leu- in Indo-European roots.] 1581, "resolution of anything complex into simple elements" (opposite of synthesis), from M.L. analysis, from Gk. analysis "a breaking up," from analyein "unloose," from ana- "up, throughout" + lysis "a loosening" (see lose). Psychological sense is from 1890. The phrase in the final (or last) analysis (1844) translates Fr. en dernière analyse.
Properties and Capabilities of Neural Networks

Design: 1548, from L. designare "mark out, devise," from de- "out" + signare "to mark," from signum "a mark, sign." Originally in Eng. with the meaning now attached to designate (1646, from L. designatus, pp. of designare); many modern uses of design are metaphoric extensions. Designer (adj.) in the fashion sense of "prestigious" is first recorded 1966; designer drug is from 1983. Designing "scheming" is from 1671. Designated hitter, introduced in American League baseball in 1973, soon gave wide figurative extension to designated.
9. Neurobiological Analogy. The design of a neural network is motivated by analogy with the brain, which is a living proof that fault-tolerant parallel processing is not only physically possible but also fast and powerful. Neurobiologists look to (artificial) neural networks as a research tool for the interpretation of neurobiological phenomena.
Properties and Capabilities of Neural Networks

For example, neural networks have been used to provide insight on the development of premotor circuits (premotor: relating to, or being, the area of the cortex of the frontal lobe lying immediately in front of the motor area of the precentral gyrus; gyrus: any of the prominent, rounded, elevated convolutions on the surfaces of the cerebral hemispheres [Latin gȳrus, circle; see gyre]) in the oculomotor system (responsible for eye movements; oculomotor: 1. of or relating to movements of the eyeball, as in an oculomotor muscle; 2. of or relating to the oculomotor nerve [Latin oculus, eye; see okw- in Indo-European roots + motor]) and the manner in which they process signals (Robinson, 1992). On the other hand, engineers look to neurobiology for new ideas to solve problems more complex than those based on conventional hard-wired design techniques.
Properties and Capabilities of Neural Networks

Here, for example, we may mention the development of a model sonar receiver based on the bat (Simmons et al., 1992). The bat-inspired model consists of three stages:
(1) a front end that mimics the inner ear of the bat in order to encode waveforms;
(2) a subsystem of delay lines that computes echo delays;
(3) a subsystem that computes the spectrum of echoes, which is in turn used to estimate the time separation of echoes from multiple target glints.
Properties and Capabilities of Neural Networks

The motivation is to develop a new sonar receiver that is superior to one designed by conventional methods. The neurobiological analogy is also useful in another important way: It provides a hope and belief (and, to a certain extent, an existence proof) that physical understanding of neurobiological structures could indeed influence the art of electronics and thus VLSI (Andreou, 1992).
Properties and Capabilities of Neural Networks

With inspiration from the neurobiological analogy in mind, it seems appropriate that we take a brief look at the structural levels of organization in the brain.
Properties and Capabilities of Neural Networks

1.2 Structural Levels of Organization in the Brain

The human nervous system may be viewed as a three-stage system (Arbib, 1987).

Block-diagram representation of the nervous system
Properties and Capabilities of Neural Networks

Central to the system is the brain, represented by the neural (nerve) net in this figure, which continually receives information, perceives it, and makes appropriate decisions. Two sets of arrows are shown in this figure:
1. Those pointing from left to right indicate the forward transmission of information-bearing signals through the system.
2. The arrows pointing from right to left signify the presence of feedback in the system.
Properties and Capabilities of Neural Networks

The receptors in the figure convert stimuli from the human body or the external environment into electrical impulses that convey information to the neural net (brain). The effectors, on the other hand, convert electrical impulses generated by the neural net into discernible responses as system outputs. In the brain there are both small-scale and large-scale anatomical organizations, and different functions take place at lower and higher levels.
Properties and Capabilities of Neural Networks

This figure shows a hierarchy of interwoven levels of organization that has emerged from the extensive work done on the analysis of local regions in the brain (Churchland and Sejnowski, 1992; Shepherd and Koch, 1990).
Properties and Capabilities of Neural Networks

Proceeding upward from synapses, which represent the most fundamental level and depend on molecules and ions for their action, we have neural microcircuits, dendritic trees, and then neurons.
Properties and Capabilities of Neural Networks

A neural microcircuit refers to an assembly of synapses organized into patterns of connectivity so as to produce a functional operation of interest. A neural microcircuit may be likened to a silicon chip made up of an assembly of transistors.
Properties and Capabilities of Neural Networks

The smallest size of microcircuits is measured in micrometers (µm), and their fastest speed of operation is measured in milliseconds. The neural microcircuits are grouped to form dendritic subunits within the dendritic trees of individual neurons. The whole neuron, about 100 µm in size, contains several dendritic subunits.
Properties and Capabilities of Neural Networks

At the next level of complexity, we have local circuits (about 1 mm in size) made up of neurons with similar or different properties; these neural assemblies perform operations characteristic of a localized region in the brain.
Properties and Capabilities of Neural Networks

This is followed by interregional circuits made up of pathways, columns, and topographic maps, which involve multiple regions located in different parts of the brain.
Properties and Capabilities of Neural Networks

Topographic maps are organized to respond to incoming sensory information. These maps are often arranged in sheets, as in the superior colliculus, where the visual, auditory, and somatosensory maps are stacked in adjacent layers in such a way that stimuli from corresponding points in space lie above each other. Finally, the topographic maps and other interregional circuits mediate specific types of behavior in the central nervous system.
Properties and Capabilities of Neural Networks

It is important to recognize that the structural levels of organization described herein are a unique characteristic of the brain. They are nowhere to be found in a digital computer, and we are nowhere close to realizing them with artificial neural networks. Nevertheless, we are inching our way toward a hierarchy of computational levels similar to that described in the last figure.
Properties and Capabilities of Neural Networks
The artificial neurons we use to build our
neural networks are truly primitive in
comparison to those found in the brain.
The neural networks we are presently able to
design are just as primitive compared to the
local circuits and the interregional circuits in
the brain.
What is really satisfying, however, is the remarkable progress that we have made on so many fronts during the past 20 years. With the neurobiological analogy as the source of inspiration, and the wealth of theoretical and technological tools that we are bringing together, it is certain that in another 10 years our understanding of artificial neural networks will be much more sophisticated than it is today.
Properties and Capabilities of Neural Networks

Our primary interest here is confined to the study of artificial neural networks from an engineering perspective, to which we refer simply as neural networks. We begin the study by describing the models of (artificial) neurons that form the basis of the neural networks considered in these lectures.
Models of a Neuron
A neuron is an information-processing unit that is fundamental to the operation of a neural network. The figure on the next slide shows the model for a neuron.

Nonlinear model of a neuron

Models of a Neuron
1. A set of synapses or connecting links, each of which is characterized by a weight or strength of its own. Specifically, a signal x_j at the input of synapse j connected to neuron k is multiplied by the synaptic weight w_kj. It is important to make a note of the manner in which the subscripts of the synaptic weight w_kj are written.
Models of a Neuron
Models of a Neuron
The first subscript refers to the neuron in
The first subscript refers to the neuron in
question and the second subscript refers to
question and the second subscript refers to
the input end of the synapse to which the
the input end of the synapse to which the
weight refers; the reverse of this notation is
weight refers; the reverse of this notation is
also used in the literature.
also used in the literature.
The weight
The weight
w
w
kj kj
is
is
positive
positive
if the associated
if the associated
synapse is
synapse is
excitatory
excitatory
; it is
; it is
negative
negative
if the
if the
synapse is
synapse is
inhibitory
inhibitory
(
(
Middle English
Middle English
inhibiten, to forbid, from Latin inhib
inhibiten, to forbid, from Latin inhib
ē
ē
re, inhibit
re, inhibit


, to restrain, forbid : in
, to restrain, forbid : in


, in; see in
, in; see in


2 + hab
2 + hab
ē
ē
re, to
re, to
hold; see ghabh
hold; see ghabh


in Indo
in Indo


European roots.]
European roots.]
)
)
Models of a Neuron

2. An adder for summing the input signals, weighted by the respective synapses of the neuron; the operations described here constitute a linear combiner.
3. An activation function for limiting the amplitude of the output of a neuron. The activation function is also referred to in the literature as a squashing function in that it squashes (limits) the permissible amplitude range of the output signal to some finite value. Typically, the normalized amplitude range of the output of a neuron is written as the closed unit interval [0,1] or alternatively [−1,1].
Models of a Neuron

4. The model of a neuron also includes an externally applied threshold θ_k that has the effect of lowering the net input of the activation function. On the other hand, the net input of the activation function may be increased by employing a bias term rather than a threshold; the bias is the negative of the threshold.
Models of a Neuron

In mathematical terms, we may describe neuron k by writing the following pair of equations:
Models of a Neuron

u_k = Σ_{j=1}^{p} w_kj x_j

y_k = φ(u_k − θ_k)

where x_1, x_2, ..., x_p are the input signals; w_k1, w_k2, ..., w_kp are the synaptic weights of neuron k; u_k is the linear combiner output; θ_k is the threshold; φ(·) is the activation function; and y_k is the output signal of the neuron.

Mathematical Model of a Neuron
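This pair of equations translates directly into code. In the toy implementation below (plain Python; the logistic choice of φ and the example numbers are illustrative assumptions, not from the text), the linear combiner output u_k is offset by the threshold θ_k before the activation function is applied:

```python
import math

def neuron_output(inputs, weights, threshold):
    """Compute y_k = phi(u_k - theta_k), with u_k = sum_j w_kj * x_j."""
    u = sum(w * x for w, x in zip(weights, inputs))  # linear combiner output u_k
    v = u - threshold                                # activation potential v_k
    return 1.0 / (1.0 + math.exp(-v))                # logistic squashing function phi

# Illustrative numbers: three synapses and a small threshold.
y = neuron_output(inputs=[0.5, -1.0, 2.0], weights=[1.0, 0.5, 0.25], threshold=0.2)
```

Because the logistic function squashes its argument into (0,1), the output respects the normalized amplitude range mentioned above.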
Models of a Neuron

Block-Diagram Representation of a Neuron

u_k = Σ_{j=1}^{p} w_kj x_j,   y_k = φ(u_k − θ_k)
Models of a Neuron

The use of the threshold θ_k has the effect of applying an affine transformation to the output u_k of the linear combiner in the model of the figure, as shown by

v_k = u_k − θ_k
Models of a Neuron

In particular, depending on whether the threshold θ_k is positive or negative, the relationship between the effective internal activity level, or activation potential, v_k of neuron k and the linear combiner output u_k is modified in the manner illustrated in the figure. Note that as a result of this affine transformation, the graph of v_k versus u_k no longer passes through the origin.
The threshold θ_k is an external parameter of artificial neuron k. We may account for its presence as in the above equation. Equivalently, we may formulate the combination of the two equations as follows:

Models of a Neuron

v_k = Σ_{j=0}^{p} w_kj x_j

y_k = φ(v_k)
Here we have added a new synapse, whose input is x_0 = −1 and whose weight is w_k0 = θ_k.
We may therefore reformulate the model of neuron k as in the figure, where the effect of the threshold is represented by doing two things:

Models of a Neuron
(1) adding a new input signal fixed at $-1$, and
(2) adding a new synaptic weight equal to the threshold $\theta_k$.

Alternatively, we may model the neuron as in the following slide:
where the combination of fixed input $x_0 = +1$ and weight $w_{k0} = b_k$ accounts for the bias $b_k$. Although the models of the two figures are different in appearance, they are mathematically equivalent.
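This equivalence can be checked numerically. A minimal Python sketch, assuming a unipolar sigmoid activation (the function names are illustrative, not from the slides):

```python
import math

def phi(v):
    # Unipolar sigmoid activation (introduced later in these notes).
    return 1.0 / (1.0 + math.exp(-v))

def neuron_threshold(w, x, theta):
    # Model 1: linear combiner u, then the affine shift v = u - theta.
    u = sum(wj * xj for wj, xj in zip(w, x))
    return phi(u - theta)

def neuron_augmented(w, x, theta):
    # Model 2: extra input x_0 = -1 with weight w_0 = theta.
    w_aug = [theta] + list(w)
    x_aug = [-1.0] + list(x)
    return phi(sum(wj * xj for wj, xj in zip(w_aug, x_aug)))

w, x, theta = [0.4, -0.2, 0.7], [1.0, 2.0, -1.0], 0.5
assert abs(neuron_threshold(w, x, theta) - neuron_augmented(w, x, theta)) < 1e-12
```

Both formulations produce the same output for any weights, inputs, and threshold.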
Signal-Flow Graph Representation of a Neuron

$u_k = \sum_{j=1}^{p} w_{kj} x_j, \qquad y_k = \varphi(u_k - \theta_k)$
Two different types of links may be distinguished:

(a) Synaptic links, defined by a linear input-output relation. Specifically, the node signal $x_j$ is multiplied by the synaptic weight $w_{kj}$ to produce the node signal $v_k$.
(b) Activation links, defined in general by a nonlinear input-output relation. This form of relationship is the nonlinear activation function given as $\varphi(\cdot)$.
The Activation Function

The activation function, denoted by $\varphi(\cdot)$, defines the output $y$ of a neuron in terms of the activity level at its input $v$:

$y = \varphi(v)$
We may identify three basic types of activation functions:
1. Threshold Function
2. Piecewise-linear Function
3. Sigmoid Function
1. Threshold (hard limiter or binary activation) Function (leading to the discrete perceptron)

(a) Unipolar: $\varphi(v) = \frac{1}{2} + \frac{1}{2}\,\mathrm{sgn}(v)$
(b) Bipolar: $\varphi(v) = \mathrm{sgn}(v)$
2. Piecewise-linear Function

(a) Unipolar: $\varphi(v) = \frac{1}{2} + \frac{1}{2}\left(\left|v + \tfrac{1}{2}\right| - \left|v - \tfrac{1}{2}\right|\right)$
(b) Bipolar: $\varphi(v) = \frac{1}{2}\left(\left|v + 1\right| - \left|v - 1\right|\right)$, which for a time-indexed signal reads $y_{ij}(t) = f(x_{ij}(t)) = \frac{1}{2}\left(\left|x_{ij}(t) + 1\right| - \left|x_{ij}(t) - 1\right|\right)$.
3. Sigmoid Function

(a) Unipolar: $\varphi(v) = \dfrac{1}{1 + e^{-av}}, \quad a > 0$
(b) Bipolar: $\varphi(v) = \dfrac{1 - e^{-av}}{1 + e^{-av}} = \dfrac{2}{1 + e^{-av}} - 1, \quad a > 0$
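All six activation functions above can be sketched directly in code; a minimal Python version (function names are illustrative):

```python
import math

def sgn(v):
    # Signum: +1, 0, or -1.
    return (v > 0) - (v < 0)

def threshold_unipolar(v):
    # phi(v) = 1/2 + (1/2) sgn(v)
    return 0.5 + 0.5 * sgn(v)

def threshold_bipolar(v):
    # phi(v) = sgn(v)
    return sgn(v)

def piecewise_unipolar(v):
    # Saturates at 0 and 1, unit slope in between.
    return 0.5 + 0.5 * (abs(v + 0.5) - abs(v - 0.5))

def piecewise_bipolar(v):
    # Saturates at -1 and 1, unit slope in between.
    return 0.5 * (abs(v + 1) - abs(v - 1))

def sigmoid_unipolar(v, a=1.0):
    # phi(v) = 1 / (1 + e^(-a v)), a > 0
    return 1.0 / (1.0 + math.exp(-a * v))

def sigmoid_bipolar(v, a=1.0):
    # phi(v) = (1 - e^(-a v)) / (1 + e^(-a v)), a > 0
    return (1.0 - math.exp(-a * v)) / (1.0 + math.exp(-a * v))
```

The sigmoid functions are smooth approximations of the corresponding threshold functions; the slope parameter $a$ controls the steepness.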
Models of Artificial Neural Networks

DEFINITION OF Neural Network
(Jacek M. Zurada, ARTIFICIAL NEURAL SYSTEMS, 1992, West Publishing Company)
A Neural Network is an interconnection of neurons such that neuron outputs are connected, through weights, to all other neurons including themselves; both lag-free and delay connections are allowed.
Neural Networks Viewed as Directed Graphs
1. Block-Diagram Representation (BDR)
2. Signal-Flow Graph Representation (SFGR)
These are obtained when BDR and SFGR for the neurons are used.
An alternative definition of Neural Network
(Simon Haykin, NEURAL NETWORKS, 1994, Macmillan College Publishing Company)
A neural network is a directed graph (SFG) consisting of nodes with interconnecting synaptic and activation links, and which is characterized by four properties:

1. Each neuron is represented by a set of linear synaptic links, an externally applied threshold, and a nonlinear activation link. The threshold is represented by a synaptic link with an input signal fixed at a value of $-1$.
2. The synaptic links of a neuron weight their respective input signals.
3. The weighted sum of the input signals defines the total internal activity level of the neuron in question.
4. The activation link squashes the internal activity level of the neuron to produce an output that represents the output of the neuron.
Network Architectures
In general, we may identify four different classes of network architectures:
1. Single-Layer Feedforward Networks
2. Multilayer Feedforward Networks
3. Recurrent Networks
4. Lattice Structures
1. Single-Layer Feedforward Networks

A layered neural network is a network of neurons organized in the form of layers. In the simplest form of a layered network, we just have an input layer of source nodes that projects onto an output layer of neurons (computation nodes), but not vice versa.
In other words, this network is strictly of a feedforward type. It is illustrated on the following slide for the case of four nodes in both the input and output layers. Such a network is called a single-layer network, with the designation "single layer" referring to the output layer of computation nodes (neurons). In other words, we do not count the input layer of source nodes, because no computation is performed there.
2. Multilayer Feedforward Networks

The second class of a feedforward neural network distinguishes itself by the presence of one or more hidden layers, whose computation nodes are correspondingly called hidden neurons or hidden units. The function of the hidden neurons is to intervene between the external input and the network output.
By adding one or more hidden layers, the network is enabled to extract higher-order statistics, for (in a rather loose sense) the network acquires a global perspective despite its local connectivity by virtue of:
the extra set of synaptic connections
the extra dimension of neural interactions.
The ability of hidden neurons to extract higher-order statistics is particularly valuable when the size of the input layer is large.
The source nodes in the input layer of the network supply respective elements of the activation pattern (input vector), which constitute the input signals applied to the neurons (computation nodes) in the second layer (i.e., the first hidden layer). The output signals of the second layer are used as inputs to the third layer, and so on for the rest of the network.
Typically, the neurons in each layer of the network have as their inputs the output signals of the preceding layer only. The set of output signals of the neurons in the output (final) layer of the network constitutes the overall response of the network to the activation pattern supplied by the source nodes in the input (first) layer.
This graph illustrates the layout of a multilayer feedforward neural network for the case of a single hidden layer. For brevity this network is referred to as a 10-4-2 network in that it has 10 source nodes, 4 hidden neurons, and 2 output neurons.
As another example, a feedforward network with $p$ source nodes, $h_1$ neurons in the first hidden layer, $h_2$ neurons in the second layer, and $q$ neurons in the output layer, say, is referred to as a $p$-$h_1$-$h_2$-$q$ network.
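The layer-by-layer forward pass described above can be sketched in Python; this is a hypothetical example for the 10-4-2 case (the random weights are placeholders, and a unipolar sigmoid activation is assumed):

```python
import math
import random

def layer(x, weights, thresholds):
    # Each neuron computes v = sum_j w_j x_j - theta, then y = phi(v),
    # here with the unipolar sigmoid phi(v) = 1 / (1 + e^(-v)).
    out = []
    for w, theta in zip(weights, thresholds):
        v = sum(wj * xj for wj, xj in zip(w, x)) - theta
        out.append(1.0 / (1.0 + math.exp(-v)))
    return out

def forward(x, network):
    # Each layer feeds only the next layer: strictly feedforward.
    for weights, thresholds in network:
        x = layer(x, weights, thresholds)
    return x

random.seed(0)
# A 10-4-2 network: 10 source nodes, 4 hidden neurons, 2 output neurons.
net = []
for n_in, n_out in [(10, 4), (4, 2)]:
    net.append(([[random.uniform(-1, 1) for _ in range(n_in)]
                 for _ in range(n_out)],
                [0.0] * n_out))

y = forward([0.1] * 10, net)
assert len(y) == 2 and all(0.0 < yi < 1.0 for yi in y)
```

The source nodes perform no computation, so only the hidden and output layers appear in `net`.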
The neural network of this figure is said to be fully connected in the sense that every node in each layer of the network is connected to every other node in the adjacent forward layer.
If, however, some of the communication links (synaptic connections) are missing from the network, we say that the network is partially connected. A form of partially connected multilayer feedforward network of particular interest is a locally connected network. An example of such a network with a single hidden layer is presented on the next slide. Each neuron in the hidden layer is connected to a local (partial) set of source nodes that lies in the immediate neighborhood.
Partially connected feedforward neural network

Such a set of localized nodes feeding a neuron is said to constitute the receptive field of the neuron. Likewise, each neuron in the output layer is connected to a local set of hidden neurons.
3. Recurrent (Feedback or Dynamical) Networks

A recurrent neural network distinguishes itself from a feedforward neural network in that it has at least one feedback loop. For example, a recurrent network may consist of a single layer of neurons with each neuron feeding its output signal back to the inputs of all the other neurons, as illustrated in the architectural graph of the figure on the right. In the structure depicted in this figure there are no self-feedback loops in the network; self-feedback refers to a situation where the output of a neuron is fed back to its own input.
Recurrent network with no self-feedback loops and no hidden neurons
The recurrent network illustrated on the previous slide also has no hidden neurons. Here we illustrate another class of recurrent networks with hidden neurons. The feedback connections shown originate from the hidden neurons as well as the output neurons.

Recurrent network with hidden neurons
The presence of feedback loops, be it as in the recurrent structure with or without hidden neurons, has a profound impact on the learning capability of the network, and on its performance. Moreover, the feedback loops involve the use of particular branches composed of unit-delay elements (denoted by $z^{-1}$), which result in a nonlinear dynamical behavior by virtue of the nonlinear nature of the neurons.
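One synchronous update of such a recurrent layer can be sketched as follows; this hypothetical Python fragment (not from the slides) uses the bipolar hard limiter, with the diagonal of the weight matrix skipped to exclude self-feedback:

```python
def sgn(v):
    # Bipolar hard limiter.
    return (v > 0) - (v < 0)

def step(state, W):
    # Each neuron sees the delayed (z^-1) outputs of all the OTHER neurons;
    # skipping j == i enforces the absence of self-feedback loops.
    n = len(state)
    return [sgn(sum(W[i][j] * state[j] for j in range(n) if j != i))
            for i in range(n)]

W = [[0,  1, -1],
     [1,  0,  1],
     [-1, 1,  0]]
state = [1, -1, 1]
state = step(state, W)   # one pass around the feedback loop
```

Iterating `step` traces the network's dynamics; depending on the weights the state may settle, oscillate, or wander, which is exactly the nonlinear dynamical behavior noted above.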
4. Lattice (Multicategory Perceptron) Structures

A lattice consists of a one-dimensional, two-dimensional, or higher-dimensional array of neurons with a corresponding set of source nodes that supply the input signals to the array; the dimension of the lattice refers to the number of the dimensions of the space in which the graph lies. A lattice network is really a feedforward network with the output neurons arranged in rows and columns.
One-dimensional lattice of 3 neurons
Two-dimensional lattice of 3-by-3 neurons
The perceptron is the simplest form of a neural network used for the classification of a special type of patterns said to be linearly separable (i.e., patterns that lie on opposite sides of a hyperplane). Basically, it consists of a single neuron with adjustable synaptic weights and threshold, as shown in the figures.
The Perceptron
The algorithm used to adjust the free parameters of this neural network first appeared in a learning procedure developed by Rosenblatt (1958, 1962) for his perceptron brain model. Indeed, Rosenblatt proved that if the patterns (vectors) used to train the perceptron are drawn from two linearly separable classes, then the perceptron algorithm converges and positions the decision surface in the form of a hyperplane between the two classes. The proof of convergence of the algorithm is known as the perceptron convergence theorem.
The single-layer perceptron depicted has a single neuron. Such a perceptron is limited to performing pattern classification with only two classes. By expanding the output (computation) layer of the perceptron to include more than one neuron, we may correspondingly perform classification with more than two classes. However, the classes would have to be linearly separable for the perceptron to work properly.
From this model we find that the linear combiner output (i.e., hard limiter input) is

$v = \sum_{j=1}^{p} w_j x_j - \theta$

The purpose of the perceptron is to classify the set of externally applied stimuli $x_1, x_2, \ldots, x_p$ into one of two classes, $C_1$ or $C_2$, say. The decision rule for the classification is to assign the point represented by the inputs $x_1, x_2, \ldots, x_p$ to class $C_1$ if the perceptron output $y$ is $+1$ and to class $C_2$ if it is $-1$.
To develop insight into the behavior of a pattern classifier, it is customary to plot a map of the decision regions in the $p$-dimensional signal space spanned by the $p$ input variables $x_1, x_2, \ldots, x_p$. In the case of an elementary perceptron, there are two decision regions separated by a hyperplane defined by

$\sum_{j=1}^{p} w_j x_j - \theta = 0$
This is illustrated here for the case of two input variables $x_1$ and $x_2$, for which the decision boundary takes the form of a straight line called the decision line. A point $(x_1, x_2)$ that lies above the decision line is assigned to class $C_1$, and a point $(x_1, x_2)$ that lies below the decision line is assigned to class $C_2$. Note also that the effect of the threshold $\theta$ is merely to shift the decision line away from the origin. The synaptic weights $w_1, w_2, \ldots, w_p$ of the perceptron can be fixed or adapted on an iteration-by-iteration basis. For the adaptation, we may use an error-correction rule known as the perceptron convergence algorithm.
We find it more convenient to work with the modified signal-flow graph given here. In this second model, which is equivalent to that of the previous figure, the threshold $\theta$ is treated as a synaptic weight connected to a fixed input equal to $-1$. We may thus define the $(p+1)$-by-$1$ (augmented) input vector and the corresponding (augmented) weight vector as:

$\mathbf{x} = [x_1 \; x_2 \; \ldots \; x_p \; {-1}]^t$
$\mathbf{w} = [w_1 \; w_2 \; \ldots \; w_p \; \theta]^t$
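With these definitions the inner product $\mathbf{w}^t \mathbf{x}$ reproduces the original linear combiner output $v = \sum_j w_j x_j - \theta$, which a short Python sketch can confirm (the numerical values are arbitrary):

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Original form: v = sum_j w_j x_j - theta
w, theta = [0.5, -1.0, 2.0], 0.25
x = [1.0, 2.0, 3.0]
v_original = dot(w, x) - theta

# Augmented form: x = [x_1 ... x_p  -1]^t, w = [w_1 ... w_p  theta]^t
x_aug = x + [-1.0]
w_aug = w + [theta]
v_augmented = dot(w_aug, x_aug)

assert abs(v_original - v_augmented) < 1e-12
```

Absorbing the threshold into the weight vector this way lets the adaptation rule update all $p+1$ free parameters uniformly.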
Pattern Space

Any pattern can be represented by a point in $n$-dimensional Euclidean space $E^n$ called the pattern space. Points in that space corresponding to members of the pattern set are $n$-tuple vectors $\mathbf{x}$.
Example 1: Consider the six patterns in the two-dimensional pattern space shown in the following figure: $(2, 0)$, $(1.5, -1)$, $(1, -2)$, $(-1, -2)$, $(-0.5, -1)$, $(0, 0)$.
Design a perceptron such that these are classified according to their membership in sets as follows:

$\left\{ \begin{bmatrix} 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 1.5 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ -2 \end{bmatrix} \right\}$ : class 1

$\left\{ \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -0.5 \\ -1 \end{bmatrix}, \begin{bmatrix} -1 \\ -2 \end{bmatrix} \right\}$ : class 2
One possible decision line is given by $x_2 = 2x_1 - 2$, which is drawn in the following figure.
One decision surface for this line is obtained as:

$x_3 = -2x_1 + x_2 + 2$

$x_3 = 0 \Rightarrow -2x_1 + x_2 + 2 = 0$ gives the points on the decision line
$x_3 > 0 \Rightarrow -2x_1 + x_2 + 2 > 0$ gives the part of the surface above the decision line
$x_3 < 0 \Rightarrow -2x_1 + x_2 + 2 < 0$ gives the part of the surface below the decision line

Such a pattern classification can be performed by the following (discrete) perceptron (dichotomizer):
dichotomize: to divide or separate into two parts (dicha: in two; tomia: to cut)
Figure: the dichotomizer takes inputs $x_1$ and $x_2$ with weights $-2$ and $1$, a fixed input $-1$ with weight $-2$, and applies the hard limiter $\mathrm{sgn}(v)$, giving

$y = \mathrm{sgn}(-2x_1 + x_2 + 2)$
Single-Layer Feedforward Neural Network
Example 2: Assume that a set of eight points, $P_0, P_1, \ldots, P_7$, in three-dimensional space is available. The set consists of all vertices of a three-dimensional cube as follows:
$\{P_0(-1, -1, -1),\; P_1(-1, -1, 1),\; P_2(-1, 1, -1),\; P_3(-1, 1, 1),\; P_4(1, -1, -1),\; P_5(1, -1, 1),\; P_6(1, 1, -1),\; P_7(1, 1, 1)\}$
Elements of this set need to be classified into two categories. The first category is defined as containing points with two or more positive ones; the second category contains all the remaining points that do not belong to the first category.
Classification of points $P_3$, $P_5$, $P_6$, and $P_7$ can be based on the summation of coordinate values for each point evaluated for category membership. Notice that for each point $P_i(x_1, x_2, x_3)$, where $i = 0, \ldots, 7$, the membership in the category can be established by the following calculation:

If $\mathrm{sgn}(x_1 + x_2 + x_3) = \begin{cases} +1, & \text{then category 1} \\ -1, & \text{then category 2} \end{cases}$
The neural network given below implements the above expression:
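The network itself appears only as a figure, but the computation it performs can be sketched and checked against the "two or more positive ones" definition over all eight vertices (the function name is illustrative):

```python
from itertools import product

def sgn(v):
    # Bipolar hard limiter.
    return (v > 0) - (v < 0)

def category(p):
    # sgn(x1 + x2 + x3) = +1 -> category 1, -1 -> category 2.
    # The sum of three +/-1 coordinates is never zero, so sgn is well defined.
    return 1 if sgn(sum(p)) == 1 else 2

# Check the rule against the defining property on every cube vertex.
for p in product([-1, 1], repeat=3):
    positives = sum(1 for c in p if c == 1)
    assert category(p) == (1 if positives >= 2 else 2)
```

A single neuron with unit weights and zero threshold therefore suffices for this classification.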
The network above performs the three-dimensional Cartesian space partitioning as illustrated below:
Discriminant Functions

In Example 1,

$x_3 = -2x_1 + x_2 + 2$

can be viewed as a Discriminant Function. We may also write

$g(x_1, x_2) = -2x_1 + x_2 + 2$

or

$g(\mathbf{x}) = -2x_1 + x_2 + 2 \quad \text{where} \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$
On the other hand,

$g(x_1, x_2) = -2x_1 + x_2 + 2$

can also be viewed as the equation of a plane in 3-D Euclidean space, and

$g(x_1, x_2) = 0 \Rightarrow -2x_1 + x_2 + 2 = 0$

is the intersection line of the above plane with the xy-plane.
Obviously:

$g(\mathbf{x}) = 0 \Rightarrow -2x_1 + x_2 + 2 = 0$ gives the points on the decision line
$g(\mathbf{x}) > 0 \Rightarrow -2x_1 + x_2 + 2 > 0$ gives the points on the plane above the decision line
$g(\mathbf{x}) < 0 \Rightarrow -2x_1 + x_2 + 2 < 0$ gives the points on the plane below the decision line
Since on the decision line we have $g(x_1, x_2) = 0$, we can write

$dg(x_1, x_2) = \dfrac{\partial g(x_1, x_2)}{\partial x_1}\, dx_1 + \dfrac{\partial g(x_1, x_2)}{\partial x_2}\, dx_2 = 0$

where $dx_1$ and $dx_2$ are the increments given to $x_1$ and $x_2$ on the decision line.
Now defining

$\nabla g(x_1, x_2) = \begin{bmatrix} \dfrac{\partial g(x_1, x_2)}{\partial x_1} \\[6pt] \dfrac{\partial g(x_1, x_2)}{\partial x_2} \end{bmatrix} \quad \text{and} \quad d\mathbf{r} = \begin{bmatrix} dx_1 \\ dx_2 \end{bmatrix}$

$\nabla g$ and $d\mathbf{r}$ are known to be the gradient vector (or normal vector) and the tangent vector, respectively.
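For the discriminant of Example 1 this orthogonality is immediate to check: along the decision line $x_2 = 2x_1 - 2$ the increments $(dx_1, dx_2)$ are proportional to $(1, 2)$, while the gradient is $(-2, 1)$:

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

grad_g = [-2, 1]   # gradient of g(x1, x2) = -2 x1 + x2 + 2
dr = [1, 2]        # direction of increments (dx1, dx2) along x2 = 2 x1 - 2
assert dot(grad_g, dr) == 0   # dg = grad_g . dr = 0 on the decision line
```

The vanishing inner product is just the statement $dg = 0$ along the line.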
The gradient vector points toward the positive side of the decision line. However, there are two normal vectors, one pointing toward the positive side, $\mathbf{q}_1$, and the other toward the negative side, $\mathbf{q}_2 = -\mathbf{q}_1$.
For the above example the gradient and normal vectors are given by:

$\nabla g(x_1, x_2) = \begin{bmatrix} \dfrac{\partial g(x_1, x_2)}{\partial x_1} \\[6pt] \dfrac{\partial g(x_1, x_2)}{\partial x_2} \end{bmatrix} = \begin{bmatrix} -2 \\ 1 \end{bmatrix}, \quad \mathbf{q}_1 = \begin{bmatrix} -2 \\ 1 \end{bmatrix}, \quad \mathbf{q}_2 = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$
In fact $\mathbf{q}_2$ is obtained from $-g(x_1, x_2) = 0$.
Note that $\mathbf{q}_1$ and $\mathbf{q}_2$ are the projections onto the x-y plane of the normal vectors of two intersecting planes whose intersection line is given by $g(x_1, x_2) = 0$.
Although
q
q
1 1
and
and
q
q
2 2
are unique, there are infinetely
many plane pairs whose
intersection line is given
intersection line is given
by
by
1 2
( , ) 0 g x x =
Plane pairs can be built by appropriately augmenting the 2-D normal vectors q1 and q2 to 3-D normal vectors, which will be the normal vectors of the two intersecting planes.
The 2-D normal vectors are plane vectors given in the x-y plane:

q1 = [-2, 1]^t,  q2 = [2, -1]^t

These can be augmented to 3-D by adding a third component, say 2, yielding

n1 = [-2, 1, 2]^t,  n2 = [2, -1, 2]^t
The details of building the augmented vectors are shown below:

[Figure: the decision line g in the x1-x2 plane with the 2-D normals q1 and q2 and the augmented 3-D normals n1 and n2.]
Note that q1 and q2 are the normal vectors of the plane that is perpendicular to the x-y plane and intersects the x-y plane at the decision line. On the other hand, the vectors n1 and n2 are the normal vectors of the planes obtained by rotating the above perpendicular plane around the decision line by α and -α, respectively.
We can now determine the equations for these planes by using the normal vector-point form of the plane equation given as:

n^t (x - x0) = 0
where:
• n is the normal vector of the plane,
• x is the vector connecting any point on the plane to the origin,
• x0 is the vector connecting a fixed point on the plane to the origin.
This means that x - x0 represents the vector connecting any point x on the plane to the fixed point x0 on the same plane. That is, x - x0 is a vector that lies on the plane.

Now let us find the plane equations for the two normal vectors found above.
Let x0 be the point (1, 0, 0) on the decision line. We can write:

For n1 = [-2, 1, 2]^t:  [-2 1 2] ([x1, x2, x3]^t - [1, 0, 0]^t) = 0  ⇒  g1(x1, x2) = x1 - (1/2)x2 - 1

For n2 = [2, -1, 2]^t:  [2 -1 2] ([x1, x2, x3]^t - [1, 0, 0]^t) = 0  ⇒  g2(x1, x2) = -x1 + (1/2)x2 + 1
Because of the way g1(x) and g2(x) are built we can state the following:

g1(x) - g2(x) > 0 on the positive side of the decision line
g2(x) - g1(x) > 0 on the negative side of the decision line
[Figure: the planes g1 and g2 with normals n1 and n2, intersecting the x1-x2 plane along the decision line g.]
Now we can compute g1(x) and g2(x) for the selected patterns in Example 1:

Class 1 patterns (2, 0), (1.5, -1), (1, -2): g1 - g2 > 0 for each
Class 2 patterns (0, 0), (-0.5, 1), (-1, -2): g2 - g1 > 0 for each
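The table can be reproduced with a short script (my own sketch; g1 and g2 are the plane discriminants derived above):

```python
# the two plane discriminants derived above
g1 = lambda x1, x2:  x1 - 0.5*x2 - 1
g2 = lambda x1, x2: -x1 + 0.5*x2 + 1

class1 = [(2, 0), (1.5, -1), (1, -2)]
class2 = [(0, 0), (-0.5, 1), (-1, -2)]

for p in class1:
    assert g1(*p) - g2(*p) > 0    # g1 - g2 > 0 for Class 1
for p in class2:
    assert g2(*p) - g1(*p) > 0    # g2 - g1 > 0 for Class 2
print("all six patterns agree with the table")
```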
Henceforth, such gi(x) functions will be called Discriminant Functions. We can conclude that:

g1(x) > g2(x) for the patterns in Class 1
g2(x) > g1(x) for the patterns in Class 2
Minimum Distance Classification

The classification of two clusters is carried out in such a way that the boundary of the two clusters is drawn as the line perpendicular to and passing through the midpoint of the line connecting the center points of the two clusters. Therefore the boundary line is the perpendicular bisector of the connecting line.
[Figure: cluster centers Pi at xi and Pj at xj, the vector xi - xj, and the boundary line through the midpoint P0 = (xi + xj)/2, with the positive side toward Pi and the negative side toward Pj.]
Now we will derive the equation of the boundary line. Let the vectors x and x0 represent any point on this line and the point P0, respectively. Then the following must hold:

(xi - xj)^t (x - x0) = 0

which can be written in the form

(xi - xj)^t (x - (1/2)(xi + xj)) = 0
(xi - xj)^t x - (1/2)(xi + xj)^t (xi - xj) = 0

or

(xi - xj)^t x - (1/2)(||xi||^2 - ||xj||^2) = 0
Now defining

gij(x) = (xi - xj)^t x - (1/2)(||xi||^2 - ||xj||^2)

We have already seen that the boundary (decision) line can be taken as the intersection of two planes gi and gj.
Therefore

gij(x) = gi(x) - gj(x)

where we have called the gi(x) discriminant functions and shown that they are associated with plane equations.
Now using the two equations above we obtain

(xi - xj)^t x - (1/2)(||xi||^2 - ||xj||^2) = gi(x) - gj(x)

which can be used to make the following identification:

gi(x) = xi^t x - (1/2)||xi||^2,  gj(x) = xj^t x - (1/2)||xj||^2
gi(x) can also be expressed as:

gi(x) = wi^t x + wi,n+1

Therefore we can make the identification:

wi = xi,  wi,n+1 = -(1/2)||xi||^2
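The identification above gives a direct recipe for building the weights of a minimum-distance discriminant from a class center. A minimal sketch (the helper name weights_from_center is my own):

```python
import numpy as np

def weights_from_center(xi):
    """Weights of g_i(x) = w_i^t x + w_{i,n+1} for center x_i:
    w_i = x_i and w_{i,n+1} = -||x_i||^2 / 2."""
    xi = np.asarray(xi, dtype=float)
    return xi, -0.5 * float(xi @ xi)

w1, b1 = weights_from_center([10, 2])
print(w1, b1)    # [10.  2.] -52.0
```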
An alternative approach towards the construction of discriminant functions may be taken as follows. Let us assume that a minimum-distance classification is required to classify patterns into R categories. Each of the classes is represented by its center point Pi, i = 1, 2, ..., R. The Euclidean distance between an input pattern x and the point Pi is given by the norm of the vector x - xi as:

||x - xi|| = sqrt((x - xi)^t (x - xi))
A minimum-distance classifier computes the distance from a pattern of unknown classification to each of the center points Pi. Then the category number of the point that yields the minimum distance is assigned to the unknown pattern. Squaring the above equation yields

||x - xi||^2 = x^t x - 2 xi^t x + xi^t xi = x^t x - 2 (xi^t x - (1/2) xi^t xi)
Since x^t x is independent of i, this term is constant with respect to the categories. Therefore, in order to minimize the distance ||x - xi|| we need to maximize

gi(x) = xi^t x - (1/2) xi^t xi

which is called a discriminant function.
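A quick numerical check (my own sketch, using three arbitrary centers) that picking the largest gi(x) is the same as picking the nearest center:

```python
import numpy as np

centers = np.array([[10, 2], [2, -5], [-5, 5]], dtype=float)

def g(x):
    # g_i(x) = x_i^t x - (1/2) x_i^t x_i for every center x_i
    return centers @ x - 0.5 * np.sum(centers**2, axis=1)

rng = np.random.default_rng(0)
for x in rng.uniform(-10, 10, size=(200, 2)):
    nearest = int(np.argmin(np.linalg.norm(centers - x, axis=1)))
    assert int(np.argmax(g(x))) == nearest
print("argmax g_i always picks the nearest center")
```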
It is also assumed that the index of each point (pattern) corresponds to its class number.

Example 3: A linear minimum-distance classifier will be designed for the three points given as:

x1 = [10, 2]^t,  x2 = [2, -5]^t,  x3 = [-5, 5]^t

The three points and the connecting lines constitute a triangle, which is shown below:
[Figure: the triangle with vertices P1(10, 2), P2(2, -5) and P3(-5, 5) in the x1-x2 plane.]
Now let us draw the circle passing through all three vertices of the triangle, the circumcircle. We can conclude that each boundary is a perpendicular bisector of the triangle. A perpendicular bisector of a triangle is a straight line passing through the midpoint of a side and perpendicular to it, i.e., forming a right angle with it. The three perpendicular bisectors meet at a single point, the triangle's circumcenter; this point is the center of the circumcircle.
[Figure: the triangle P1(10, 2), P2(2, -5), P3(-5, 5) with its circumcircle and the three perpendicular bisectors.]
Now using

gij(x) = (xi - xj)^t x - (1/2)(||xi||^2 - ||xj||^2)

and

x1 = [10, 2]^t,  x2 = [2, -5]^t,  x3 = [-5, 5]^t

we obtain
g12(x) = (x1 - x2)^t x - (1/2)(||x1||^2 - ||x2||^2) = [8, 7] x - (1/2)[(100 + 4) - (4 + 25)] = 8x1 + 7x2 - 37.5
g13(x) = (x1 - x3)^t x - (1/2)(||x1||^2 - ||x3||^2) = [15, -3] x - (1/2)[(100 + 4) - (25 + 25)] = 15x1 - 3x2 - 27
g23(x) = (x2 - x3)^t x - (1/2)(||x2||^2 - ||x3||^2) = [7, -10] x - (1/2)[(4 + 25) - (25 + 25)] = 7x1 - 10x2 + 10.5
Now using

wi = xi,  wi,n+1 = -(1/2)||xi||^2

we obtain

w1 = [10, 2, -52]^t,  w2 = [2, -5, -14.5]^t,  w3 = [-5, 5, -25]^t
and using

gi(x) = wi^t x + wi,n+1

we obtain

g1(x) = 10x1 + 2x2 - 52
g2(x) = 2x1 - 5x2 - 14.5
g3(x) = -5x1 + 5x2 - 25
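These weights and discriminants follow mechanically from the three centers; a short sketch (my own, not code from the slides) reproduces them:

```python
import numpy as np

centers = np.array([[10, 2], [2, -5], [-5, 5]], dtype=float)
W = centers                              # w_i = x_i
b = -0.5 * np.sum(centers**2, axis=1)    # w_{i,n+1} = -||x_i||^2 / 2

for i in range(3):
    print(f"g{i+1}(x) = {W[i,0]:g}*x1 + {W[i,1]:g}*x2 + ({b[i]:g})")
# g1(x) = 10*x1 + 2*x2 + (-52)
# g2(x) = 2*x1 + -5*x2 + (-14.5)
# g3(x) = -5*x1 + 5*x2 + (-25)
```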
A block diagram producing the three discriminant functions is shown below:

[Block diagram: inputs x1, x2 and a bias input -1 feed three linear units with weight sets (10, 2, 52), (2, -5, 14.5) and (-5, 5, 25), producing 10x1 + 2x2 - 52, 2x1 - 5x2 - 14.5 and -5x1 + 5x2 - 25.]
The discriminant values for the three patterns P1(10, 2), P2(2, -5) and P3(-5, 5) are shown in the table below:

Discriminant             | Class 1 [10 2]^t | Class 2 [2 -5]^t | Class 3 [-5 5]^t
g1(x) = 10x1 + 2x2 - 52  |        52        |       -42        |       -92
g2(x) = 2x1 - 5x2 - 14.5 |        -4.5      |        14.5      |       -49.5
g3(x) = -5x1 + 5x2 - 25  |       -65        |       -60        |        25
As required by the definition of the discriminant function, the responses on the diagonal are the largest in each column. It will be shown later that the same is true for any three points P1, P2, P3 taken from the three decision regions H1, H2, H3, provided that the decision regions are determined as shown above. Therefore using a maximum selector at the output will provide the required function from the network.
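A maximum-selector classifier of this kind can be sketched in a few lines (my own illustration of the scheme, not code from the source):

```python
import numpy as np

# discriminant weights and biases of Example 3
W = np.array([[10, 2], [2, -5], [-5, 5]], dtype=float)
b = np.array([-52.0, -14.5, -25.0])

def classify(x):
    # maximum selector: 1-based index of the largest discriminant
    return int(np.argmax(W @ np.asarray(x, dtype=float) + b)) + 1

print(classify((10, 2)), classify((2, -5)), classify((-5, 5)))   # 1 2 3
```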
Using the same network with TLUs (bipolar activation functions) will result in the outputs given in the table below:

Discriminant                       | Class 1 [10 2]^t | Class 2 [2 -5]^t | Class 3 [-5 5]^t
sgn(g1(x)) = sgn(10x1 + 2x2 - 52)  |   1   |  -1  |  -1
sgn(g2(x)) = sgn(2x1 - 5x2 - 14.5) |  -1   |   1  |  -1
sgn(g3(x)) = sgn(-5x1 + 5x2 - 25)  |  -1   |  -1  |   1
The diagonal entries are 1 and the off-diagonal entries are -1. However, as the next example will demonstrate, this is not true for any three points P1, P2, P3 taken from the three decision regions H1, H2, H3.
The responses of the same network to the patterns Q1(5, 0), Q2(0, 1) and Q3(-4, 0) are shown in the table below:

Discriminant             | [5 0]^t | [0 1]^t | [-4 0]^t
g1(x) = 10x1 + 2x2 - 52  |   -2    |  -50    |  -92
g2(x) = 2x1 - 5x2 - 14.5 |   -4.5  |  -19.5  |  -22.5
g3(x) = -5x1 + 5x2 - 25  |  -50    |  -20    |   -5
The responses on the diagonal are still the largest in each column. However, using the same network with TLUs (bipolar activation functions) will result in the outputs given in the table below:
Discriminant                       | [5 0]^t | [0 1]^t | [-4 0]^t
sgn(g1(x)) = sgn(10x1 + 2x2 - 52)  |  -1  |  -1  |  -1
sgn(g2(x)) = sgn(2x1 - 5x2 - 14.5) |  -1  |  -1  |  -1
sgn(g3(x)) = sgn(-5x1 + 5x2 - 25)  |  -1  |  -1  |  -1
It is therefore impossible to use TLUs once the decision lines are calculated using the minimum-distance classification procedure. The only way out is to use a maximum selector. The reason why the responses on the diagonal are the largest in each column will now be explained in detail.
The discriminant functions determine the plane equations

g1 - 10x1 - 2x2 + 52 = 0
g2 - 2x1 + 5x2 + 14.5 = 0
g3 + 5x1 - 5x2 + 25 = 0
These planes are shown on the plot below. It is easily seen that:

For any point in H1: g1(x) > g2(x) and g1(x) > g3(x)
For any point in H2: g2(x) > g1(x) and g2(x) > g3(x)
For any point in H3: g3(x) > g1(x) and g3(x) > g2(x)
[Figure: surface plot of the three planes gi over the x1-x2 plane (x1 and x2 from -10 to 10, gi from -200 to 100).]
The decision regions H1, H2, H3 are the projections of the planes g1, g2 and g3, respectively, on the x1-x2 plane, and the decision lines are the projections of the intersection lines of the planes gi on the x1-x2 plane, which are shown below.
[Figure: the decision regions H1, H2 and H3 in the x1-x2 plane, separated by the lines g12(x) = 0, g13(x) = 0 and g23(x) = 0, which meet at P123(2.337, 2.686). P1(10, 2) lies in H1, where g1(x) > g2(x) and g1(x) > g3(x); P2(2, -5) lies in H2, where g2(x) > g1(x) and g2(x) > g3(x); P3(-5, 5) lies in H3, where g3(x) > g1(x) and g3(x) > g2(x).]
A MATLAB plot of the projections of the intersection lines of the planes gi is shown below.
[MATLAB plot: the three decision lines in the x1-x2 plane, with both axes running from -30 to 30.]
The projections of the intersection lines of the planes gi on the x1-x2 plane are shown to be given by the following line equations:

g12(x) = 8x1 + 7x2 - 37.5 = 0
g13(x) = 15x1 - 3x2 - 27 = 0
g23(x) = 7x1 - 10x2 + 10.5 = 0

The plot above shows the segments that can be seen from the top.
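Since gij(x) = gi(x) - gj(x), the three line equations can be checked by subtracting the coefficient rows of the discriminants (a small sketch of my own):

```python
import numpy as np

# discriminants of Example 3 as rows (coefficient of x1, of x2, bias)
G = np.array([[10.0,  2.0, -52.0],    # g1
              [ 2.0, -5.0, -14.5],    # g2
              [-5.0,  5.0, -25.0]])   # g3

print(G[0] - G[1])   # g12: 8x1 + 7x2 - 37.5
print(G[0] - G[2])   # g13: 15x1 - 3x2 - 27
print(G[1] - G[2])   # g23: 7x1 - 10x2 + 10.5
```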
The continuation of the line g12 = 0 remains underneath the plane g3. The continuation of the line g23 = 0 remains underneath the plane g1. The continuation of the line g13 = 0 remains underneath the plane g2.
A classifier using a maximum selector is shown below. The maximum selector selects the maximum discriminant and responds with the number of the discriminant having the largest value.
Classifier using the maximum selector:

[Block diagram: inputs x1, x2 and a bias input -1 feed three linear units with weight sets (10, 2, 52), (2, -5, 14.5) and (-5, 5, 25); their outputs g1(x), g2(x), g3(x) feed a maximum selector whose output is i = 1, 2, or 3.]
The classifier can be redrawn as follows:

[Block diagram: the same maximum-selector classifier, with the inputs x1, x2 and -1 drawn separately for each of the three linear units.]
In the above we have designed a classifier based on minimum-distance classification for known clusters and derived the network with three perceptrons from the discriminant functions, which were interpreted as plane equations. Instead, let us now consider the network below, which is obtained as a result of training a network with three perceptrons using the same input patterns P1(10, 2), P2(2, -5) and P3(-5, 5) as in the previous network.
[Block diagram: inputs x1, x2 and a bias input -1 feed three TLUs with weight sets (5, 3, 5), (0, -1, 2) and (-9, 1, 0), producing 5x1 + 3x2 - 5 (TLU#1), -x2 - 2 (TLU#2) and -9x1 + x2 (TLU#3).]
In fact, gi(x) = 0 defines the intersection of the plane gi with the x1-x2 plane. Therefore the TLU divides each plane gi into two regions: (1) the upper half-plane, which is above the x1-x2 plane, and (2) the lower half-plane, which is below the x1-x2 plane.
The decision lines are obtained by setting gi(x) = 0:

5x1 + 3x2 - 5 = 0
-x2 - 2 = 0
-9x1 + x2 = 0

These are drawn below. The shaded areas are indecision regions, which will become clear in the following discussion.
[Figure: the decision lines 5x1 + 3x2 - 5 = 0, -x2 - 2 = 0 and -9x1 + x2 = 0 in the x1-x2 plane, with the patterns P1(10, 2), P2(2, -5), P3(-5, 5) and Q1(0, 9), Q2(4, -4), Q3(-1, -3) marked and the indecision regions shaded. The discriminant vectors [g1; g2; g3] and their signs at these points are:]

[g1; g2; g3](10, 2) = [51; -4; -88] ⇒ [1; -1; -1]
[g1; g2; g3](2, -5) = [-10; 3; -23] ⇒ [-1; 1; -1]
[g1; g2; g3](-5, 5) = [-15; -7; 50] ⇒ [-1; -1; 1]
[g1; g2; g3](0, 9) = [22; -11; 9] ⇒ [1; -1; 1]
[g1; g2; g3](4, -4) = [3; 2; -40] ⇒ [1; 1; -1]
[g1; g2; g3](-1, -3) = [-19; 1; 6] ⇒ [-1; 1; 1]
The discriminant values g1(x), g2(x), g3(x) for the same three patterns P1(10, 2), P2(2, -5) and P3(-5, 5) are shown in the table below:

Discriminant          | Class 1 [10 2]^t | Class 2 [2 -5]^t | Class 3 [-5 5]^t
g1(x) = 5x1 + 3x2 - 5 |        51        |       -10        |       -15
g2(x) = -x2 - 2       |        -4        |         3        |        -7
g3(x) = -9x1 + x2     |       -88        |       -23        |        50
The outputs of the network with three discrete perceptrons are shown in the table below:

Discriminant                    | Class 1 [10 2]^t | Class 2 [2 -5]^t | Class 3 [-5 5]^t
sgn(g1(x)) = sgn(5x1 + 3x2 - 5) |   1  |  -1  |  -1
sgn(g2(x)) = sgn(-x2 - 2)       |  -1  |   1  |  -1
sgn(g3(x)) = sgn(-9x1 + x2)     |  -1  |  -1  |   1
The table above shows that the new discriminant functions

g1(x) = 5x1 + 3x2 - 5
g2(x) = -x2 - 2
g3(x) = -9x1 + x2

classify the patterns P1(10, 2), P2(2, -5) and P3(-5, 5) in the same way as the discriminant functions

g1(x) = 10x1 + 2x2 - 52
g2(x) = 2x1 - 5x2 - 14.5
g3(x) = -5x1 + 5x2 - 25
Conclusion: The network obtained through the perceptron learning algorithm and the network obtained using the minimum-distance classification procedure have classified the three points P1(10, 2), P2(2, -5) and P3(-5, 5) in exactly the same way, i.e.,

P1(10, 2) ⇒ class 1
P2(2, -5) ⇒ class 2
P3(-5, 5) ⇒ class 3
Now consider the patterns Q1(0, 9), Q2(4, -4) and Q3(-1, -3), which fall into the shaded areas. The discriminant values for these patterns are shown in the table below:
Discriminant          | [0 9]^t | [4 -4]^t | [-1 -3]^t
g1(x) = 5x1 + 3x2 - 5 |   22    |    3     |   -19
g2(x) = -x2 - 2       |  -11    |    2     |     1
g3(x) = -9x1 + x2     |    9    |  -40     |     6
Since

g1(0, 9) > g3(0, 9), g2(0, 9)
g1(4, -4) > g2(4, -4), g3(4, -4)
g3(-1, -3) > g2(-1, -3), g1(-1, -3)

if we use a maximum selector instead of the three TLUs, the network can decide that

Q1(0, 9) and Q2(4, -4) ⇒ class 1
Q3(-1, -3) ⇒ class 3
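This maximum-selector decision can be reproduced directly (my own sketch; max_selector is a hypothetical helper name):

```python
import numpy as np

# trained-network discriminants: g1 = 5x1+3x2-5, g2 = -x2-2, g3 = -9x1+x2
W = np.array([[5, 3], [0, -1], [-9, 1]], dtype=float)
b = np.array([-5.0, -2.0, 0.0])

def max_selector(x):
    # 1-based index of the largest discriminant
    return int(np.argmax(W @ np.asarray(x, dtype=float) + b)) + 1

for q in [(0, 9), (4, -4), (-1, -3)]:
    print(q, "-> class", max_selector(q))
# (0, 9) -> class 1, (4, -4) -> class 1, (-1, -3) -> class 3
```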
On the other hand, if we use TLUs we would obtain the outputs in the following table:

Discriminant                    | [0 9]^t | [4 -4]^t | [-1 -3]^t
sgn(g1(x)) = sgn(5x1 + 3x2 - 5) |   1  |   1  |  -1
sgn(g2(x)) = sgn(-x2 - 2)       |  -1  |   1  |   1
sgn(g3(x)) = sgn(-9x1 + x2)     |   1  |  -1  |   1
In order to make a classification we should have a column with one 1 and two -1s. Therefore, according to the table obtained, none of the three patterns Q1(0,9), Q2(4,-4) and Q3(-1,-3) can be classified into any class. Hence, under the network with TLUs, the shaded areas are called indecision regions.
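The indecision test (exactly one +1 among the three TLU outputs) can be sketched in Python (function names are ours):

```python
def sgn(v):
    return 1 if v > 0 else -1

# TLU outputs for the three discriminants; a pattern is classified only
# when exactly one output is +1 (one 1 and two -1s in the column).
def tlu_outputs(x1, x2):
    gs = [5*x1 + 3*x2 - 5, -x2 - 2, -9*x1 + x2]
    return [sgn(v) for v in gs]

for q in [(0, 9), (4, -4), (-1, -3)]:
    outs = tlu_outputs(*q)
    decided = outs.count(1) == 1
    print(q, outs, "decided" if decided else "indecision region")
```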
Now let us consider the planes defined by

g1(x1, x2) = 5x1 + 3x2 - 5
g2(x1, x2) = -x2 - 2
g3(x1, x2) = -9x1 + x2

which are plotted on the next slide:
[Figure: the three planes gi plotted over the (x1, x2) plane.]
The projections of the intersection lines of the planes gi(x) on the x1-x2 plane are given by

g12: 5x1 + 4x2 - 3 = 0
g23: 9x1 - 2x2 - 2 = 0
g13: 14x1 + 2x2 - 5 = 0

The segments that can be seen from the top are plotted on the next slide.
[Figure: top view of the intersection-line segments in the (x1, x2) plane.]
The continuation of the line g12 = 0 remains underneath the plane g3.
The continuation of the line g23 = 0 remains underneath the plane g1.
The continuation of the line g13 = 0 remains underneath the plane g2.
[Figure: single-layer feedforward network with input nodes x1, ..., xj, ..., xJ, neurons 1, ..., k, ..., K, weights wkj from input j to neuron k, activations v1, ..., vK and outputs y1, ..., yK.]
v1 = w11 x1 + w12 x2 + ... + w1j xj + ... + w1J xJ
v2 = w21 x1 + w22 x2 + ... + w2j xj + ... + w2J xJ
...
vk = wk1 x1 + wk2 x2 + ... + wkj xj + ... + wkJ xJ
...
vK = wK1 x1 + wK2 x2 + ... + wKj xj + ... + wKJ xJ

y1 = f(v1), y2 = f(v2), ..., yk = f(vk), ..., yK = f(vK)
In matrix form,

[v1]   [w11 w12 .. w1J] [x1]
[v2] = [w21 w22 .. w2J] [x2]
[. ]   [ .   .  ..  . ] [. ]
[vK]   [wK1 wK2 .. wKJ] [xJ]

or v = Wx, and

[y1]   [f(v1)]
[y2] = [f(v2)] = Γ(v)
[. ]   [  .  ]
[yK]   [f(vK)]

where Γ is the diagonal nonlinear operator

    [f(.)  0   ..   0  ]
Γ = [ 0   f(.) ..   0  ]
    [ .    .   ..   .  ]
    [ 0    0   ..  f(.)]

so that the network computes

y = Γ[Wx]
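The forward pass y = Γ[Wx] is just a matrix-vector product followed by an elementwise activation. A minimal Python sketch, assuming sgn as the activation f (as in the example that follows):

```python
def sgn(v):
    return 1 if v > 0 else -1

# Single-layer forward pass y = Gamma[W x]: a matrix-vector product
# followed by the elementwise activation f (here sgn).
def forward(W, x):
    v = [sum(wkj * xj for wkj, xj in zip(row, x)) for row in W]
    return [sgn(vk) for vk in v]

# Weight matrix of the three-discriminant example: each row holds one
# neuron's weights, and the last input component is fixed at -1 so that
# the third column realizes the thresholds.
W = [[5, 3, 5],
     [0, -1, 2],
     [-9, 1, 0]]
print(forward(W, [0, 9, -1]))   # pattern (0, 9) augmented with -1
```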
For the network of this example,

[v1]   [ 5  3  5] [x1]
[v2] = [ 0 -1  2] [x2]
[v3]   [-9  1  0] [-1]

and

[y1]   [sgn(5x1 + 3x2 - 5)]
[y2] = [sgn(-x2 - 2)      ]
[y3]   [sgn(-9x1 + x2)    ]

where the input vector is

    [x1]
x = [x2]
    [-1]
Two-Layer Feedforward Neural Network

Example 1: Design a neural network such that the network maps the shaded region of the (x1, x2) plane into y = 1 and maps its complement into y = -1, where y is the output of the neural network. In summary, the network will provide the mapping of the entire (x1, x2) plane into one of the two points ±1 on the real number axis.
Solution: The inputs to the neural network will be x1, x2 and the threshold value -1. Thus the input vector is given as:

    [x1]
x = [x2]
    [-1]
The boundaries of the shaded region are given by the equations:

x1 - 1 = 0
x1 - 2 = 0
x2 = 0
x2 - 3 = 0
The shaded region satisfies the inequalities:

x1 > 1
x1 < 2
x2 > 0
x2 < 3

or

x1 - 1 > 0
-x1 + 2 > 0
x2 > 0
-x2 + 3 > 0
These inequalities may be implemented using four neurons:

[v1]   [ 1  0  1]
[v2] = [-1  0 -2] [x1]
[v3]   [ 0  1  0] [x2]
[v4]   [ 0 -1 -3] [-1]
The equations for the first layer are obtained as:

y = [sgn(x1 - 1)  sgn(-x1 + 2)  sgn(x2)  sgn(-x2 + 3)]^t

where the binary (threshold, or hard limiter) activation function, i.e., the discrete perceptron, is used.
Let us discuss the mapping performed by the first layer. Note that each of the neurons 1 through 4 divides the (x1, x2) plane into two half-planes. The half-planes where the neurons' responses are positive (+1) have been marked with arrows pointing toward the positive-response half-plane. The response of the second layer can be easily obtained as

y = sgn(y1 + y2 + y3 + y4 - 3.5)
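The complete two-layer mapping can be sketched in a few lines of Python (the test points are illustrative choices, not from the slides):

```python
def sgn(v):
    return 1 if v > 0 else -1

# Two-layer network of Example 1: four first-layer discrete perceptrons
# implement the region boundaries; the output neuron fires +1 only when
# all four first-layer responses are +1 (threshold 3.5 out of 4).
def net(x1, x2):
    y1 = sgn(x1 - 1)    # right of the line x1 = 1
    y2 = sgn(-x1 + 2)   # left of the line x1 = 2
    y3 = sgn(x2)        # above the line x2 = 0
    y4 = sgn(-x2 + 3)   # below the line x2 = 3
    return sgn(y1 + y2 + y3 + y4 - 3.5)

print(net(1.5, 1.0))   # inside the shaded region -> 1
print(net(5.0, 5.0))   # outside the region -> -1
```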
The resultant neural network:
The Perceptron Training Algorithm

For the development of the perceptron learning algorithm for a single-layer perceptron, we find it more convenient to work with the modified signal-flow graph model given here. In this second model, which is equivalent to that of the previous figure, the threshold θ is treated as a synaptic weight connected to a fixed input equal to -1. We may thus define the (n+1)-by-1 input vector and the corresponding weight vector as:

x = [x1 x2 ... xn -1]^t
w = [w1 w2 ... wn θ]^t
These vectors are respectively called the augmented input vector and the augmented weight vector.

For fixed n, the equation w^t x = 0, plotted in the n-dimensional space with coordinates x1, x2, ..., xn, defines a hyperplane as the decision surface between two different classes of inputs.

Suppose then the input variables of the single-layer perceptron originate from two linearly separable classes that fall on opposite sides of some hyperplane. Let X1 be the subset of training vectors x1(1), x1(2), ... that belong to class C1, and let X2 be the subset of training vectors x2(1), x2(2), ... that belong to class C2. The union of X1 and X2 is the complete training set X.
Given the sets of vectors X1 and X2 to train the classifier, the training process involves the adjustment of the weight vector w in such a way that the two classes C1 and C2 are separable. These two classes are said to be linearly separable if a realizable setting of the weight vector w exists. Conversely, if the two classes C1 and C2 are known to be linearly separable, then there exists a weight vector w such that we may state:
w^t x > 0 for every input vector x belonging to class C1
w^t x ≤ 0 for every input vector x belonging to class C2
Given the subsets of training vectors X1 and X2, the training problem for the elementary perceptron is then to find a weight vector w such that the two inequalities above are satisfied. However, until this is achieved, in the intermediate steps we will have

w^t x > 0 for some input vectors x belonging to class C2
w^t x ≤ 0 for some input vectors x belonging to class C1
In the former case w^t x will therefore be reduced until w^t x ≤ 0 is achieved, and in the latter case w^t x will be increased until w^t x > 0 is reached. Here we will begin to examine neural network classifiers that derive their weights during the learning cycle.
• The sample pattern vectors x1, x2, ..., xp, called the training sequence, are presented to the machine along with the correct response.

• The response is provided by the teacher and specifies the classification information for each input vector. The classifier modifies its parameters by means of iterative, supervised learning.
The network learns from experience by comparing the targeted correct response with the actual response. The classifier structure is usually adjusted after each incorrect response, based on the error value generated.
Let us now look again at the dichotomizer introduced and defined earlier. We will develop a supervised training procedure for this two-class linear classifier. Assuming that the desired response is provided, the error signal is computed. The error information can be used to adapt the weights of the discrete perceptron.

First we examine the geometrical conditions in the augmented weight space. This will make it possible to devise a meaningful training procedure for the dichotomizer under consideration. The decision surface equation in the (n+1)-dimensional augmented pattern space is

w^t x = 0
When the above equation is considered in the pattern space, it is written for fixed weights w(1), w(2), ..., w(k). Therefore the variables of the function f(w^t(i)x) are x1, x2, ..., xn+1, the components of the pattern vector.
[Figure: the decision plane f(w^t(i)x) = 0 in the pattern space with coordinates x1, x2 and normal vector w(i).]

The normal vector w(i) (weight vector) points toward the side of the pattern space for which w^t(i)x > 0, called the positive side.
When the above equation is considered in the weight space, it is written for fixed patterns x(1), x(2), ..., x(p). Therefore the variables of the function f(w^t x(i)) are w1, w2, ..., wn+1, the components of the weight vector.
[Figure: the decision plane f(w^t x(i)) = 0 in the weight space with coordinates w1, w2 and normal vector x(i).]

The normal vector x(i) (pattern vector) points toward the side of the weight space for which w^t x(i) > 0, called the positive side.
Here the gradient with respect to the weights is

∇ f(w1, w2, ..., wn+1) = [∂f/∂w1  ∂f/∂w2  ...  ∂f/∂wn+1]^t

and for f(w^t x(i)) = w^t x(i) it reduces to

∇ (w^t x(i)) = x(i)

In further discussion it will be understood that the normal vector always points toward the side of the space for which w^t x > 0, called the positive side, or semispace, of the hyperplane.
[Figure: decision hyperplanes in the augmented weight space for a five-pattern set from two classes.]

Note that the vectors x(i) point toward the positive side of the decision hyperplanes w^t x(i) = 0. By labeling each decision boundary in the augmented weight space with an arrow pointing into the positive half-plane, we can easily find a region in the weight space that satisfies the linearly separable classification.
To find the solution for the weights, we will look for the intersection of the positive decision regions due to the prototypes of class 1 and of the negative decision regions due to the prototypes of class 2. Inspection of the figure reveals that the intersection of the sets of weights yielding all five correct classifications of the depicted patterns is in the shaded region of the second quadrant, as shown in the figure above.
Let us now attempt to arrive iteratively at the weight vector w located in the shaded weight solution area. To accomplish this, the weights need to be adjusted from an initial value located anywhere in the weight space. This assumption is due to our ignorance of the weight solution region as well as of the weight initialization. The adjustment discussed, or network training, is based on an error-correction scheme.
At this point we will introduce the Perceptron Learning (Training) Rule (Algorithm). The perceptron learning rule is of central importance for supervised learning of neural networks. In this method the weights are initialized at any values.

A neuron is considered to be an adaptive element. Its weights are modifiable depending on the input signal it receives, its output value, and the associated teacher (supervisor) response.
The weight vector is changed according to the following:

w(i+1) = w(i) + Δw(i)

where

Δw(i) = c r(w(i), x(i), d(i)) x(i)

and

• d(i) is the teacher's (supervisor's) signal
• r is the learning signal
• c is a positive number called the learning constant; the direction of the correction depends on the sign of r.
Here we have used

∇ (w^t x(i)) = [∂(w^t x(i))/∂w1  ∂(w^t x(i))/∂w2  ...  ∂(w^t x(i))/∂wn+1]^t = x(i)

This reveals that the change in the weight vector is in the direction of steepest ascent (or descent) of w^t x(i).
Perceptron Learning (Training) Rule (Algorithm): in this case the learning signal is defined as

r(i) = d(i) - y(i)

where d(i) is the desired output signal and y(i) is the actual output signal for the input pattern x(i), given by

y(i) = sgn(w^t(i) x(i))

The weight adjustment is then

Δw(i) = c [d(i) - sgn(w^t(i) x(i))] x(i)
d = 1, i.e., class 1 is input:

1) y = sgn(w^t x) = -1, i.e., the input is misclassified: r = d - y = 1 - (-1) = +2; the correction is in the direction of steepest ascent and given as Δw(i) = 2c x(i)
2) y = sgn(w^t x) = 1, i.e., the input is correctly classified: r = d - y = 1 - 1 = 0; no correction

d = -1, i.e., class 2 is input:

1) y = sgn(w^t x) = -1, i.e., the input is correctly classified: r = d - y = -1 - (-1) = 0; no correction
2) y = sgn(w^t x) = 1, i.e., the input is misclassified: r = d - y = -1 - 1 = -2; the correction is in the direction of steepest descent and given as Δw(i) = -2c x(i)
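All four cases collapse into the single update formula above. A minimal Python sketch of one correction step (the learning constant c = 0.5 is an assumed value; the slides leave c unspecified):

```python
def sgn(v):
    return 1 if v > 0 else -1

# One step of the discrete perceptron rule:
#   delta_w = c * (d - sgn(w . x)) * x
def update(w, x, d, c=0.5):
    r = d - sgn(sum(wi * xi for wi, xi in zip(w, x)))
    return [wi + c * r * xi for wi, xi in zip(w, x)]

w = [-2.5, 1.75]
x = [1, -1]                 # an augmented pattern
print(update(w, x, d=1))    # misclassified: weights move by +2c*x
print(update(w, x, d=-1))   # already correct for d = -1: no change
```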
EXAMPLE: The trained classifier should provide the following classification of four patterns x with known class memberships d:

x(1) = 1, x(3) = 3, d(1) = d(3) = 1: class C1
x(2) = -0.5, x(4) = -2, d(2) = d(4) = -1: class C2
The augmented input vectors are given as:

x(1) = [1  -1]^t, x(2) = [-0.5  -1]^t, x(3) = [3  -1]^t, x(4) = [-2  -1]^t
Let us choose an arbitrary augmented weight vector

w(1) = [-2.5  1.75]^t

With x(1) being the input, we obtain

w^t(1) x(1) = [-2.5  1.75][1  -1]^t = -4.25 < 0

and, with the binary activation function (discrete perceptron),

sgn(w^t(1) x(1)) = -1

Hence x(1) is classified as being in class C2. However, this is not true. Therefore a correction has to be made.
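Repeating such corrections pattern by pattern drives the weights into the solution region. A Python sketch of the full training loop on this example, with an assumed learning constant c = 0.5 (with this c the run below converges in two sweeps; other values of c give other, equally valid, final weights):

```python
def sgn(v):
    return 1 if v > 0 else -1

# The four augmented patterns of the example with their desired outputs.
patterns = [([1, -1], 1), ([-0.5, -1], -1), ([3, -1], 1), ([-2, -1], -1)]
w = [-2.5, 1.75]      # the arbitrary initial weight vector of the example
c = 0.5               # an assumed learning constant

for sweep in range(100):
    errors = 0
    for x, d in patterns:
        r = d - sgn(sum(wi * xi for wi, xi in zip(w, x)))
        if r != 0:                                  # misclassified pattern
            errors += 1
            w = [wi + c * r * xi for wi, xi in zip(w, x)]
    if errors == 0:                                 # a full error-free sweep
        break

print(w)
# All four patterns are now classified correctly:
print(all(sgn(sum(wi * xi for wi, xi in zip(w, x))) == d for x, d in patterns))
```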
The question to be asked at this point is: how do we make this correction? The answer depends on which training algorithm is used. Since sgn{w^t(1)x(1)} = -1, one thing is certain, however: the correction should be made in such a way that w^t x increases. In order to achieve this, we must first find out whether there is a direction in which the decrease or, for that matter, the increase takes place. To show this, let us consider the surface given by:

z = f(w1, w2, ..., wn+1)
We can write:

df(w1, w2, ..., wn+1) = (∂f/∂w1) dw1 + (∂f/∂w2) dw2 + ... + (∂f/∂wn+1) dwn+1

Let us now restrict ourselves to the case of 3 dimensions, namely z, w1, w2, or more succinctly z, x, y.
Now consider the surface

z = f(x, y)

If the level curves are interpreted as contour lines of the landscape, i.e., of the surface, then along these curves

z = f(x, y) = constant

and

dz = df(x, y) = 0

hence we obtain

df(x, y) = (∂f(x, y)/∂x) dx + (∂f(x, y)/∂y) dy = 0
where dx and dy are the increments given to x and y on the level curve.
Now defining

∇f(x, y) = [∂f(x, y)/∂x  ∂f(x, y)/∂y]^t and dr = [dx  dy]^t

where ∇f and dr are known to be the gradient vector and the tangent vector, respectively,
we can write

df(x, y) = (∇f)^t dr = 0

This means that the gradient vector and the tangent vector are orthogonal. Moreover, it can be shown that the gradient vector points in the direction of steepest ascent of the function f(x, y). Furthermore, the gradient is the rate of climb in the direction of steepest ascent.
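This orthogonality can be checked numerically on the circular level curves of a surface such as z = (x - 50)^2 + (y - 50)^2 - 32^2, the example considered next. A Python sketch (the sample parameters t are arbitrary):

```python
import math

# f(x, y) = (x - 50)^2 + (y - 50)^2 - 32^2; its level curves are circles
# centered at (50, 50). On each circle, the tangent direction should be
# orthogonal to the gradient [2(x - 50), 2(y - 50)].
def grad(x, y):
    return (2 * (x - 50), 2 * (y - 50))

R = 32.0
for t in (0.3, 1.2, 2.5):                  # sample points on one level curve
    x = 50 + R * math.cos(t)
    y = 50 + R * math.sin(t)
    gx, gy = grad(x, y)
    tx, ty = -math.sin(t), math.cos(t)     # unit tangent of the circle
    print(abs(gx * tx + gy * ty) < 1e-9)   # dot product is ~0
```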
Now consider the surface

z = f(x, y) = (x - 50)^2 + (y - 50)^2 - 32^2

The following MATLAB program plots this surface:

close all
clear all
for x=1:1:100
    for y=1:1:100
        f(x,y)=(x-50).^2+(y-50).^2-1024;
    end
end
mesh(f); title('f(x,y)=(x-50)^2+(y-50)^2-1024');
figure, imshow(f,[],'notruesize'); colormap(jet);
colorbar; title('f(x,y)=(x-50)^2+(y-50)^2-1024');
[Figure: mesh and image plots of the surface z = f(x, y) = (x - 50)^2 + (y - 50)^2 - 32^2.]
The level curves are obtained from

z = f(x, y) = (x - 50)^2 + (y - 50)^2 - 32^2 = Ci

where the Ci are constants, e.g.

C1 = 4096, C2 = 9216, C3 = 16384

[Figure: the corresponding level curves in the (x, y) plane.]
∇f(x, y) = [∂f(x, y)/∂x  ∂f(x, y)/∂y]^t = [2(x - 50)  2(y - 50)]^t and dr = [dx  dy]^t

Considering the four quadrants of the circle:

Q.1: 2(x - 50) > 0 and 2(y - 50) > 0
Q.2: 2(x - 50) < 0 and 2(y - 50) > 0
Q.3: 2(x - 50) < 0 and 2(y - 50) < 0
Q.4: 2(x - 50) > 0 and 2(y - 50) < 0

so the gradient vector points in the directions given below:

[Figure: gradient directions in quadrants Q.1-Q.4 of the circle.]
The fact that the gradient vector is orthogonal to the tangent vector proves that it is in the direction of steepest ascent or steepest descent. The directions found for the example show that the gradient vector points in the direction of ascent of the function f(x, y). Combining the two facts, we can conclude that it points in the direction of steepest ascent.
Recall the augmented pattern vectors:

x(1) = [1  -1]^t, x(2) = [-0.5  -1]^t, x(3) = [3  -1]^t, x(4) = [-2  -1]^t
In the weight space the following straight lines represent the decision lines:

x(1) = [1  -1]^t:    w^t x(1) = w1 - w2 = 0      =>  w2 = w1
x(2) = [-0.5  -1]^t: w^t x(2) = -0.5w1 - w2 = 0  =>  w2 = -0.5w1
x(3) = [3  -1]^t:    w^t x(3) = 3w1 - w2 = 0     =>  w2 = 3w1
x(4) = [-2  -1]^t:   w^t x(4) = -2w1 - w2 = 0    =>  w2 = -2w1
[Figure: decision lines 1-4 in the (w1, w2) weight space, together with the initial weight vector.]

The corresponding gradient vectors are computed as follows:
for x(1): w^t x(1) = w1 - w2      =>  ∇(w^t x(1)) = [∂(w^t x(1))/∂w1  ∂(w^t x(1))/∂w2]^t = [1  -1]^t = x(1)
for x(2): w^t x(2) = -0.5w1 - w2  =>  ∇(w^t x(2)) = [-0.5  -1]^t = x(2)
for x(3): w^t x(3) = 3w1 - w2     =>  ∇(w^t x(3)) = [3  -1]^t = x(3)
for x(4): w^t x(4) = -2w1 - w2    =>  ∇(w^t x(4)) = [-2  -1]^t = x(4)
[Figure: decision lines and gradient vectors in the weight space, with the initial weight vector w(1) = [-2.5  1.75]^t and the half-planes w^t x(i) > 0 and w^t x(i) < 0 labeled for each of the four patterns.]
The Perceptron Training Algorithm
The Perceptron Training Algorithm
Now we can concentrate on the particular training
(or learning) algorithm (or rule).
The Perceptron Training Algorithm
The Perceptron Training Algorithm
This is a supervised learning algorithm. This
This is a supervised learning algorithm. This
means that at each step the correction is made
means that at each step the correction is made
according to the directive given by the supervisor
according to the directive given by the supervisor
as shown in the following figure.
as shown in the following figure.
The Perceptron Training Algorithm
The Perceptron Training Algorithm
x
y
i
d
i
Weight learning rule: d
i
is provided only in the
case of supervised learning
The Perceptron Training Algorithm
The Perceptron Training Algorithm
Now consider
) x w sgn( d r
t
i
÷ =
Since
1 1 ± = ± = ) x w sgn( , d
t
i
r can take on one of the three values:
0 2 2 , , ÷ +
The Perceptron Training Algorithm
The Perceptron Training Algorithm
In fact
0 1 1
0 1 1
2 1 1
2 1 1
= ¬ ÷ = ÷ =
= ¬ + = + =
÷ = ¬ + = ÷ =
+ = ¬ ÷ = + =
r ) x w sgn( , d and
r ) x w sgn( , d for
r ) x w sgn( , d for
r ) x w sgn( , d for
t
i
t
i
t
i
t
i
Since
Therefore we can define the correction rule in terms
of the correction amount at the nth step as follows:
)) n ( x ) n ( w ( ))) n ( x ) n ( w sgn( ) n ( d )( n ( ) n ( w Δ
t t
i i
V ÷ =q
The Perceptron Training Algorithm
The Perceptron Training Algorithm
) n ( x ))) n ( x ) n ( w sgn( ) n ( d )( n ( ) n ( w Δ
t
i i
÷ =q
y
i
d
i
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
0 4
0 3
0 2
0 1
<
>
<
>
) ( x ) N ( w
) ( x ) N ( w
) ( x ) N ( w
) ( x ) N ( w
t
t
t
t
In order for the correct cllasification of the entire
training set
1 4 1 3 1 2 1 1 ÷ = = ÷ = = ) ( d and ; ) ( d , ) ( d , ) ( d
the following four inequalities must hold:
where w(N) is the final weight vector that provides
correct classification for the entire training set.
) ( x and ), ( x ), ( x ), ( x 4 3 2 1
with respective class memberships
This means that after N
This means that after N


1 training steps
1 training steps
the weight vector w(N) ends up in the
the weight vector w(N) ends up in the
solution area, which is the shaded area
solution area, which is the shaded area
in the following figure.
in the following figure.
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
The Perceptron Training Algorithm
The Perceptron Training Algorithm
w
2
w
1
2.5
(1)
1.75
w
÷
(
=
(
¸ ¸
x(1)
Weight Space
w
t
x(4)<0
x(2)
w
t
x(4)>0
w
t
x(3)<0 w
t
x(3)>0
x(4)
w
t
x(1)<0
w
t
x(1)>0
Initial weight vector
x(3)
1
3
4
2
w
t
x(2)>0
w
t
x(2)<0
The training has so far been shown in the weight
The training has so far been shown in the weight
space. This is achieved using the decision lines
space. This is achieved using the decision lines
defined by x(1), x(2), x(3) nd x(4). However, the
defined by x(1), x(2), x(3) nd x(4). However, the
original decision lines determined by the
original decision lines determined by the
perceptron at each step are defined in the pattern
perceptron at each step are defined in the pattern
space as this enables the classification to be easily
space as this enables the classification to be easily
seen. These decision lines are defined by w(1), w(2)
seen. These decision lines are defined by w(1), w(2)
w(3) and w(4).
w(3) and w(4).
In the following we show the correction steps of the
In the following we show the correction steps of the
weight vector as well as the corresponding decision
weight vector as well as the corresponding decision
surfaces in the pattern space.
surfaces in the pattern space.
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
In the pattern space
In the pattern space
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
0 1 = x ) ( w
determines the the decision line defined by
the initial weight vector
2.5
(1)
1.75
w
÷
(
=
(
¸ ¸
 
1
1 2 2 1
2
(1) 2.5 1.75 2.5 1.75 0 1.429
t
x
w x x x x x
x
(
= ÷ = ÷ + = ¬ =
(
¸ ¸
as
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
1 2
(1) 2.5 1.75 0
t
w x x x = ÷ + >
(
¸
(
¸
÷
= =
(
¸
(
¸
=
(
(
(
(
¸
(
¸
c
c
c
c
= V ¬ + ÷ =
75 1
5 2
1
1
1
1
1
1 5 2 1
2
1
2
1
2 1
.
.
) ( w
) ( w
) ( w
x
) x ) ( w (
x
) x ) ( w (
x ) ( w ( x x . x ) ( w
t
t
t t
The corresponding gradient vector is computed
as follows:
which is the initial weight vector. As the
gradient vector lies on the side of
1 2
(1) 2.5 1.75 0
t
w x x x = ÷ + =
where
However, x(1) and x(3) have class 1 ,i.e.,
However, x(1) and x(3) have class 1 ,i.e.,
d1= d3=1 and x(2) and x(4) have class 2 ,i.e.,
d1= d3=1 and x(2) and x(4) have class 2 ,i.e.,
d2= d4=
d2= d4=


1.
1.
This means that x(1), x(2), x(3), x(4) all are
This means that x(1), x(2), x(3), x(4) all are
wrongly classified.
wrongly classified.
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
 
1 2 2 1
2
1
429 1 0 75 1 5 2 75 1 5 2 1 x . x x . x .
x
x
. . x ) ( w
t
= ¬ = + ÷ =
(
¸
(
¸
÷ =
Weight Space Pattern Space
Initial
weight
vector
(
¸
(
¸
÷
=
75 1
5 2
1
.
.
) ( w
Initial
decision
line
Weight vector is orthogonal to corresponding decision line
The Perceptron Training Algorithm
The Perceptron Training Algorithm
w
2
w
1
2.5
(1)
1.75
w
÷
(
=
(
¸ ¸
Weight Space
Initial weight vector
Pattern Space
x
2
x
1
 
1 2 2 1
2
1
429 1 0 75 1 5 2 75 1 5 2 1 x . x x . x .
x
x
. . x ) ( w
t
= ¬ = + ÷ =
(
¸
(
¸
÷ =
x(1)
x(3)
x(2)
x(4)
0 1 > x ) ( w
t
) ( w 1
0 1 < x ) ( w
t
(is orthogonal todecion line)
Decision line for initial weight vector
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
 
1 2 2 1
2
1
429 1 0 75 1 5 2 75 1 5 2 1 x . x x . x .
x
x
. . x ) ( w
t
= ¬ = + ÷ =
(
¸
(
¸
÷ =
Weight Space Pattern Space
Initial
weight
vector
(
¸
(
¸
÷
=
75 1
5 2
1
.
.
) ( w
Initial
decision
line
Weight vector is orthogonal to corresponding decision line
Pattern x(1) is input
 
1 2 2 1 2 1
0
1
1
1 w w w w w w ) ( x w
t
÷ = ¬ = + =
(
¸
(
¸
=
(
¸
(
¸
=
1
1
1) ( x
Decision line is orthogonal to corresponding input vector
First
input
vector
Initial
decision
line
The Perceptron Training Algorithm
The Perceptron Training Algorithm
w
2
w
1
2.5
(1)
1.75
w
÷
(
=
(
¸ ¸
 
1 2 2 1 2 1
0
1
1
1 w w w w w w ) ( x w
t
÷ = ¬ = + =
(
¸
(
¸
=
x(1)
Weight Space
w
t
x(1)<0
w
t
x(1)>0
Initial weight vector
0 1 = ) ( x w
t
Pattern Space
x
2
x
1
 
1 2 2 1
2
1
429 1 0 75 1 5 2 75 1 5 2 1 x . x x . x .
x
x
. . x ) ( w
t
= ¬ = + ÷ =
(
¸
(
¸
÷ =
x(1)
x(3)
x(2)
x(4)
0 1 > x ) ( w
t
) ( w 1
0 1 < x ) ( w
t
Step 1:Pattern x(1) is input
1
Line 1 is
decision line
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
2.5 1 1.5
(2) (1) (1)
1.75 1 2.75
w w x
÷ ÷
( ( (
= + = + =
( ( (
¸ ¸ ¸ ¸ ¸ ¸
  1
1
1
75 1 5 2 1 1 1 ÷ =
(
¸
(
¸
÷ = = ) . . sgn( )) ( x ) ( w sgn( ) ( y
t
2 1 1 1 1 = ÷ ÷ = ÷ ) ( ) ( y ) ( d
 
1 2 2 1
2
1
429 1 0 75 1 5 2 75 1 5 2 1 x . x x . x .
x
x
. . x ) ( w
t
= ¬ = + ÷ =
(
¸
(
¸
÷ =
Weight Space Pattern Space
Initial weight
vector
(
¸
(
¸
÷
=
75 1
5 2
1
.
.
) ( w
First input
vector
(
¸
(
¸
=
1
1
1) ( x
¬
Step 1 (Update 1): Pattern x(1) is input
Initial decision line
Updated weight vector
 
1 2 2 1
2
1
545 0 0 75 2 5 1 75 2 5 1 2 x . x x . x .
x
x
. . x ) ( w
t
= ¬ = + ÷ =
(
¸
(
¸
÷ =
Updated decision line
The Perceptron Training Algorithm
The Perceptron Training Algorithm
w
2
w
1
2.5
(1)
1.75
w
÷
(
=
(
¸ ¸
 
1 2 2 1 2 1
0
1
1
1 w w w w w w ) ( x w
t
÷ = ¬ = + =
(
¸
(
¸
=
x(1)
Weight Space
w
t
x(1)<0
w
t
x(1)>0
Initial weight vector
0 1 = ) ( x w
t
Pattern Space
x
2
x
1
x(1)
x(3)
x(2)
x(4)
0 2 > x ) ( w
t
0 2 < x ) ( w
t
Step 1 (Update 1): :Pattern x(1) is input
(
¸
(
¸
÷
=
75 2
5 1
2
.
.
) ( w
) ( w 2
 
1 2 2 1
2
1
545 0 0 75 2 5 1 75 2 5 1 2 x . x x . x .
x
x
. . x ) ( w
t
= ¬ = + ÷ =
(
¸
(
¸
÷ =
2
Line 2 is
decision line
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
Step 2 (Update 2): :Pattern x(2) is input
1.5 0.5 1
(3) (2) (2)
2.75 1 1.75
w w x
÷ ÷ ÷
( ( (
= ÷ = ÷ =
( ( (
¸ ¸ ¸ ¸ ¸ ¸
  1
1
5 0
75 2 5 1 2 2 2 =
(
¸
(
¸
÷
÷ = = )
.
. . sgn( )) ( x ) ( w sgn( ) ( y
t
2 1 1 2 2 ÷ = ÷ ÷ = ÷ ) ) ( y ) ( d
 
1 2 2 1
2
1
545 0 0 75 2 5 1 75 2 5 1 2 x . x x . x .
x
x
. . x ) ( w
t
= ¬ = + ÷ =
(
¸
(
¸
÷ =
Weight Space Pattern Space
¬
(
¸
(
¸
÷
=
75 2
5 1
2
.
.
) ( w
(
¸
(
¸
÷
=
1
5 0
2
.
) ( x
Weight
vector to be
updated
Second
input
vector
Decision line to be updated
 
1 2 2 1
2
1
57 0 0 75 1 75 1 1 3 x . x x . x
x
x
. x ) ( w
t
= ¬ = + ÷ =
(
¸
(
¸
÷ =
¹
¹
Second update
Updated decision line
The Perceptron Training Algorithm
The Perceptron Training Algorithm
w
2
w
1
Weight Space
w
t
x(2)<0
w
t
x(2)>0
Initial weight vector
0 2 = ) ( x w
t
Pattern Space
x
2
x
1
x(1)
x(3)
x(2)
x(4)
0 3 > x ) ( w
t
0 3 < x ) ( w
t
Step 2 (Update 2): :Pattern x(2) is input
) ( w 3
x(2)
 
1 2 2 1 2 1
5 0 0 5 0
1
5 0
w . w w . w
.
w w = ¬ = ÷ =
(
¸
(
¸
÷
3
Line 3 is
decision line
) ( w 3
1 3 2
(4) (3) (3)
1.75 1 2.75
w w x
÷
( ( (
= + = + =
( ( (
¸ ¸ ¸ ¸ ¸ ¸
Step 3:Pattern x(3) is input
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
  1
1
3
75 1 1 3 3 3 ÷ =
(
¸
(
¸
÷ = = ) . sgn( )) ( x ) ( w sgn( ) ( y
t
2 1 1 3 3 = ÷ ÷ = ÷ ) ( ) ( y ) ( d
Step 3 (Update 3): :Pattern x(3) is input
Weight Space Pattern Space
Weight
vector to be
updated
Third
input
vector
Decision line to be updated
(
¸
(
¸
÷
=
75 1
1
3
.
) ( w
(
¸
(
¸
=
1
3
3) ( x
 
1 2 2 1
2
1
57 0 0 75 1 75 1 1 3 x . x x . x
x
x
. x ) ( w
t
= ¬ = + ÷ =
(
¸
(
¸
÷ =
Updated decision line
 
1 2 2 1
2
1
73 0 0 75 2 2 75 2 2 4 x . x x . x
x
x
. x ) ( w
t
÷ = ¬ = + =
(
¸
(
¸
=
The Perceptron Training Algorithm
The Perceptron Training Algorithm
w
2
w
1
Weight Space
w
t
x(3)<0
w
t
x(3)>0
Initial weight vector
0 3 = ) ( x w
t
Pattern Space
x
2
x
1
x(1)
x(3)
x(2)
x(4)
0 4 > x ) ( w
t
0 4 < x ) ( w
t
Step 3 (Update 3): Pattern x(3) is input
 
1 2 2 1 2 1
3 0 3
1
3
w w w w w w ÷ = ¬ = + =
(
¸
(
¸
4
w(4)
x(3)
Line 4 is
decision line
2
(5) (4)
2.75
w w
(
= =
(
¸ ¸
Step 4:Pattern x(4) is input
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
  1
1
2
75 2 2 4 4 4 ÷ =
(
¸
(
¸
÷
= = ) . sgn( )) ( x ) ( w sgn( ) ( y
t
0 1 1 4 4 = ÷ ÷ ÷ = ÷ ) ( ) ( y ) ( d
Step 4 (Update 4): :Pattern x(4) is input
Weight Space Pattern Space
Weight
vector to be
updated
Fourth
input
vector
Decision line to be updated
(
¸
(
¸
÷
=
1
2
4 ) ( x
(
¸
(
¸
=
75 2
2
4
.
) ( w
 
1 2 2 1
2
1
73 0 0 75 2 2 75 2 2 4 x . x x . x
x
x
. x ) ( w
t
÷ = ¬ = + =
(
¸
(
¸
=
 
1 2 2 1
2
1
73 0 0 75 2 2 75 2 2 4 x . x x . x
x
x
. x ) ( w
t
÷ = ¬ = + =
(
¸
(
¸
=
No update,
same decision line
The Perceptron Training Algorithm
The Perceptron Training Algorithm
w
2
w
1
Weight Space
w
t
x(4)>0
w
t
x(4)<0
Initial weight vector
0 4 = ) ( x w
t
Pattern Space
x
2
x
1
x(1)
x(3)
x(2)
x(4)
0 4 > x ) ( w
t
0 4 < x ) ( w
t
Step 4 (Update 4): Pattern x(4) is input
 
1 2 2 1 2 1
2 0 2
1
2
w w w w w w = ¬ = + ÷ =
(
¸
(
¸
÷
5
w(5) =w(4)
x(4)
Line 4 remains
decision line
=4
2
(6) (5) (4)
2.75
w w w
(
= = =
(
¸ ¸
Step 5:Pattern x(1) is input
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
0 1 1 5 1 = ÷ = ÷ ) ( y ) ( d
  1
1
1
75 2 2 1 4 1 5 5 =
(
¸
(
¸
= = = ) . sgn( )) ( x ) ( w sgn( )) ( x ) ( w sgn( ) ( y
t t
Step 5 (Update5): :Pattern x(1) is input
Weight
vector to be
updated
First
input
vector
Decision line to be updated
(
¸
(
¸
=
1
1
1) ( x
(
¸
(
¸
= =
75 2
2
4 5
.
) ( w ) ( w
 
1 2 2 1
2
1
73 0 0 75 2 2 75 2 2 4 x . x x . x
x
x
. x ) ( w
t
÷ = ¬ = + =
(
¸
(
¸
=
 
1 2 2 1
2
1
73 0 0 75 2 2 75 2 2 4 x . x x . x
x
x
. x ) ( w
t
÷ = ¬ = + =
(
¸
(
¸
=
No update,
same decision line
The Perceptron Training Algorithm
The Perceptron Training Algorithm
w
2
w
1
Weight Space
w
t
x(1)>0
w
t
x(1)<0
Initial weight vector
0 1 = ) ( x w
t
Pattern Space
x
2
x
1
x(1)
x(3)
x(2)
x(4)
0 4 > x ) ( w
t
0 4 < x ) ( w
t
 
1 2 2 1 2 1
0
1
1
w w w w w w ÷ = ¬ = + =
(
¸
(
¸
6
w(6) =w(5)= w(4)
x(1)
Line 4 remains
decision line
Step 5 (Update5): :Pattern x(1) is input
5
4
 
0.5
(6) sgn( (6) (2)) sgn( (6) (2)) sgn( 2 2.75 ) 1
1
t t
y w x w x
÷
(
= = = =
(
¸ ¸
Step 6:Pattern x(2) is input
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
(
¸
(
¸
=
(
¸
(
¸
÷
÷
(
¸
(
¸
= ÷ =
75 1
5 2
1
5 0
75 2
2
2 4 7
.
. .
.
) ( x ) ( w ) ( w
2 1 1 6 2 ÷ = ÷ ÷ = ÷ ) ) ( y ) ( d
Step 6 (Update 6): :Pattern x(2) is input
Weight
vector to be
updated
Second
input
vector
Decision line to be updated
(
¸
(
¸
÷
=
1
5 0
2
.
) ( x
(
¸
(
¸
= = =
75 2
2
4 5 6
.
) ( w ) ( w ) ( w
 
1 2 2 1
2
1
73 0 0 75 2 2 75 2 2 4 x . x x . x
x
x
. x ) ( w
t
÷ = ¬ = + =
(
¸
(
¸
=
 
1 2 2 1
2
1
43 1 0 75 1 5 2 75 1 5 2 7 x . x x . x .
x
x
. . x ) ( w
t
÷ = ¬ = + =
(
¸
(
¸
=
Updated decision line
The Perceptron Training Algorithm
The Perceptron Training Algorithm
w
2
w
1
Weight Space
w
t
x(3)<0
w
t
x(3)>0
Initial weight vector
0 2 = ) ( x w
t
Pattern Space
x
2
x
1
x(1)
x(3)
x(2)
x(4)
0 7 > x ) ( w
t
0 7 < x ) ( w
t
Step 6 (Update 6): :Pattern x(2) is input
 
1 2 2 1 2 1
2 0 2
1
2
w w w w w w = ¬ = + ÷ =
(
¸
(
¸
÷
7
w(7)
w(7)
x(3)
The Perceptron Training Algorithm
The Perceptron Training Algorithm
w
2
w
1
Weight Space
w
t
x(3)<0
w
t
x(3)>0
Initial weight vector
0 3 = ) ( x w
t
Pattern Space
x
2
x
1
x(1)
x(3)
x(2)
x(4)
0 3 > x ) ( w
t
0 3 < x ) ( w
t
Step 7 (Update 7): :Pattern x(3) is input
x(2)
 
1 2 2 1 2 1
3 0 3
1
3
w w w w w w ÷ = ¬ = + =
(
¸
(
¸
3
w(4)
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
2.5
(8) (7)
1.75
w w
(
= =
(
¸ ¸
 
3
(7) sgn( (7) (3)) sgn( 2.5 1.75 ) 1
1
t
y w x
(
= = =
(
¸ ¸
Step 7:Pattern x
3
is input
0 1 1 7 3 = ÷ = ÷ ) ( y ) ( d
2.5
(9) (8) (7)
1.75
w w w
(
= = =
(
¸ ¸
Step 8:Pattern x
4
is input
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
0 1 1 7 4 = ÷ ÷ ÷ = ÷ ) ( ) ( y ) ( d
  1
1
2
75 1 5 2 4 7 4 8 8 ÷ =
(
¸
(
¸
÷
= = = ) . . sgn( )) ( x ) ( w sgn( )) ( x ) ( w sgn( ) ( y
t t
2.5
(10) (9) (8) (7)
1.75
w w w w
(
= = = =
(
¸ ¸
Step 9:Pattern x(1) is input
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
  1
1
1
75 1 5 2 1 7 1 9 9 =
(
¸
(
¸
= = = ) . . sgn( )) ( x ) ( w sgn( )) ( x ) ( w sgn( ) ( y
t t
0 1 1 9 1 = ÷ = ÷ ) ( y ) ( d
2.5
(11)
1.75
w
(
=
(
¸ ¸
Step 10:Pattern x(2) is input
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
  1
1
1
75 1 5 2 1 7 2 10 10 =
(
¸
(
¸
= = = ) . . sgn( )) ( x ) ( w sgn( )) ( x ) ( w sgn( ) ( y
t t
2 1 1 10 2 ÷ = ÷ ÷ = ÷ ) ( y ) ( d
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
The Perceptron Training Algorithm
The Perceptron Training Algorithm
w
2
w
1
2.5
(1)
1.75
w
÷
(
=
(
¸ ¸
 
1 2 2 1 2 1
0
1
1
1 w w w w w w ) ( x w
t
÷ = ¬ = + =
(
¸
(
¸
=
x(1)
Weight Space
w
t
x(1)<0
w
t
x(1)>0
Initial weight vector
0 1 = ) ( x w
t
Pattern Space
x
2
x
1
 
1 2 2 1
2
1
429 1 0 75 1 5 2 75 1 5 2 1 x . x x . x .
x
x
. . x ) ( w
t
÷ = ¬ = + ÷ =
(
¸
(
¸
÷ =
x(1)
x(3)
x(2)
x(4)
0 1 > x ) ( w
t
) ( w 1
0 1 < x ) ( w
t
The Perceptron Training Algorithm
The Perceptron Training Algorithm
The initial weight vector w(1) and the weight
The initial weight vector w(1) and the weight
vectors w(2)
vectors w(2)


w(11) obtained during the training
w(11) obtained during the training
algorithm are given below:
algorithm are given below:
2.5 1.5 1 2
(1) , (2) , (3) , (4) ,
1.75 2.75 1.75 2.75
2.5
(5) (4), (6) (5) (4), (7) ,
1.75
3
(8) (7), (9) (8) (7), (10) (9) (8) (7), (11)
0.75
w w w w
w w w w w w
w w w w w w w w w w
÷ ÷ ÷
( ( ( (
= = = =
( ( ( (
¸ ¸ ¸ ¸ ¸ ¸ ¸ ¸
(
= = = =
(
¸ ¸
(
= = = = = = =
(
¸ ¸
As can be seen from these vectors, out of the ten
As can be seen from these vectors, out of the ten
vectors w(2)
vectors w(2)


w(11) only five are different .
w(11) only five are different .
The Perceptron Training Algorithm
The Perceptron Training Algorithm
8 6 4 2 0 2 4 6 8 10 12
2
1.5
1
0.5
0
0.5
1
1.5
2
2.5
3
These five vectors are given in the MATLAB plot
These five vectors are given in the MATLAB plot
below:
below:
The Perceptron Training Algorithm
The Perceptron Training Algorithm
x
x
3 3
x
x
1 1
x
x
2 2
0
0
x
x
3 3
=0.25 plane =0.25 plane
(0,0,0)
(0,0,0)
(1,1,0)
(1,1,0)
(0,1,0)
(0,1,0)
(1,0,0)
(1,0,0)
(1,0,1)
(1,0,1)
(0,0,1)
(0,0,1)
(0,1,1)
(0,1,1)
(0,0,1)
(0,0,1)
Example: The trained
Example: The trained
classifier is required
classifier is required
to provide the
to provide the
classification such
classification such
that the yellow vertices
that the yellow vertices
of the cube have class
of the cube have class
membership d=1 and
membership d=1 and
the blue vertices have
the blue vertices have
class membership
class membership
d=2.
d=2.
The Perceptron Training Algorithm
The Perceptron Training Algorithm
The Perceptron Training Algorithm
The Perceptron Training Algorithm
SUMMARY OF CONTINUOUS PERCEPTRON TRAINING ALGORITHM SUMMARY OF CONTINUOUS PERCEPTRON TRAINING ALGORITHM
Given are the p training pairs
Given are the p training pairs
{x
{x
1 1
,
,
d
d
1, 1,
x
x
2 2
,
,
d
d
2 2
,
,
……………
……………
.,
.,
x
x
p, p,
d
d
p p
}, where x
}, where x
i i
is (N
is (N
+
+
1)
1)
x
x
1
1
D
D
i i
is 1 x 1, i=1,2, ,P. In the following n denotes
is 1 x 1, i=1,2, ,P. In the following n denotes
the training step and p denotes the step counter
the training step and p denotes the step counter
within the training cycle.
within the training cycle.
Step 1:
Step 1:
c>0 is chosen.
c>0 is chosen.
Step2:
Step2:
Weights are initialized at w at random small
Weights are initialized at w at random small
values, w is (N
values, w is (N
+
+
1)
1)
x
x
1. Counters and error are
1. Counters and error are
initialized.
initialized.
1 ,1 and 0 k p E ÷ ÷ ÷
The Perceptron Training Algorithm
The Perceptron Training Algorithm
Step3:
Step3:
The training cycle begins here. Input is
The training cycle begins here. Input is
presented and output is computed:
presented and output is computed:
, , sgn( )
t
p p
x x d d y w x ÷ ÷ =
Step4:
Step4:
Weights are updated:
Weights are updated:
1
, ( )
2
p
c d y ÷ + ÷ x x w w x →
Step5:
Step5:
Cycle error is computed.
Cycle error is computed.
2
1
( )
2
E d y ÷ ÷
The Perceptron Training Algorithm
The Perceptron Training Algorithm
Step 6:
Step 6:
If p<P then
If p<P then 1, 1 p p n n ÷ + ÷ +
and go to Step 3, otherwise go to Step 7.
and go to Step 3, otherwise go to Step 7.
Step 7:
Step 7:
The training cycle is completed. For E=0,
The training cycle is completed. For E=0,
terminate the training session. Output weights, k
terminate the training session. Output weights, k
and E. If E>0 then enter the new training cycle
and E. If E>0 then enter the new training cycle
by going to Step 3.
by going to Step 3.
Here the activation function is a continuous function of
Here the activation function is a continuous function of
the weights instead of the signum.
the weights instead of the signum.
There are two main objectives of this:
There are two main objectives of this:
1 1
To define a continuous function of the weights as the
To define a continuous function of the weights as the
error function so as to obtain finer control over the
error function so as to obtain finer control over the
weights as well as over the whole training procedure;
weights as well as over the whole training procedure;
2 2
To enable the computation of the error gradient in
To enable the computation of the error gradient in
order to be continuously in a position to know the
order to be continuously in a position to know the
direction in which the error decreases.
direction in which the error decreases.
Single
Single


Layer Continuous
Layer Continuous
Perceptron
Perceptron
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
The Delta Training Rule is based on the
The Delta Training Rule is based on the
minimisation of the error function which is given by
minimisation of the error function which is given by
2
2
1
)) n ( y ) n ( d ( ) n ( E ÷ =
where n is a positive integer representing the traning
where n is a positive integer representing the traning
step number, i.e.,the
step number, i.e.,the step number in the minimisation
process,
d(n) is the desired output signal and
d(n) is the desired output signal and
( ) [ ( ) ( )]
t
y n f w n x n =
is the actual output.
is the actual output.
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
The error function (error surgface) is a function
The error function (error surgface) is a function
of the weights: E(w
of the weights: E(w
1 1
,w
,w
2 2
,....,w
,....,w
p p
)=E(w) which is
)=E(w) which is
minimised using an iterative minimisation
minimised using an iterative minimisation
method which computes the new values of the
method which computes the new values of the
weights according to
weights according to
( 1) ( ) ( ) w n w n w n + = + A
where
where
A
A
w(n) is the increment given to the
w(n) is the increment given to the
present weight vector w(n) to obtain the new
present weight vector w(n) to obtain the new
weight vector w(n+1).
weight vector w(n+1).
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
q where E(w(n)) is the gradient vector and is called the
learning constant. Using this in the equation above, we obtain
∇
Let us now use the steepest descent method for the
Let us now use the steepest descent method for the
minimisation of the error function E(w) where it is
minimisation of the error function E(w) where it is
required that the weight changes be in the negative
required that the weight changes be in the negative
gradient direction. Therefore we take:
gradient direction. Therefore we take:
( ) ( ( )) w n E w n q A = ÷ V
)) n ( w ( E ) n ( w ) n ( w V ÷ = + q 1
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
) n ( E )) n ( w ( E =
is the error surface at the n’th training step .
The independent variables for minimisation at
each training step are w
i
, the components of the
weight vector.
Therefore the error to be minimised is:
2
2
1
)) n ( x ) n ( w ( f ) n ( d ( ) n ( E
t
÷ =
The error minimisation requires the computation
of the gradient of the error function:
 
) n ( w w
t
) n ( w w
)) n ( x w ( f ) n ( d ( ) w ( E
=
=
(
¸
(
¸
÷ V = V
2
2
1
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
The gradient vector is defined as:
(
(
(
(
(
(
(
(
(
¸
(
¸
c
c
c
c
c
c
= V
+1
2
1
p
w
E
.
.
w
E
w
E
) w ( E
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
Using
Using
 
) n ( w w
t
) n ( w w
)) n ( x w ( f ) n ( d ( ) w ( E
=
=
(
¸
(
¸
÷ V = V
2
2
1
we obtain
and defining
x w ) w ( v
t
=
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
   
) n ( w w
p
) n ( w w
w
) w ( v
.
.
w
) w ( v
w
) w ( v
dv
)) w ( v ( df
)) w ( v ( f ) n ( d ) w ( E
=
+
=
¦
¦
¦
¦
)
¦
¦
¦
¦
`
¹
¦
¦
¦
¦
¹
¦
¦
¦
¦
´
¦
(
(
(
(
(
(
(
(
(
¸
(
¸
c
c
c
c
c
c
÷ ÷ = V
1
2
1
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
i
i
x
w
w v
=
c
c ) (
Since
and
y ) v ( f =
we can write
and
   
( )
( )
( ( ))
( ) ( ) ( ) ( )
w w n
w w n
df v w
E w d n y n x n
dv
=
=
(
V = ÷ ÷
(
¸
 
( ) ( )
( ) ( ( ))
( ) ( ) ( )
i
w w n
i
w w n
E w df v w
d n y n x n
w dv
= =
(
c
¦ ¹
= ÷ ÷
´ `
(
c
¹ )
¸ ¸
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
If bipolar continuous activation function is used
then we have:
1
( )
1
v
v
e
f v
e
÷
÷
÷
=
+
and
2
( ) 2
(1 )
v
v
df v e
dv e
÷
÷
=
+
In fact
( )
2
2
2
( ) 2 1 1 1
1 1 ( )
2 1 2
1
v v
v
v
df v e e
f v
dv e
e
÷ ÷
÷
÷
 
 
÷
(  = = ÷ = ÷

¸ ¸

+
\ . +
\ .
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
     ) n ( x ) n ( y ) n ( y ) n ( d ) w ( E
) n ( w w
2
1
2
1
÷ ÷ ÷ = V
=
Conclusion: The delta training rule for the
bipolar continuous perceptron is given as:
   ) n ( x ) n ( y ) n ( y ) n ( d ) n ( w ) n ( w
2
1
2
1
1 ÷ ÷ + = + q
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
If unipolar continuous activation function is used
then we have:
1
( )
1
v
f v
e
÷
=
+
and
2
( )
(1 )
v
v
df v e
dv e
÷
÷
=
+
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
( )
2
( ) 1 1 1 1 1
(1 )
1 1 1 1
1
= ( )(1 ( ))
v v
v v v v
v
df v e e
dv e e e e
e
f v f v
÷ ÷
÷ ÷ ÷ ÷
÷
+ ÷
= = = ÷
+ + + +
+
÷
we can write
we can write
Example:We will carry out the same
training algorithm as in the previous
example but this time using a continuous
bipolar perceptron.
2
2
1
1
2
2
1
1
1
2
1
)
`
¹
¹
´
¦
(
¸
(
¸
÷
+
÷ =
)
`
¹
¹
´
¦
+
÷
÷ =
÷ ÷
÷
) n ( v ) n ( v
) n ( v
e
) n ( d
e
e
) n ( d ) n ( E
The error at step n is given by:
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
For the first pattern x(1)=[1 1]
t
, d(1)=1.
1 2
1 2
1 2 1 2 1 2
2
( )
2
2
( )
( ) ( ) ( ) 2
1 2
(1) (1) 1
2 1
1 2 1 2 2
1 1
2 1 2 1 (1 )
w w
w w
w w w w w w
E d
e
e
e e e
÷ +
÷ +
÷ + ÷ + +
¦ ¹
(
= ÷ ÷ =
´ `
(
+
¸ ¸
¹ )
¦ ¹
¦ ¹
(
÷ ÷ = =
´ ` ´ `
(
+ + +
¸ ¸
¹ )
¹ )
The error at step 1 is given by:
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
For the second pattern x(2)=[0.5 1]
t
,
d(2)=1.
 
2
5 0
2
5 0
2
5 0
2
5 0
2 1
2 1 2 1
2 1
1
2
1
2
2
1
1
1
2
1
2
1
1
1
2
2
2
1
2
) w w . (
) w w . ( ) w w . (
) w w . (
e
e e
e
) ( d ) ( E
÷
÷ + ÷ ÷
+ ÷ ÷
+
=
)
`
¹
¹
´
¦
+
÷
=
)
`
¹
¹
´
¦
(
¸
(
¸
÷
+
÷ ÷
=
)
`
¹
¹
´
¦
(
¸
(
¸
÷
+
÷ =
The error at step 2 is given by:
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
For the third pattern x(3)=[3 1]
t
, d(3)=1.
 
2
3
2
3
3
2
3
2
3
2 1
2 1
2 1
2 1
2 1
1
2
1
2
2
1
1
1
2
1
2
1
1
1
2
3
2
1
3
) w w (
) w w (
) w w (
) w w (
) w w (
e
e
e
e
e
) ( d ) ( E
+
+ ÷
+ ÷
+ ÷
+ ÷
+
=
)
`
¹
¹
´
¦
+
=
)
`
¹
¹
´
¦
(
¸
(
¸
÷
+
÷ ÷
=
)
`
¹
¹
´
¦
(
¸
(
¸
÷
+
÷ =
The error at step 3 is given by:
For the fourth pattern x(4)=[2 1]
t
, d(4)=1.
 
2
2
2
2
2
2
2
2
2 1
2 1 2 1
2 1
1
2
1
2
2
1
1
1
2
1
2
1
1
1
2
4
2
1
4
) w w (
) w w ( ) w w (
) w w (
e
e e
e
) ( d ) ( E
÷
+ ÷ ÷ + ÷ ÷
+ ÷ ÷
+
=
)
`
¹
¹
´
¦
+
÷
=
)
`
¹
¹
´
¦
(
¸
(
¸
÷
+
÷ ÷
=
)
`
¹
¹
´
¦
(
¸
(
¸
÷
+
÷ =
The error at step 4 is given by:
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
close all;
clear all;
[w1,w2] = meshgrid(4:.1:4, 4:.1:4);
Z1 = exp(w1+w2);
E1=2./(1+Z1).^2
mesh(Z1)
Z2 = exp(.5*w1w2);
E2=2./(1+Z2).^2
figure,mesh(Z2)
Z3 = exp(3*w1+w2);
E3=2./(1+Z3).^2
figure,mesh(Z3)
Z4 = exp(2*w1w2);
E4=2./(1+Z4).^2
figure,mesh(Z4)
subplot(2,2,1),mesh(E1);title('Error surface for xt(1)=[1 1] and y=f(wt*x(1))');
xlabel('w1'),ylabel('w2'),zlabel('E1(w1,w2)');
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
subplot(2,2,2),mesh(E2);title('Error surface for xt(2)=[.5 1] and y=f(wt*x(2))');
xlabel('w1'),ylabel('w2'),zlabel('E2(w1,w2)');
subplot(2,2,3),mesh(E3);title('Error surface for xt(3)=[3 1] and y=f(wt*x(3))');
xlabel('w1'),ylabel('w2'),zlabel('E3(w1,w2)');
subplot(2,2,4),mesh(E4);title('Error surface for xt(4)=[2 1] and y=f(wt*x(4))');
xlabel('w1'),ylabel('w2'),zlabel('E4(w1,w2)');
E = E1+E2+E3+E4;
figure,mesh(E);title('Total Error
E(w1,w2)=E1(w1,w2)+E2(w1,w2)+E3(w1,w2)+E4(w1,w2),MESH');
xlabel('w1'),ylabel('w2'),zlabel('E(w1,w2)');
figure,imshow(E,[],'notruesize');colormap(jet);title('Total Error
E(w1,w2)=E1(w1,w2)+E2(w1,w2)+E3(w1,w2)+E4(w1,w2),IMSHOW');
xlabel('w1'),ylabel('w2')
The error surfaces for the above four cases are
shown in the next slide:
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
Training Rule for a Single
Training Rule for a Single


Layer Continuous
Layer Continuous
Perceptron:The Delta Training Rule
Perceptron:The Delta Training Rule
0
50
100
0
50
100
0
1
2
w2
Error surface for xt(1)=[1 1] and y=f(wt*x(1))
w1
E
1
(
w
1
,
w
2
)
0
50
100
0
50
100
0
1
2
w1
Error surface for xt(2)=[.5 1] and y=f(wt*x(2))
w2
E
2
(
w
1
,
w
2
)
0
50
100
0
50
100
0
1
2
w2
Error surface for xt(3)=[3 1] and y=f(wt*x(3))
w1
E
3
(
w
1
,
w
2
)
0
50
100
0
50
100
0
1
2
w1
Error surface for xt(4)=[2 1] and y=f(wt*x(4))
w2
E
4
(
w
1
,
w
2
)
Training Rule for a Single
Training Rule for a Single


Layer
Layer
Continuous Perceptron:The Delta
Continuous Perceptron:The Delta
Training Rule
Training Rule
The total error is defined by:
) w , w ( E ) w , w ( E ) w , w ( E ) w , w ( E ) w , w ( E
2 1 4 2 1 3 2 1 2 2 1 1 2 1
+ + + =
The total error surface is shown in the next
slide.
Training Rule for a Single
Training Rule for a Single


Layer Continuous
Layer Continuous
Training Rule for a Single-Layer Continuous Perceptron: The Delta Training Rule

A contour map of the total error is depicted below:

[Figure: contour map of the total error surface over the weight plane.]

The classifier training has been simulated for η = 0.5 for four arbitrarily chosen initial weight vectors, including the one taken from the earlier example. The training set is:

x(1) = 1, x(3) = 3, d1 = d3 = 1: class C1
x(2) = −0.5, x(4) = −2, d2 = d4 = −1: class C2

The resulting trajectories of 150 simulated training steps are shown in the following figure (each tenth step is shown).

[Figure: weight trajectories superimposed on the error contour map.]

In each case the weights converge during training toward the center of the solution region obtained for the discrete perceptron case given on the next slide; this center coincides with the dark blue region in the contour map of the total error depicted before and also shown on the next slide.
SUMMARY OF THE CONTINUOUS PERCEPTRON TRAINING ALGORITHM

Given are the P training pairs {x1, d1, x2, d2, ..., xP, dP}, where x_i is (N+1) × 1 and d_i is 1 × 1, for i = 1, 2, ..., P. In the following, n denotes the training step and p denotes the step counter within the training cycle.

Step 1: η > 0, λ = 1 and Emax > 0 are chosen.

Step 2: Weights are initialized at random small values; w is (N+1) × 1. Counters and error are initialized:
n ← 1, p ← 1, E ← 0

Step 3: The training cycle begins here. Input is presented and the output is computed:
x ← x_p, d ← d_p, y = f(wᵗx)

Step 4: Weights are updated:
w ← w + (1/2) η (d − y)(1 − y²) x

Step 5: Cycle error is computed:
E ← (1/2)(d − y)² + E

Step 6: If p < P then p ← p + 1, n ← n + 1 and go to Step 3; otherwise go to Step 7.

Step 7: The training cycle is completed. For E < Emax terminate the training session and output the weights, n and E. If E > Emax then E ← 0, p ← 1, and enter a new training cycle by going to Step 3.
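The steps above can be sketched in NumPy. The bipolar continuous activation f(net) = (1 − e^(−net))/(1 + e^(−net)) is assumed, the four scalar patterns from the simulation above are used, and augmenting each pattern with a −1 bias component is an illustrative assumption, not something fixed by the slides:

```python
import numpy as np

def f(net, a=1.0):
    # bipolar continuous activation, y = f(net) in (-1, 1)
    return (1.0 - np.exp(-a * net)) / (1.0 + np.exp(-a * net))

def train_continuous_perceptron(X, d, eta=0.5, e_max=0.01, max_cycles=1000, seed=0):
    """Delta-rule training of a single continuous perceptron.
    X: (P, N+1) augmented patterns, d: (P,) desired responses."""
    rng = np.random.default_rng(seed)
    w = 0.1 * rng.standard_normal(X.shape[1])  # Step 2: small random weights
    E = np.inf
    for _ in range(max_cycles):
        E = 0.0
        for x, t in zip(X, d):
            y = f(w @ x)                                    # Step 3: output
            w = w + 0.5 * eta * (t - y) * (1.0 - y**2) * x  # Step 4: delta rule
            E += 0.5 * (t - y) ** 2                         # Step 5: cycle error
        if E < e_max:                                       # Step 7: stopping test
            break
    return w, E

# The four patterns from the simulation, augmented with a -1 bias input
X = np.array([[1.0, -1.0], [-0.5, -1.0], [3.0, -1.0], [-2.0, -1.0]])
d = np.array([1.0, -1.0, 1.0, -1.0])
w, E = train_continuous_perceptron(X, d)
```

Because the targets ±1 are only reached asymptotically by the bipolar activation, the cycle error never becomes exactly zero; the stopping threshold Emax controls how close the outputs must get.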
Delta Training Rule for a Multi-Perceptron Layer

[Figure: a single layer of K continuous perceptrons. The jth column of nodes holds the inputs x1, ..., xJ; the kth column of nodes holds the neurons 1, ..., K with outputs y1, ..., yK; w_kj denotes the weight from input j to neuron k.]
The above can be redrawn in equation form. The net inputs are

v_1 = w_11 x_1 + w_12 x_2 + ... + w_1j x_j + ... + w_1J x_J
v_2 = w_21 x_1 + w_22 x_2 + ... + w_2j x_j + ... + w_2J x_J
...
v_l = w_l1 x_1 + w_l2 x_2 + ... + w_lj x_j + ... + w_lJ x_J
...
v_K = w_K1 x_1 + w_K2 x_2 + ... + w_Kj x_j + ... + w_KJ x_J

and the outputs are

y_1 = f(v_1), y_2 = f(v_2), ..., y_l = f(v_l), ..., y_K = f(v_K)

In matrix form,

v = [v_1 v_2 ... v_K]ᵗ = W [x_1 x_2 ... x_J]ᵗ, i.e., v = Wx

where W is the K × J matrix with entries w_kj. Defining the nonlinear diagonal operator

Γ(v) = [f(v_1) f(v_2) ... f(v_K)]ᵗ

we can write

y = Γ(v) = Γ[Wx]
The desired and actual output vectors at the nth training step are given as

d(n) = [d_1(n) d_2(n) ... d_K(n)]ᵗ, y(n) = [y_1(n) y_2(n) ... y_K(n)]ᵗ

where n represents the nth step, which corresponds to a specific input pattern that produces the output error. The error expression for a single perceptron was given as

E(n) = (1/2)(d(n) − y(n))²

which can be generalised to include all squared errors at the outputs k = 1, 2, ..., K:

E(n) = (1/2) ‖d(n) − y(n)‖² = (1/2) Σ_{k=1}^{K} (d_k(n) − y_k(n))²
The updated weight value from input j to neuron k at step n is given by

w_kj(n+1) = w_kj(n) + Δw_kj(n)

According to the delta training rule for the continuous perceptron,

Δw_kj(n) = −η ∂E/∂w_kj, for k = 1, 2, ..., K and j = 1, 2, ..., J

Since E = E(v(w)), by the chain rule

∂E/∂w_kj = (∂E/∂v_k)(∂v_k/∂w_kj)

Using v_k = w_k1 x_1 + w_k2 x_2 + ... + w_kj x_j + ... + w_kJ x_J, we have

∂v_k/∂w_kj = x_j

The error signal term produced by the kth neuron is defined as

δ_yk = −∂E/∂v_k

Using this yields

∂E/∂w_kj = −δ_yk x_j

On the other hand we can write

δ_yk = −∂E/∂v_k = −(∂E/∂y_k)(∂y_k/∂v_k)

Since E(n) = (1/2) Σ_{k=1}^{K} (d_k(n) − y_k(n))², we get

∂E/∂y_k = −(d_k − y_k)

and, using ∂y_k/∂v_k = ∂f(v_k)/∂v_k,

δ_yk = (d_k − y_k) ∂f(v_k)/∂v_k

which is used to obtain

Δw_kj(n) = η (d_k − y_k) (∂f(v_k)/∂v_k) x_j

For the bipolar continuous activation function we already know that

∂f(v_k)/∂v_k = (1/2)(1 − f(v_k)²) = (1/2)(1 − y_k²)

Hence

Δw_kj(n) = (1/2) η (d_k(n) − y_k(n))(1 − y_k(n)²) x_j

and

w_kj(n+1) = w_kj(n) + (1/2) η (d_k(n) − y_k(n))(1 − y_k(n)²) x_j(n)

where

δ_yk(n) = (1/2)(d_k(n) − y_k(n))(1 − y_k(n)²)

Now defining

x(n) = [x_1(n) ... x_J(n)]ᵗ, δ_y(n) = [δ_y1(n) ... δ_yK(n)]ᵗ

we can write the update for the whole layer in matrix form:

W(n+1) = W(n) + η δ_y(n) xᵗ(n)
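The matrix-form update can be sketched in NumPy; the layer sizes and data below are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def f(net, a=1.0):
    # bipolar continuous activation; f'(net) = (a/2) * (1 - f(net)**2)
    return (1.0 - np.exp(-a * net)) / (1.0 + np.exp(-a * net))

def layer_delta_step(W, x, d, eta=0.1):
    """One delta-rule step for a single layer of K continuous perceptrons:
    W(n+1) = W(n) + eta * delta_y(n) x(n)^T (with a = 1)."""
    y = f(W @ x)                               # y = Gamma[Wx]
    delta_y = 0.5 * (d - y) * (1.0 - y**2)     # error signal terms delta_yk
    return W + eta * np.outer(delta_y, x)

# Illustrative sizes and data: K = 3 neurons, J = 4 inputs
rng = np.random.default_rng(1)
W = 0.1 * rng.standard_normal((3, 4))
x = rng.standard_normal(4)
d = np.array([0.5, -0.5, 0.2])
E_before = 0.5 * np.sum((d - f(W @ x)) ** 2)
for _ in range(200):
    W = layer_delta_step(W, x, d)
E_after = 0.5 * np.sum((d - f(W @ x)) ** 2)
```

Repeating the step on a fixed pattern performs gradient descent on E for that pattern, so the squared error shrinks toward zero.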
Generalised Delta Training Rule for a Multi-Layer Perceptron

[Figure: a two-layer network. The ith column of nodes holds the inputs z1, ..., zI; the jth column of nodes (the hidden layer) holds neurons with net inputs u1, ..., uJ, weights t_ji from input i to hidden neuron j, and outputs x1, ..., xJ; the kth column of nodes holds the output neurons with net inputs v1, ..., vK, weights w_kj, and outputs y1, ..., yK.]
The weight adjustment for the hidden layer according to the gradient descent method will be

t_ji(n+1) = t_ji(n) + Δt_ji(n), Δt_ji(n) = −η ∂E/∂t_ji, for j = 1, 2, ..., J and i = 1, 2, ..., I

where

∂E/∂t_ji = (∂E/∂u_j)(∂u_j/∂t_ji)

Here

δ_xj = −∂E/∂u_j, for j = 1, 2, ..., J

is the error signal term of the hidden layer with output x. This term is produced by the jth neuron of the hidden layer, where j = 1, 2, ..., J. On the other hand, using

u_j = t_j1 z_1 + t_j2 z_2 + ... + t_jI z_I

we can calculate ∂u_j/∂t_ji as

∂u_j/∂t_ji = z_i

Therefore

∂E/∂t_ji = (∂E/∂u_j)(∂u_j/∂t_ji) = −δ_xj z_i

and

Δt_ji = η δ_xj z_i

Since x_j = f(u_j),

δ_xj = −∂E/∂u_j = −(∂E/∂x_j)(∂x_j/∂u_j)

with

∂x_j/∂u_j = ∂f(u_j)/∂u_j

Using E = (1/2) Σ_{k=1}^{K} (d_k − f(v_k))² we have

∂E/∂x_j = ∂/∂x_j { (1/2) Σ_{k=1}^{K} (d_k − f(v_k))² } = −Σ_{k=1}^{K} (d_k − y_k)(∂f(v_k)/∂v_k)(∂v_k/∂x_j)

Now using

v_k = w_k1 x_1 + w_k2 x_2 + ... + w_kj x_j + ... + w_kJ x_J

we have

∂v_k/∂x_j = w_kj

Now using this equality and

δ_yk = −∂E/∂v_k = (d_k − y_k) ∂f(v_k)/∂v_k

in the expression for ∂E/∂x_j, we obtain

∂E/∂x_j = −Σ_{k=1}^{K} δ_yk w_kj

Now using this together with δ_xj = −(∂E/∂x_j)(∂x_j/∂u_j), we obtain

δ_xj = (∂f(u_j)/∂u_j) Σ_{k=1}^{K} δ_yk w_kj

Now using Δt_ji = η δ_xj z_i, we get

Δt_ji = η (∂f(u_j)/∂u_j) z_i Σ_{k=1}^{K} δ_yk w_kj

and

t_ji(n+1) = t_ji(n) + η (Σ_{k=1}^{K} δ_yk w_kj) (∂f(u_j)/∂u_j) z_i, for j = 1, 2, ..., J and i = 1, 2, ..., I
Now defining the jth column of the matrix

W = [w_11 w_12 ... w_1J; w_21 w_22 ... w_2J; ...; w_K1 w_K2 ... w_KJ]

as w_j, and using

δ_y = [δ_y1 ... δ_yK]ᵗ

we can write

Σ_{k=1}^{K} δ_yk w_kj = w_jᵗ δ_y

In the case of the bipolar activation function we obtain for the hidden layer

f'_xj = ∂f(u_j)/∂u_j = (1/2)(1 − f(u_j)²) = (1/2)(1 − x_j²)

Now construct a vector whose entries are the above terms for j = 1, 2, ..., J, i.e.,

f'_x = [(1/2)(1 − x_1²), (1/2)(1 − x_2²), ..., (1/2)(1 − x_J²)]ᵗ

We then have

δ_xj = (w_jᵗ δ_y) f'_xj

and define

z = [z_1 z_2 ... z_I]ᵗ

Now defining

T = [t_11 t_12 ... t_1I; t_21 t_22 ... t_2I; ...; t_J1 t_J2 ... t_JI]

and

δ_x = [δ_x1 ... δ_xJ]ᵗ, with δ_xj = (w_jᵗ δ_y) f'_xj

we finally obtain

T(n+1) = T(n) + η δ_x(n) zᵗ(n)

This updating formula is called the Generalised Delta Rule for adjusting the hidden layer weights. A similar formula was given for updating the output layer weights:

W(n+1) = W(n) + η δ_y(n) xᵗ(n), with δ_yk = (d_k − y_k) ∂f(v_k)/∂v_k
Here the main difference lies in computing the error signals δ_y and δ_x. The entries of δ_y, given above, contain only terms belonging to the output layer. This is not the case with δ_x, whose entries

δ_xj = (w_jᵗ δ_y) f'_xj

are weighted sums of the error signals δ_yk produced by the following layer. Here we can draw the following conclusion: the Generalised Delta Learning Rule propagates the error back by one layer, and this holds for every layer.
Summary of the Error Back-Propagation Training Algorithm (EBPTA)

Given are P training pairs {z_1, d_1, z_2, d_2, ..., z_P, d_P}, where z_i is (I × 1), d_i is (K × 1), and i = 1, 2, ..., P. Note that the Ith component of each z_i is of value −1, since the input vectors have been augmented. Size J − 1 of the hidden layer having outputs y is selected. Note that the Jth component of y is of value −1, since the hidden layer outputs have also been augmented; y is (J × 1) and o is (K × 1).

Step 1: η > 0 and Emax are chosen. Weights W and V are initialized at small random values; W is (K × J), V is (J × I).
q ← 1, p ← 1, E ← 0

Step 2: The training step starts here. (See Note 1 at end of list.) Input is presented and the layers' outputs are computed [f(net) as in (2.3a) is used]:
z ← z_p, d ← d_p
y_j ← f(v_jᵗ z), for j = 1, 2, ..., J, where v_j, a column vector, is the jth row of V
o_k ← f(w_kᵗ y), for k = 1, 2, ..., K, where w_k, a column vector, is the kth row of W

Step 3: The error value is computed:
E ← (1/2)(d_k − o_k)² + E, for k = 1, 2, ..., K

Step 4: The error signal vectors δ_o and δ_y of both layers are computed; δ_o is (K × 1) and δ_y is (J × 1). (See Note 2 at end of list.) The error signal terms of the output layer in this step are
δ_ok = (1/2)(d_k − o_k)(1 − o_k²), for k = 1, 2, ..., K
The error signal terms of the hidden layer in this step are
δ_yj = (1/2)(1 − y_j²) Σ_{k=1}^{K} δ_ok w_kj, for j = 1, 2, ..., J

Step 5: Output layer weights are adjusted:
w_kj ← w_kj + η δ_ok y_j, for k = 1, 2, ..., K and j = 1, 2, ..., J

Step 6: Hidden layer weights are adjusted:
v_ji ← v_ji + η δ_yj z_i, for j = 1, 2, ..., J and i = 1, 2, ..., I

Step 7: If p < P then p ← p + 1, q ← q + 1, and go to Step 2; otherwise go to Step 8.

Step 8: The training cycle is completed. For E < Emax terminate the training session; output the weights W, V, q, and E. If E > Emax, then E ← 0, p ← 1, and initiate a new training cycle by going to Step 2.

NOTE 1: For best results, patterns should be chosen at random from the training set (justification follows in Section 4.5).

NOTE 2: If formula (2.4a) is used in Step 2, then the error signal terms in Step 4 are computed as
δ_ok = (d_k − o_k)(1 − o_k) o_k, for k = 1, 2, ..., K
δ_yj = y_j (1 − y_j) Σ_{k=1}^{K} δ_ok w_kj, for j = 1, 2, ..., J
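The EBPTA steps above can be sketched in NumPy. The activation of (2.3a) is assumed to be the bipolar function f(net) = (1 − e^(−net))/(1 + e^(−net)); XOR in bipolar coding is an illustrative training set, and the targets ±0.9 (rather than ±1) are an assumption that keeps them reachable:

```python
import numpy as np

def f(net):
    # bipolar activation; f'(net) = 0.5 * (1 - f(net)**2)
    return (1.0 - np.exp(-net)) / (1.0 + np.exp(-net))

def ebpta(Z, D, J, eta=0.2, e_max=0.01, max_cycles=5000, seed=0):
    """Sketch of the EBPTA steps above. Z: (P, I) augmented inputs (last
    entry -1), D: (P, K) targets, J: hidden size including the -1 output."""
    P, I = Z.shape
    K = D.shape[1]
    rng = np.random.default_rng(seed)
    V = 0.5 * rng.standard_normal((J - 1, I))    # hidden weights (J-1 real neurons)
    W = 0.5 * rng.standard_normal((K, J))        # output weights
    E = np.inf
    for _ in range(max_cycles):
        E = 0.0
        for z, d in zip(Z, D):                   # Step 2: forward pass
            y = np.append(f(V @ z), -1.0)        # hidden outputs, augmented
            o = f(W @ y)
            E += 0.5 * np.sum((d - o) ** 2)      # Step 3
            delta_o = 0.5 * (d - o) * (1.0 - o**2)                       # Step 4
            delta_y = 0.5 * (1.0 - y[:-1]**2) * (W[:, :-1].T @ delta_o)  # Step 4
            W += eta * np.outer(delta_o, y)      # Step 5
            V += eta * np.outer(delta_y, z)      # Step 6
        if E < e_max:                            # Step 8
            break
    return W, V, E

# XOR in bipolar coding, inputs augmented with a trailing -1
Z = np.array([[-1, -1, -1], [-1, 1, -1], [1, -1, -1], [1, 1, -1]], float)
D = np.array([[-0.9], [0.9], [0.9], [-0.9]])
W, V, E = ebpta(Z, D, J=4, eta=0.4)
```

Note that no δ_y term is computed for the augmented −1 hidden output, since that output has no adjustable incoming weights.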
The Hopfield Network

We know that the Hopfield network is a recurrent (feedback, or dynamical) neural network. Let y_i, i = 1, 2, ..., N, be the outputs of the network, and let the energy function E satisfy

dE/dt = −Σ_{i=1}^{N} α_i (dy_i/dt)² < 0, where α_i > 0, i = 1, 2, ..., N.

The above inequality reveals that the energy decreases with time and that dE/dt becomes zero if and only if

dy_i/dt = 0, ∀i, i.e., y_i(t) = constant, ∀i,

i.e., the outputs y_i(t) reach their stable equilibrium states.
Hopfield Network
Now let us assume that
Now let us assume that
1
( )
i
i i
i
df y
C
dy
o
÷
=
where
where
0 >
i
C
The
The
Hopfield Network
Hopfield Network
ax
ax
e
e
) x ( f y
÷
÷
+
÷
= =
1
1
For the bipolar activation function
For the bipolar activation function
the inverse function is given by:
the inverse function is given by:
y
y
ln
a
) y ( f x
+
÷
÷ = =
÷
1
1 1
1
The
The
Hopfield Network
Hopfield Network
The Bipolar Activation Function and its Inverse
The Bipolar Activation Function and its Inverse
The
The
Hopfield Network
Hopfield Network
The Derivative of the Inverse of the Bipolar Function
The Derivative of the Inverse of the Bipolar Function
2
1
1 2
y a dy
dx
÷
=
The
The
Hopfield Network
Hopfield Network
0
2
<

.

\

÷ =
¿
=
i
i i
i
y
) y ( f
t
y
C
t
E
d
d
d
d
d
d
1
n
1 i
Therefore
Therefore
We can conclude that
We can conclude that
1 1 0
1
< < ÷ >
÷
i
i
i
y for
dy
) y ( df
The
The
Hopfield Network
Hopfield Network
t
y
y
) y ( f
t
) y ( f
t
x
i
i
i i i
d
d
d
d
d
d
d
d
1 1
= =
Considering
Considering
we obtain
we obtain
2
1
n
i 1 1
d d ( ) d
d d d
n
i i i i
i i
i
i
y f y dx dy E
C C
t t y dt dt
= =
 
= ÷ = ÷

\ .
¿ ¿
The
The
Hopfield Network
Hopfield Network
dt
d
dt
d
dt
d
dt
d
dt
dE
t
t
y x
C
x
C
y

.

\

÷ = ÷ =
(
(
(
(
(
(
¸
(
¸
=
N
x
.
.
x
x
2
1
x
(
(
(
(
(
(
¸
(
¸
=
N
y
.
.
y
y
2
1
y
) C ( diag
i
= C
Now defining
Now defining
yields
yields
The
The
Hopfield Network
Hopfield Network
Since
Since
dt
d
) ( E
dt
dE
t
y
y V =
We can write
We can write
dt
d
C ) ( E
x
y = V ÷
This reveals that the capacitor current vector
This reveals that the capacitor current vector
is parallel to the negative gradient vector.
is parallel to the negative gradient vector.
The
The
Hopfield Network
Hopfield Network
[Figure: the ith neuron of the Hopfield network. The outputs y_1, ..., y_N feed the neuron through the conductances (weights) w_i1, ..., w_iN; the neuron also receives an external current I_i and has capacitance C_i, leakage conductance g_i, internal state x_i, and output y_i.]

The node equation of the ith neuron is

C_i dx_i/dt = Σ_{j=1}^{N} w_ij y_j − (Σ_{j=1}^{N} w_ij + g_i) x_i + I_i = Σ_{j=1}^{N} w_ij (y_j − x_i) − g_i x_i + I_i

where

y_i = f(x_i) ⇒ x_i = f⁻¹(y_i)

Now define

G_i = Σ_{j=1}^{N} w_ij + g_i, C = diag(C_i), G = diag(G_i), i = 1, 2, ..., N

W = [w_11 w_12 ... w_1N; w_21 w_22 ... w_2N; ...; w_N1 w_N2 ... w_NN], I = [I_1 I_2 ... I_N]ᵗ, x = [x_1 x_2 ... x_N]ᵗ

We obtain

C_i dx_i/dt = Σ_{j=1}^{N} w_ij y_j − G_i x_i + I_i

and consequently

C dx(t)/dt = Wy − Gx + I

Since −∇_y E = C dx/dt, we obtain

−∇_y E(y) = Wy − Gx + I

In the case of the bipolar activation function we know that

x = f⁻¹(y) = (1/a) ln((1 + y)/(1 − y))

Therefore the state vector is given as

x = (1/a) [ln((1 + y_1)/(1 − y_1)), ln((1 + y_2)/(1 − y_2)), ..., ln((1 + y_N)/(1 − y_N))]ᵗ
We already know

dE/dt = −Σ_{i=1}^{N} C_i (dx_i/dt)(dy_i/dt)

therefore

dE/dt = −Σ_{i=1}^{N} (Σ_{j=1}^{N} w_ij y_j − G_i x_i + I_i) dy_i/dt = −Σ_{i=1}^{N} Σ_{j=1}^{N} w_ij y_j (dy_i/dt) + Σ_{i=1}^{N} G_i x_i (dy_i/dt) − Σ_{i=1}^{N} I_i (dy_i/dt)

Now consider:

d(yᵗWy)/dt = (dyᵗ/dt) W y + yᵗ W (dy/dt)

If W = Wᵗ, then

(dyᵗ/dt) W y = ((dyᵗ/dt) W y)ᵗ = yᵗ Wᵗ (dy/dt) = yᵗ W (dy/dt)

Therefore

d(yᵗWy)/dt = 2 yᵗ W (dy/dt), i.e., yᵗ W (dy/dt) = (1/2) d(yᵗWy)/dt

Now consider the first term of the expression for dE/dt. We can write

Σ_{i=1}^{N} Σ_{j=1}^{N} w_ij y_j (dy_i/dt) = yᵗ W (dy/dt)

Now using the above equality, we have

Σ_{i=1}^{N} Σ_{j=1}^{N} w_ij y_j (dy_i/dt) = (1/2) d(yᵗWy)/dt

Now consider the second term in the same equation:

x_i (dy_i/dt) = f⁻¹(y_i) (dy_i/dt)

Since

(d/dy_i) ∫₀^{y_i} f⁻¹(y) dy = f⁻¹(y_i)

we can write

f⁻¹(y_i) (dy_i/dt) = ((d/dy_i) ∫₀^{y_i} f⁻¹(y) dy)(dy_i/dt) = (d/dt) ∫₀^{y_i} f⁻¹(y) dy

Hence

dE/dt = (d/dt) [ −(1/2) yᵗWy + Σ_{i=1}^{N} G_i ∫₀^{y_i} f⁻¹(y) dy − Σ_{i=1}^{N} I_i y_i ]

and the energy function is

E = −(1/2) yᵗWy + Σ_{i=1}^{N} G_i ∫₀^{y_i} f⁻¹(y) dy − Σ_{i=1}^{N} I_i y_i
In order to obtain the state equations in terms of the outputs y_i, consider once again

C_i dx_i/dt = Σ_{j=1}^{N} w_ij y_j − G_i x_i + I_i

Using

dx_i/dy_i = (2/a) · 1/(1 − y_i²)

we obtain

C_i (2/(a(1 − y_i²))) dy_i/dt = Σ_{j=1}^{N} w_ij y_j − G_i x_i + I_i

and

dy_i/dt = (a(1 − y_i²)/(2C_i)) [ Σ_{j=1}^{N} w_ij y_j − G_i x_i + I_i ]

In matrix form, with Γ⁻¹(y) = [f⁻¹(y_1) ... f⁻¹(y_N)]ᵗ,

dy/dt = diag( a(1 − y_i²)/(2C_i) ) [ Wy − G Γ⁻¹(y) + I ]
Hopfield Network
y
y
1 1
C
C
1 1
g
g
1 1
x
x
1 1
y
y
2 2
C
C
2 2
g
g
2 2
x
x
2 2
1
1 1 1 11 1 1 12 2 1
( )+ ( )
dx
g x C g y x g y x
dt
+ = ÷ ÷
2
2 2 2 22 2 2 21 1 2
( )+ ( )
dx
g x C g y x g y x
dt
+ = ÷ ÷
g
g
12 12
g
g
21 21
g
g
11 11
g
g
22 22
The
The
Hopfield Network
Hopfield Network
1
1 11 12 1 1 11 12 1
1 21 22 2 2 22 21 2 2
0 0
0 0
dx
C g g y g g g x
dt
C g g y g g g x dx
dt
(
(
+ +
( ( ( ( (
= ÷
(
( ( ( ( (
+ +
¸ ¸ ¸ ¸ ¸ ¸ ¸ ¸ ¸ ¸ (
(
¸ ¸
1 11 12 1 11 12
1 21 22 2 22 21
0 0
, ,
0 0
C g g g g g
C g g g g g
+ +
( ( (
= = =
( ( (
+ +
¸ ¸ ¸ ¸ ¸ ¸
C W G
which
which
yields
yields
The
The
Hopfield Network
Hopfield Network
( )
2
11 12 1
1
1
2
1
21 22 2
0
1
2
i
y
i
i
g g y
E y y G f y dy
g g y
÷
=
( (
= ÷ +
(
( ( ¸ ¸
¸ ¸ ¸ ¸
¿
}
and
and
1
1 11 12 1 1
1 2
21 22 2 2 2
2
1
ln
1 0
1
( , )
0 1
ln
1
y
y g g y G
E y y
g g y G y a
y
÷
(
(
+
( ( (
(
÷V = ÷
( ( (
÷ (
¸ ¸ ¸ ¸ ¸ ¸
(
+
¸ ¸
The
The
Hopfield Network
Hopfield Network
and
and
1
11 1 12 2 1
1 1
1 2
2
22 1 21 1 2
2 2
1 1
ln
1
( , )
1 1
ln
1
E y
g y g y G
y a y
E y y
E y
g y g y G
y a y
c ÷
( (
÷ ÷ ÷ +
( (
c +
( (
÷V = =
c ÷ ( (
÷ ÷ ÷ +
( (
c +
¸ ¸ ¸ ¸
The
The
Hopfield Network
Hopfield Network
( ) ( )
1 2
2 2 1 1
11 1 22 2 1 2 12 1 2 12 1 2
0 0
1
( )
2
y y
E g y g y y y g y y g G f y dy G f y dy
÷ ÷
= ÷ + + + + +
} }
1 2
2 2
11 1 22 2 1 2 21 1 2 12 1 2
0 0
1 1 1 1
( ) ln ln
2 1 1
y y
y y
E g y g y y y g y y g G dy G dy
a y y
÷ ÷
= ÷ + + + ÷ ÷
+ +
} }
Now consider the integral

I = ∫₀^{y_i} ln((1 + y)/(1 − y)) dy = ∫₀^{y_i} ln(1 + y) dy − ∫₀^{y_i} ln(1 − y) dy

Let

I_1 = −∫₀^{y_i} ln(1 − y) dy

and integrate by parts with u = ln(1 − y) and dv = dy, so that du = −dy/(1 − y) and v = y:

I_1 = −[ y ln(1 − y) ]₀^{y_i} − ∫₀^{y_i} y/(1 − y) dy

Evaluating the remaining integral (∫₀^{y_i} y/(1 − y) dy = −ln(1 − y_i) − y_i) gives

I_1 = (1 − y_i) ln(1 − y_i) + y_i

Similarly,

I_2 = ∫₀^{y_i} ln(1 + y) dy = (1 + y_i) ln(1 + y_i) − y_i

Hence

I = I_1 + I_2 = (1 − y_i) ln(1 − y_i) + (1 + y_i) ln(1 + y_i)

Substituting into the energy expression,

E = −(1/2) { g_11 y_1² + g_22 y_2² + (g_12 + g_21) y_1 y_2 } + (G_1/a) { (1 − y_1) ln(1 − y_1) + (1 + y_1) ln(1 + y_1) } + (G_2/a) { (1 − y_2) ln(1 − y_2) + (1 + y_2) ln(1 + y_2) }
Using

dy/dt = diag( a(1 − y_i²)/(2C_i) ) [ Wy − G Γ⁻¹(y) + I ]

the state equations for the two-neuron network are obtained as

[dy_1/dt; dy_2/dt] = [a(1 − y_1²)/(2C_1), 0; 0, a(1 − y_2²)/(2C_2)] ( [g_11 g_12; g_21 g_22] [y_1; y_2] − [G_1 0; 0 G_2] (1/a) [ln((1 + y_1)/(1 − y_1)); ln((1 + y_2)/(1 − y_2))] )

i.e.,

dy_1/dt = ((1 − y_1²)/(2C_1)) ( a g_11 y_1 + a g_12 y_2 − G_1 ln((1 + y_1)/(1 − y_1)) )
dy_2/dt = ((1 − y_2²)/(2C_2)) ( a g_22 y_2 + a g_21 y_1 − G_2 ln((1 + y_2)/(1 − y_2)) )
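The two-neuron dynamics and the energy function above can be checked numerically with a simple Euler integration. All parameter values here (a = 5, unit capacitances and G_i, zero self-coupling, zero external currents) are illustrative assumptions:

```python
import numpy as np

def simulate_two_neuron_hopfield(y0, g, G, C=(1.0, 1.0), a=5.0, dt=1e-3, steps=5000):
    """Euler integration of the two-neuron state equations above."""
    y = np.array(y0, float)
    G = np.asarray(G, float)
    C = np.asarray(C, float)
    traj = [y.copy()]
    for _ in range(steps):
        x = (1.0 / a) * np.log((1.0 + y) / (1.0 - y))      # x_i = f^{-1}(y_i)
        dy = (a * (1.0 - y**2) / (2.0 * C)) * (g @ y - G * x)
        y = np.clip(y + dt * dy, -0.999, 0.999)            # step, kept inside (-1, 1)
        traj.append(y.copy())
    return np.array(traj)

def energy(y, g, G, a=5.0):
    """E = -1/2 y^T W y + sum_i (G_i/a)[(1-y_i)ln(1-y_i) + (1+y_i)ln(1+y_i)]."""
    G = np.asarray(G, float)
    quad = -0.5 * y @ g @ y
    integ = np.sum((G / a) * ((1 - y) * np.log(1 - y) + (1 + y) * np.log(1 + y)))
    return quad + integ

g = np.array([[0.0, 1.0], [1.0, 0.0]])   # symmetric weights, no self-coupling
G = np.array([1.0, 1.0])
traj = simulate_two_neuron_hopfield([0.1, -0.2], g, G)
E_samples = [energy(y, g, G) for y in traj[::500]]
```

Along the simulated trajectory the sampled energy values are non-increasing, as the Lyapunov argument above predicts.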
Discrete
Discrete


Time
Time
Hopfield Network
Hopfield Network
s
s
I Gx Wy
) x(
C + ÷ =
t d
d t
Consider the state equation of the Gradient
Consider the state equation of the Gradient


Type
Type
Hopfield Network:
Hopfield Network:
I (y) GΓ Wy
) x(
C
1 
+ ÷ =
t d
d t
We can write
We can write
As the plot of the inverse
As the plot of the inverse
bipolar activation function
bipolar activation function
shows the second term in
shows the second term in
the above equation is zero
the above equation is zero
for high gain neurons.
for high gain neurons.
Hence:
Hence:
Discrete
Discrete


Time
Time
Hopfield Network
Hopfield Network
s
s
I Wy
) x(
C + =
t d
d t
Discrete
Discrete


Time
Time
Hopfield Network
Hopfield Network
s
s
Now consider
Now consider
Using the this plot we
Using the this plot we
can conclude that
can conclude that
t
y
y
) y ( f
t
) y ( f
t
x
i
i
i i i
d
d
d
d
d
d
d
d
1 1
= =
1
d ( )
0
d
i
i
f y
y
=
for high gain neurons.
for high gain neurons.
Now let us solve this equation using Jacobi
Now let us solve this equation using Jacobi
’
’
s
s
algorithm. To this end define:
algorithm. To this end define:
Discrete
Discrete


Time
Time
Hopfield Network
Hopfield Network
s
s
I Wy 0 + =
0
) x(
=
t d
d t
Hence
Hence
It follows that
It follows that
Discrete
Discrete


Time
Time
Hopfield Network
Hopfield Network
s
s
D = ÷ + W' W = L U
and
ii
w L,U D = diag( )
where
where
11 12 1N 11 12 1N
21 21 22 2N 22 2N
31 32
N1 N2 N,N 1 N1 N2 NN NN
0 0 . . 0 w w . . w w 0 . . 0 0 w . . w
w 0 . . 0 w w . . w 0 w . . 0 0 0 . . w
w w 0 . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . .
w w . w 0 w w . . w 0 0 . . w
÷
÷ ÷ ( ( (
( ( (
÷ ÷
( ( (
( ( ( ÷ ÷ = = ÷ ÷
( ( (
( ( (
( ( (
÷ ÷ ÷
¸ ¸ ¸ ¸ ¸ ¸
W
N 1,N
. w
0 0 . . 0
÷
(
(
(
(
(
÷
(
(
¸ ¸
Are the lower and upper triangular and diagonal
Are the lower and upper triangular and diagonal
matrices shown in the following decomposition
matrices shown in the following decomposition
of
of
W.
W.
Discrete
Discrete


Time
Time
Hopfield Network
Hopfield Network
s
s
ii
D w = diag( )
Now defining
Now defining
we obtain
we obtain
I
1 1
Dy =W'y + I
y = D W'y  D
I =
1
1
D W' =
D
W
I
Now define
Now define
Discrete
Discrete


Time
Time
Hopfield Network
Hopfield Network
s
s
+ y = y W I
Now replace the vector
Now replace the vector
y
y
on the right
on the right


hand side
hand side
by an initial
by an initial
y(0)
y(0)
vector
vector
.
.
If the vector
If the vector
y
y
on the left
on the left


hand side is obtained as
hand side is obtained as
y(0),
y(0),
then
then
y(0)
y(0)
is the
is the
solution of the system. If not then call the vector
solution of the system. If not then call the vector
y
y
obtained on the left
obtained on the left


hand side
hand side
y(1), i.e.,
y(1), i.e.,
Discrete
Discrete


Time
Time
Hopfield Network
Hopfield Network
s
s
and in general we can write
+ y(k + 1) = y(k) W I
+ y(1) = y(0) W I
Discrete
Discrete


Time
Time
Hopfield Network
Hopfield Network
s
s
The method will always converge if the matrix
The method will always converge if the matrix
W
W
is strictly or irreducibly
is strictly or irreducibly
diagonally
diagonally
dominant
dominant
. Strict row diagonal dominance
. Strict row diagonal dominance
means that for each row, the absolute value of
means that for each row, the absolute value of
the diagonal term is greater than the sum of
the diagonal term is greater than the sum of
absolute values of other terms:
absolute values of other terms:
ii ij
i j
w w
=
>
¿
Discrete
Discrete


Time
Time
Hopfield Network
Hopfield Network
s
s
The Jacobi method sometimes converges even if this condition is not satisfied. It is necessary, however, that the diagonal terms in the matrix are greater (in magnitude) than the other terms.
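The iteration above can be checked numerically. The sketch below implements the Jacobi iteration y(k+1) = W̄ y(k) + Ī for the linear system W y = I, together with the strict-row-dominance test; the 2×2 matrix, tolerance, and iteration cap are illustrative choices, not values from the slides.

```python
import numpy as np

def jacobi_solve(W, I_vec, tol=1e-10, max_iter=1000):
    """Jacobi iteration for W y = I: split W = L + D + U, set W' = -(L + U),
    and iterate y(k+1) = D^{-1} W' y(k) + D^{-1} I."""
    d = np.diag(W)
    # strict row diagonal dominance: |w_ii| > sum_{j != i} |w_ij|
    off = np.sum(np.abs(W), axis=1) - np.abs(d)
    if not np.all(np.abs(d) > off):
        print("warning: W is not strictly row diagonally dominant")
    W_bar = -(W - np.diag(d)) / d[:, None]   # D^{-1} W'  (zero diagonal)
    I_bar = I_vec / d                        # D^{-1} I
    y = np.zeros_like(I_bar)                 # initial vector y(0)
    for _ in range(max_iter):
        y_next = W_bar @ y + I_bar           # y(k+1) = W_bar y(k) + I_bar
        if np.max(np.abs(y_next - y)) < tol:
            return y_next
        y = y_next
    return y

W = np.array([[4.0, 1.0],
              [2.0, 5.0]])    # strictly diagonally dominant
I_vec = np.array([1.0, 2.0])
y = jacobi_solve(W, I_vec)
print(np.allclose(W @ y, I_vec))   # True
```

For this matrix the Jacobi iteration matrix has spectral radius below 1, so the fixed point y = W̄ y + Ī is reached well within the iteration cap.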
Discrete-Time Hopfield Networks
Solution by Gauss-Seidel Method
In Jacobi’s method the updating of the
unknowns is made after all N unknowns have
been moved to the left side of the equation. We
will see in the following that this is not
necessary, i.e., the updating can be made
individually for each unknown and this updated
value can be used in the next equation. This is
shown in the following equations:
Discrete-Time Hopfield Networks
x_1(n+1) = (1/a_11) [ −a_12 x_2(n) − a_13 x_3(n) − ... − a_1N x_N(n) + b_1 ]

x_2(n+1) = (1/a_22) [ −a_21 x_1(n+1) − a_23 x_3(n) − ... − a_2N x_N(n) + b_2 ]

x_3(n+1) = (1/a_33) [ −a_31 x_1(n+1) − a_32 x_2(n+1) − a_34 x_4(n) − ... − a_3N x_N(n) + b_3 ]

⋮

x_N(n+1) = (1/a_NN) [ −a_N1 x_1(n+1) − a_N2 x_2(n+1) − ... − a_N,N−1 x_{N−1}(n+1) + b_N ]
In vector-matrix form, we can write:

D x(n+1) = −L x(n+1) − U x(n) + b

(D + L) x(n+1) = −U x(n) + b

x(n+1) = (D + L)⁻¹ ( −U x(n) + b )
Discrete-Time Hopfield Networks
This matrix expression is mainly used to analyze the method. When implementing Gauss-Seidel, an explicit entry-by-entry approach is used:

x_i(n+1) = (1/a_ii) [ b_i − Σ_{j<i} a_ij x_j(n+1) − Σ_{j>i} a_ij x_j(n) ]
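The entry-by-entry update maps directly onto code; a minimal NumPy sketch, with an illustrative test system:

```python
import numpy as np

def gauss_seidel_sweep(A, b, x):
    """One Gauss-Seidel sweep:
    x_i(n+1) = (1/a_ii)[ b_i - sum_{j<i} a_ij x_j(n+1) - sum_{j>i} a_ij x_j(n) ]."""
    x = x.copy()
    for i in range(len(b)):
        s_new = A[i, :i] @ x[:i]       # already-updated entries x_j(n+1), j < i
        s_old = A[i, i+1:] @ x[i+1:]   # previous-iterate entries x_j(n), j > i
        x[i] = (b[i] - s_new - s_old) / A[i, i]
    return x

A = np.array([[4.0, 1.0],
              [2.0, 5.0]])
b = np.array([1.0, 2.0])
x = np.zeros(2)
for _ in range(50):                    # 50 sweeps are ample for this 2x2 system
    x = gauss_seidel_sweep(A, b, x)
print(np.allclose(A @ x, b))   # True
```

Note that each updated component is used immediately within the same sweep, which is exactly what distinguishes Gauss-Seidel from Jacobi.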
Discrete-Time Hopfield Networks
The Gauss-Seidel method is defined on matrices with nonzero diagonals, but convergence is only guaranteed if the matrix is either:

1. diagonally dominant, or
2. symmetric and positive definite.
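These sufficient conditions can be tested mechanically before running the iteration. A minimal NumPy sketch, using strict row dominance as the dominance test; the sample matrices are illustrative:

```python
import numpy as np

def gauss_seidel_converges(A):
    """Sufficient-condition check: nonzero diagonal, and either strictly row
    diagonally dominant or symmetric positive definite."""
    d = np.abs(np.diag(A))
    if np.any(d == 0):
        return False                         # the method is not even defined
    dominant = np.all(d > np.sum(np.abs(A), axis=1) - d)
    spd = np.allclose(A, A.T) and np.all(np.linalg.eigvalsh(A) > 0)
    return dominant or spd

print(gauss_seidel_converges(np.array([[4.0, 1.0], [2.0, 5.0]])))  # True
print(gauss_seidel_converges(np.array([[0.0, 1.0], [1.0, 0.0]])))  # False
```

Failing this check does not prove divergence (the conditions are sufficient, not necessary), so the function is best read as a cheap pre-flight test.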
What Is a Neural Network?
Work on artificial neural networks, commonly referred to as "neural networks," has been motivated right from its inception by the recognition that the brain computes in an entirely different way from the conventional digital computer.
What Is a Neural Network?
The struggle to understand the brain owes much to the pioneering work of Ramon y Cajal (1911), who introduced the idea of neurons as structural
constituents of the brain.
Typically, neurons are five to six orders of magnitude slower than silicon logic gates; events in a silicon chip happen in the nanosecond (10⁻⁹ s) range, whereas neural events happen in the millisecond (10⁻³ s) range.
What Is a Neural Network?
However, the brain makes up for the relatively slow rate of operation of a neuron by having a truly staggering number of neurons (nerve cells) with massive interconnections between them.
What Is a Neural Network?
It is estimated that there must be on the order of 10 billion neurons in the human cortex, and 60 trillion synapses or connections (Shepherd and Koch, 1990). The net result is that the brain is an enormously efficient structure. Specifically, the energetic efficiency of the brain is approximately 10⁻¹⁶ joules (J) per operation per second. The corresponding value for the best computers in use today is about 10⁻⁶ joules per operation per second (Faggin, 1991).
What Is a Neural Network?
The brain is a highly complex, nonlinear, and parallel computer (information-processing system). It has the capability of organizing neurons so as to perform certain computations (e.g., pattern recognition, perception, and motor control) many times faster than the fastest digital computer in existence today.
What Is a Neural Network?
Consider, for example, human vision, which is an information-processing task (Churchland and Sejnowski, 1992; Levine, 1985; Marr, 1982). It is the function of the visual system to provide a representation of the environment around us and, more important, to supply the information we need to interact with the environment.
What Is a Neural Network?
The brain routinely accomplishes perceptual recognition tasks (e.g., recognizing a familiar face embedded in an unfamiliar scene) in something of the order of 100-200 ms, whereas tasks of much lesser complexity will take hours on conventional computers.
What Is a Neural Network?
For another example, consider the sonar of a bat. Sonar is an active echolocation system. In addition to providing information about how far away a target (e.g., a flying insect) is, a bat sonar conveys information about the relative velocity of the target, the size of the target, the size of various features of the target, and the azimuth and elevation of the target (Suga, 1990a, b).
What Is a Neural Network?
The complex neural computations needed to extract all this information from the target echo occur within a brain the size of a plum. Indeed, an echolocating bat can pursue and capture its target with a facility and success rate that would be the envy of a radar or sonar engineer.
What Is a Neural Network?
How, then, does a human brain or the brain of a bat do it? At birth, a brain has great structure and the ability to build up its own rules through what we usually refer to as "experience."
What Is a Neural Network?
Indeed, experience is built up over the years, with the most dramatic development (i.e., hardwiring) of the human brain taking place in the first two years from birth, but the development continues well beyond that stage. During this early stage of development, about 1 million synapses are formed per second.
What Is a Neural Network?
Synapses are elementary structural and functional units that mediate the interactions between neurons. The most common kind of synapse is a chemical synapse, which operates as follows:
What Is a Neural Network?
A presynaptic process liberates a transmitter substance that diffuses across the synaptic junction between neurons and then acts on a postsynaptic process. Thus a synapse converts a presynaptic electrical signal into a chemical signal and then back into a postsynaptic electrical signal (Shepherd and Koch, 1990).
What Is a Neural Network?
In electrical terminology, such an element is said to be a nonreciprocal two-port device. In traditional descriptions of neural organization, it is assumed that a synapse is a simple connection that can impose excitation or inhibition, but not both, on the receptive neuron.
What Is a Neural Network?
A developing neuron is synonymous with a plastic brain: plasticity ([Latin plasticus, from Greek plastikos, from plastos, molded, from plassein, to mold; see pelə-2 in Indo-European roots]) permits the developing nervous system to adapt to its surrounding environment (Churchland and Sejnowski, 1992; Eggermont, 1990). In an adult brain, plasticity may be accounted for by two mechanisms: the creation of new synaptic connections between neurons, and the modification of existing synapses.
What Is a Neural Network?
Axons, the transmission lines, and dendrites, the receptive zones, constitute two types of cell filaments that are distinguished on morphological grounds: an axon has a smoother surface, fewer branches, and greater length, whereas a dendrite (so called because of its resemblance to a tree) has an irregular surface and more branches (Freeman, 1975).
What Is a Neural Network?
Neurons come in a wide variety of shapes and sizes in different parts of the brain. The figure illustrates the shape of a pyramidal cell, which is one of the most common types of cortical neurons. Like many other types of neurons, it receives most of its inputs through dendritic spines. The pyramidal cell can receive 10,000 or more synaptic contacts and it can project onto thousands of target cells.
What Is a Neural Network?
Just as plasticity appears to be essential to the functioning of neurons as information-processing units in the human brain, so it is with neural networks made up of artificial neurons.
What Is a Neural Network?
In its most general form, a neural network is a machine that is designed to model the way in which the brain performs a particular task or function of interest; the network is usually implemented using electronic components or simulated in software on a digital computer.
What Is a Neural Network?
In most cases the interest is confined largely to an important class of neural networks that perform useful computations through a process of learning.
What Is a Neural Network?
To achieve good performance, neural networks employ a massive interconnection of simple computing cells referred to as "neurons" or "processing units." We may thus offer the following definition of a neural network viewed as an adaptive machine:
What Is a Neural Network?
A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:
1. Knowledge is acquired by the network through a learning process.
2. Interneuron connection strengths known as synaptic weights are used to store the knowledge.
What Is a Neural Network?
The procedure used to perform the learning process is called a learning algorithm, the function of which is to modify the synaptic weights of the network in an orderly fashion so as to attain a desired design objective.
What Is a Neural Network?
The modification of synaptic weights provides the traditional method for the design of neural networks. Such an approach is the closest to linear adaptive filter theory, which is already well established and successfully applied in such diverse fields as communications, control, radar, sonar, seismology, and biomedical engineering (Haykin, 1991; Widrow and Stearns, 1985).
What Is a Neural Network?
However, it is also possible for a neural network to modify its own topology, which is motivated by the fact that neurons in the human brain can die and that new synaptic connections can grow.
What Is a Neural Network?
Neural networks are also referred to in the literature as neurocomputers, connectionist networks, parallel distributed processors, etc.
Benefits of Neural Networks
From the above discussion, it is apparent that a neural network derives its computing power through:
1. its massively parallel distributed structure, and
2. its ability to learn and therefore generalize; generalization refers to the neural network producing reasonable outputs for inputs not encountered during training (learning).
Benefits of Neural Networks
How does the following example help you to generalize?
confer: L. conferre (con-, together + ferre, to bear, to bring, to carry) v.t. to talk or consult together; to bestow (to place or put by); to give.
defer: L. deferre (de-, down + ferre, to bear) v.t. to put off to another time; to delay; v.i. to yield (to the wishes or opinions of another, or to authority); to submit.
differ: L. differre (dif- for dis-, apart, asunder, into parts, separately + ferre, to bear) v.i. to be unlike, distinct or various.
infer: L. inferre (in-, into + ferre, to bring) v.t. to bring on; to drive as a conclusion.
prefer: L. praeferre (prae-, in front of + ferre, to bear) v.t. to set in front; to put forward, present, offer, submit for acceptance or consideration; to promote.
Benefits of Neural Networks
convene: L. convenire (con-, together + venire, to come) v.i. to come together; v.t. to call together, to convene.
convent; convention: the act of convening; an assembly, esp. of special delegates for some common object; an agreement (Geneva Convention).
invent: L. invenire, inventum (in-, upon + venire, to come) v.t. to find; to devise or contrive.
prevent: L. praevenire (prae-, in front of, earlier than + venire, to come) v.t. to precede; to preclude; to stop, keep, or hinder effectually; to keep from coming to pass.
Benefits of Neural Networks
synonym: 1432 (but rare before 18c.), from L. synonymum, from Gk. synonymon "word having the same sense as another," noun use of neut. of synonymos "having the same name as," from syn- "together, same" + onyma, Aeolic dialectal form of onoma "name" (see name). Synonymous is attested from 1610.
Benefits of Neural Networks
antonym: 1870, created to serve as opposite of synonym, from Gk. anti- "equal to, instead of, opposite" (see anti-) + -onym "name" (see name).
anonymous: 1601, from Gk. anonymos "without a name," from an- "without" + onyma, Aeolic dialectal form of onoma "name" (see name).
Benefits of Neural Networks
These two information-processing capabilities, (1) massively parallel distributed structure and (2) the ability to generalize, make it possible for neural networks to solve complex (large-scale) problems that are currently intractable. In practice, however, neural networks cannot provide the solution working by themselves alone; rather, they need to be integrated into a consistent system engineering approach.
Benefits of Neural Networks
Specifically, a complex problem of interest is decomposed into a number of relatively simple tasks, and neural networks are assigned a subset of the tasks (e.g., pattern recognition, associative memory, control, etc.) that match their inherent capabilities. It is important to recognize, however, that we have a long way to go (if ever) before we can build a computer architecture that mimics the human brain.
Properties and Capabilities of Neural Networks
1. Nonlinearity
A neuron is basically a nonlinear device. Consequently, a neural network, made up of an interconnection of neurons, is itself nonlinear. Moreover, the nonlinearity is of a special kind in the sense that it is distributed throughout the network. Nonlinearity is a highly important property, particularly if the underlying physical mechanism responsible for the generation of an input signal (e.g., a speech signal) is inherently nonlinear.
Properties and Capabilities of Neural Networks
2. Input-Output Mapping
A popular paradigm of learning called supervised learning involves the modification of the synaptic weights of a neural network by applying a set of labeled training samples or task examples. Each example consists of a unique input signal and the corresponding desired response.
Properties and Capabilities of Neural Networks
The network is presented an example picked at random from the set, and the synaptic weights (free parameters) of the network are modified so as to minimize the difference between the desired response and the actual response of the network.
Properties and Capabilities of Neural Networks
The training of the network is repeated for many examples in the set until the network reaches a steady state, where there are no further significant changes in the synaptic weights. The previously applied training examples may be reapplied during the training session but in a different order.
Properties and Capabilities of Neural Networks
Thus the network learns from the examples by constructing an input-output mapping for the problem at hand.
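The loop just described (pick examples at random, adjust the weights to shrink the difference between desired and actual response, repeat until the weights settle) can be sketched for a single linear neuron. The least-mean-squares (delta-rule) update, the learning rate, and the toy target y = 2·x1 − x2 below are illustrative assumptions; the text does not prescribe a particular learning rule.

```python
import random

def train(examples, weights, rate=0.1, epochs=100):
    """Supervised training of one linear neuron: present the labeled examples
    in a different random order each epoch and apply a delta-rule update."""
    for _ in range(epochs):
        random.shuffle(examples)                  # reapply examples in a different order
        for x, desired in examples:
            actual = sum(w * xi for w, xi in zip(weights, x))
            error = desired - actual              # difference to be minimized
            for j in range(len(weights)):
                weights[j] += rate * error * x[j] # w_j <- w_j + rate * error * x_j
    return weights

# Labeled examples generated by the (hypothetical) target y = 2*x1 - x2
examples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
            ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
w = train(examples, [0.0, 0.0])
print(w)   # approximately [2.0, -1.0]
```

Because the examples are consistent with a single linear map, the weights reach a steady state at which the error on every example is essentially zero, mirroring the stopping criterion described above.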
Properties and Capabilities of Neural Networks
Such an approach brings to mind the study of nonparametric statistical inference, which is a branch of statistics dealing with model-free estimation, or, from a biological viewpoint, tabula rasa learning (Geman et al., 1992). (tabula rasa: a smoothed or blank tablet; a mind not yet influenced by outside impressions and experiences. [Medieval Latin tabula rāsa : Latin tabula, tablet + Latin rāsa, feminine of rāsus, erased.])
Properties and Capabilities of Neural Networks
Consider, for example, a pattern classification task, where the requirement is to assign an input signal representing a physical object or event to one of several prespecified categories (classes). In a nonparametric approach to this problem, the requirement is to "estimate" arbitrary decision boundaries in the input signal space for the pattern-classification task using a set of examples, and to do so without invoking a probabilistic distribution model.
Properties and Capabilities of Neural Networks
A similar point of view is implicit in the supervised learning paradigm, which suggests a close analogy between the input-output mapping performed by a neural network and nonparametric statistical inference.
paradigm: 1. an example serving as a model; pattern. 2. Grammar. a. the set of all inflected forms based on a single stem or theme, as boy, boy's, boys, boys'. b. a display in fixed arrangement of such a set. [Origin: 1475–85; < LL paradīgma < Gk parádeigma, pattern (verbid of paradeiknýnai, to show side by side), equiv. to para- + deik-, base of deiknýnai, to show (see deictic) + -ma, n. suffix.]
Properties and Capabilities of Neural Networks
analogy: 1550, from L. analogia, from Gk. analogia "proportion," from ana- "upon, according to" + logos "ratio," also "word, speech, reckoning." A mathematical term used in a wider sense by Plato.
Properties and Capabilities of Neural Networks
3. Adaptivity
Neural networks have a built-in capability to adapt their synaptic weights to changes in the surrounding environment. In particular, a neural network trained to operate in a specific environment can be easily retrained to deal with minor changes in the operating environmental conditions.
Properties and Capabilities of Neural Networks
Moreover, when it is operating in a nonstationary environment (i.e., one whose statistics change with time), a neural network can be designed to change its synaptic weights in real time. The natural architecture of a neural network for pattern classification, signal processing, and control applications, coupled with the adaptive capability of the network, makes it an ideal tool for use in adaptive pattern classification, adaptive signal processing, and adaptive control.
Properties and Capabilities of Neural Networks
As a general rule, it may be said that the more adaptive we make a system in a properly designed fashion, assuming the adaptive system is stable, the more robust its performance will likely be when the system is required to operate in a nonstationary environment.
Properties and Capabilities of Neural Networks
It should be emphasized, however, that adaptivity does not always lead to robustness; indeed, it may do the very opposite. For example, an adaptive system with short time constants may change rapidly and therefore tend to respond to spurious disturbances, causing a drastic degradation in system performance.
Properties and Capabilities of Neural Networks
To realize the full benefits of adaptivity, the principal time constants of the system should be long enough for the system to ignore spurious (L. spurius, false) disturbances and yet short enough to respond to meaningful changes in the environment; the problem described here is referred to as the stability-plasticity dilemma (Grossberg, 1988). Adaptivity (or "in situ" (L., "in the original situation") training, as it is sometimes referred to) is an open research topic.
Properties and Capabilities of Neural Networks
4. Evidential Response
In the context of pattern classification, a neural network can be designed to provide information not only about which particular pattern to select, but also about the confidence in the decision made. This latter information may be used to reject ambiguous patterns, should they arise, and thereby improve the classification performance of the network.
Properties and Capabilities of Neural Networks
5. Contextual Information (L. contextus, from contexere: con-, together + texere, textum, to weave)
Knowledge is represented by the very structure and activation state of a neural network. Every neuron in the network is potentially affected by the global activity of all other neurons in the network. Consequently, contextual information is dealt with naturally by a neural network.
Properties and Capabilities of Neural Networks
6. Fault Tolerance
A neural network, implemented in hardware form, has the potential to be inherently fault tolerant in the sense that its performance is degraded gracefully under adverse operating conditions (Bolt, 1992). For example, if a neuron or its connecting links are damaged, recall of a stored pattern is impaired in quality. However, owing to the distributed nature of information in the network, the damage has to be extensive before the overall response of the network is degraded seriously. Thus, in principle, a neural network exhibits a graceful degradation in performance rather than catastrophic failure.
Properties and Capabilities of Neural Networks
7. VLSI Implementability
The massively parallel nature of a neural network makes it potentially fast for the computation of certain tasks, which makes it possible to use a neural network as a tool for real-time applications involving pattern recognition, signal processing, and control. This same feature makes a neural network ideally suited for implementation using very-large-scale-integrated (VLSI) technology. The particular virtue of VLSI is that it provides a means of capturing truly complex behavior in a highly hierarchical fashion (Mead and Conway, 1980).
Properties and Capabilities of Neural Networks
8. Uniformity of Analysis and Design
Basically, neural networks enjoy universality as information processors. We say this in the sense that the same notation is used in all the domains involving the application of neural networks. This feature manifests itself in different ways: neurons, in one form or another, represent an ingredient common to all neural networks; this commonality makes it possible to share theories and learning algorithms in different applications of neural networks; and modular networks can be built through a seamless integration of modules.
Properties and Capabilities of Neural Networks
analysis: [Medieval Latin, from Greek analusis, from analūein, to undo: ana-, throughout + lūein, to loosen.] 1581, from Gk. analysis "a breaking up, a dissolving," "resolution of anything complex into simple elements" (opposite of synthesis), from analyein "unloose," from ana- "up, throughout" + lysis "a loosening" (see lose). Psychological sense is from 1890. The phrase in the final (or last) analysis (1844) translates Fr. en dernière analyse.
design: 1548, from L. designare "mark out, devise," from de- "out" + signare "to mark," from signum "a mark, sign." Originally in English with the meaning now attached to designate (1646, from L. designatus, of designare), soon giving wide figurative extension to designated; many modern uses of design are metaphoric extensions. Designing "scheming" is from 1671. Designer (adj.) in the fashion sense of "prestigious" is first recorded 1966; designer drug is from 1983. Designated hitter was introduced in American League baseball in 1973.
Properties and Capabilities of Neural Networks
9. Neurobiological Analogy
The design of a neural network is motivated by analogy with the brain, which is a living proof that fault-tolerant parallel processing is not only physically possible but also fast and powerful. Neurobiologists look to (artificial) neural networks as a research tool for the interpretation of neurobiological phenomena.
Properties and Capabilities of Neural Networks
For example, neural networks have been used to provide insight on the development of premotor circuits in the oculomotor system (responsible for eye movements) and the manner in which they process signals (Robinson, 1992). (premotor: relating to, or being, the area of the cortex of the frontal lobe lying immediately in front of the motor area of the precentral gyrus; gyrus: any of the prominent, rounded, elevated convolutions on the surfaces of the cerebral hemispheres [Latin gȳrus, circle; see gyre]; oculomotor: 1. of or relating to movements of the eyeball, as in an oculomotor muscle [Latin oculus, eye]; 2. of or relating to the oculomotor nerve.) On the other hand, engineers look to neurobiology for new ideas to solve problems more complex than those based on conventional hardwired design techniques.
Properties and Capabilities of Neural Networks
Here, for example, we may mention the development of a model sonar receiver based on the bat (Simmons et al., 1992). The bat-inspired model consists of three stages: (1) a front end that mimics the inner ear of the bat in order to encode waveforms; (2) a subsystem of delay lines that computes echo delays; and (3) a subsystem that computes the spectrum of echoes, which is in turn used to estimate the time separation of echoes from multiple target glints.
Properties and Capabilities of Neural Networks
The motivation is to develop a new sonar receiver that is superior to one designed by conventional methods. The neurobiological analogy is also useful in another important way: it provides a hope and belief (and, to a certain extent, an existence proof) that physical understanding of neurobiological structures could indeed influence the art of electronics and thus VLSI (Andreou, 1992).
Properties and Capabilities of Neural Networks
With inspiration from the neurobiological analogy in mind, it seems appropriate that we take a brief look at the structural levels of organization in the brain.
Properties and Capabilities of Neural Networks
1.2 Structural Levels of Organization in the Brain
The human nervous system may be viewed as a three-stage system (Arbib, 1987).
[Figure: Block-diagram representation of the nervous system]
Properties and Capabilities of Neural Networks
Central to the system is the brain, represented by the neural (nerve) net in this figure, which continually receives information, perceives it, and makes appropriate decisions. Two sets of arrows are shown in this figure:
1. Those pointing from left to right indicate the forward transmission of information-bearing signals through the system.
2. The arrows pointing from right to left signify the presence of feedback in the system.
Properties and Capabilities of Neural Networks
The receptors in the figure convert stimuli from the human body or the external environment into electrical impulses that convey information to the neural net (brain). The effectors, on the other hand, convert electrical impulses generated by the neural net into discernible responses as system outputs. In the brain there are both small-scale and large-scale anatomical organizations, and different functions take place at lower and higher levels.
Properties and Capabilities of Neural Networks
This figure shows a hierarchy of interwoven levels of organization that has emerged from the extensive work done on the analysis of local regions in the brain (Churchland and Sejnowski, 1992; Shepherd and Koch, 1990).
Properties and Capabilities of Neural Networks
Proceeding upward from synapses, which represent the most fundamental level and depend on molecules and ions for their action, we have neural microcircuits, dendritic trees, and then neurons.
Properties and Capabilities of Neural Networks
A neural microcircuit refers to an assembly of synapses organized into patterns of connectivity so as to produce a functional operation of interest. A neural microcircuit may be likened to a silicon chip made up of an assembly of transistors.
Properties and Capabilities of Neural Networks
The smallest size of microcircuits is measured in micrometers (μm), and their fastest speed of operation is measured in milliseconds. The neural microcircuits are grouped to form dendritic subunits within the dendritic trees of individual neurons. The whole neuron, about 100 μm in size, contains several dendritic subunits.
Properties and Capabilities of Neural Networks
At the next level of complexity, we have local circuits (about 1 mm in size) made up of neurons with similar or different properties; these neural assemblies perform operations characteristic of a localized region in the brain.
Properties and Capabilities of Neural Networks
This is followed by interregional circuits made up of pathways, columns, and topographic maps, which involve multiple regions located in different parts of the brain.
Properties and Capabilities of Neural Networks
Topographic maps are organized to respond to incoming sensory information. These maps are often arranged in sheets, as in the superior colliculus, where the visual, auditory, and somatosensory maps are stacked in adjacent layers in such a way that stimuli from corresponding points in space lie above each other. Finally, the topographic maps and other interregional circuits mediate specific types of behavior in the central nervous system.
Properties and Capabilities of Neural Networks
It is important to recognize that the structural levels of organization described herein are a unique characteristic of the brain. They are nowhere to be found in a digital computer, and we are nowhere close to realizing them with artificial neural networks. Nevertheless, we are inching our way toward a hierarchy of computational levels similar to that described in the last figure.
Properties and Capabilities of Neural Networks
The artificial neurons we use to build our neural networks are truly primitive in comparison to those found in the brain. The neural networks we are presently able to design are just as primitive compared to the local circuits and the interregional circuits in the brain.
Properties and Capabilities of Neural Networks
What is really satisfying, however, is the remarkable progress that we have made on so many fronts during the past 20 years. With the neurobiological analogy as the source of inspiration, and the wealth of theoretical and technological tools that we are bringing together, it is certain that in another 10 years our understanding of artificial neural networks will be much more sophisticated than it is today.
Properties and Capabilities of Neural Networks
Our primary interest here is confined to the study of artificial neural networks from an engineering perspective, to which we refer simply as neural networks. We begin the study by describing the models of (artificial) neurons that form the basis of the neural networks considered in these lectures.
Models of a Neuron
A neuron is an information-processing unit that is fundamental to the operation of a neural network. The figure on the next slide shows the model for a neuron.
Models of a Neuron
[Figure: Nonlinear model of a neuron]
Models of a Neuron
1. A set of synapses or connecting links, each of which is characterized by a weight or strength of its own. Specifically, a signal x_j at the input of synapse j connected to neuron k is multiplied by the synaptic weight w_kj. It is important to make a note of the manner in which the subscripts of the synaptic weight w_kj are written.
Models of a Neuron
The first subscript refers to the neuron in question and the second subscript refers to the input end of the synapse to which the weight refers; the reverse of this notation is also used in the literature. The weight w_kj is positive if the associated synapse is excitatory; it is negative if the synapse is inhibitory (Middle English inhibiten, from Latin inhibēre, to restrain, forbid: in-, in + habēre, to hold; see ghabh- in Indo-European roots).
Models of a Neuron
2. An adder for summing the input signals, weighted by the respective synapses of the neuron; the operations described here constitute a linear combiner.
Models of a Neuron
3. An activation function for limiting the amplitude of the output of a neuron. The activation function is also referred to in the literature as a squashing function in that it squashes (limits) the permissible amplitude range of the output signal to some finite value. Typically, the normalized amplitude range of the output of a neuron is written as the closed unit interval [0, 1] or alternatively [−1, 1].
Models of a Neuron
4. The model of a neuron also includes an externally applied threshold \theta_k that has the effect of lowering the net input of the activation function. On the other hand, the net input of the activation function may be increased by employing a bias term rather than a threshold; the bias is the negative of the threshold.
Models of a Neuron
Mathematical Model of a Neuron
In mathematical terms, we may describe neuron k by writing the following pair of equations:
\[ u_k = \sum_{j=1}^{p} w_{kj} x_j, \qquad y_k = \varphi(u_k - \theta_k) \]
where the x_j are the input signals, the w_kj are the synaptic weights of neuron k, u_k is the linear combiner output, \theta_k is the threshold, \varphi(\cdot) is the activation function, and y_k is the output signal of the neuron.
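The pair of equations above can be checked with a minimal sketch (the function names are ours, not from the lecture, and the default sigmoid activation is an assumption):

```python
import math

def neuron_output(x, w, theta, phi=lambda v: 1.0 / (1.0 + math.exp(-v))):
    """y_k = phi(u_k - theta_k), with u_k = sum_j w_kj * x_j."""
    u = sum(wj * xj for wj, xj in zip(w, x))  # linear combiner output u_k
    return phi(u - theta)                     # activation applied to the net input

# With a hard limiter the same neuron acts as a threshold unit:
step = lambda v: 1 if v >= 0 else 0
print(neuron_output([1, 0, 1], [0.5, -0.2, 0.8], theta=1.0, phi=step))  # -> 1
```

Here u = 0.5 + 0 + 0.8 = 1.3 and the net input 1.3 - 1.0 = 0.3 is nonnegative, so the hard limiter fires.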
Models of a Neuron
Block-Diagram Representation of a Neuron:
\[ u_k = \sum_{j=1}^{p} w_{kj} x_j, \qquad y_k = \varphi(u_k - \theta_k) \]
Models of a Neuron
The use of the threshold \theta_k has the effect of applying an affine transformation to the output u_k of the linear combiner in the model of the figure, as shown by
\[ v_k = u_k - \theta_k \]
Models of a Neuron
In particular, as a result of this affine transformation, the relationship between the effective internal activity level (activation potential) v_k of neuron k and the linear combiner output u_k is modified in the manner illustrated in the figure. Note that the graph of v_k versus u_k no longer passes through the origin; it is shifted down or up depending on whether the threshold \theta_k is positive or negative.
Models of a Neuron
The threshold \theta_k is an external parameter of artificial neuron k. We may account for its presence as in the above equation. Equivalently, we may formulate the combination of the two equations as follows:
\[ v_k = \sum_{j=0}^{p} w_{kj} x_j, \qquad y_k = \varphi(v_k) \]
Models of a Neuron
Here we have added a new synapse, whose input is x_0 = -1 and whose weight is w_{k0} = \theta_k.
Models of a Neuron
We may therefore reformulate the model of neuron k as in the figure, where the effect of the threshold is represented by doing two things: (1) adding a new input signal fixed at -1, and (2) adding a new synaptic weight equal to the threshold \theta_k. Alternatively, we may model the neuron as in the following slide:
Models of a Neuron
where the combination of the fixed input x_0 = +1 and the weight w_{k0} = b_k accounts for the bias b_k. Although the models of the two figures are different in appearance, they are mathematically equivalent.
Slayt 92 — YTU, 15.03.2005
Models of a Neuron
Signal-Flow Graph Representation of a Neuron:
\[ u_k = \sum_{j=1}^{p} w_{kj} x_j, \qquad y_k = \varphi(u_k - \theta_k) \]
Models of a Neuron
Signal-Flow Graph Representation of a Neuron: two different types of links may be distinguished:
(a) Synaptic links, defined by a linear input-output relation; specifically, the node signal x_j is multiplied by the synaptic weight w_kj.
(b) Activation links, defined in general by a nonlinear input-output relation. This form of relationship is the nonlinear activation function \varphi(\cdot).
Models of a Neuron
The Activation Function
The activation function, denoted by \varphi(v), defines the output y of a neuron in terms of the activity level v at its input.
Models of a Neuron
We may identify three basic types of activation functions:
1. Threshold Function
2. Piecewise-linear Function
3. Sigmoid Function
Models of a Neuron
1. Threshold (hard limiter, or binary activation) Function (leading to the discrete perceptron):
(a) Unipolar:
\[ \varphi(v) = \begin{cases} 1, & v \ge 0 \\ 0, & v < 0 \end{cases} \;=\; \frac{1}{2} + \frac{1}{2}\,\mathrm{sgn}(v) \]
Models of a Neuron
(b) Bipolar:
\[ \varphi(v) = \begin{cases} 1, & v \ge 0 \\ -1, & v < 0 \end{cases} \;=\; \mathrm{sgn}(v) \]
Models of a Neuron
2. Piecewise-linear Function:
(a) Unipolar:
\[ \varphi(v) = \begin{cases} 1, & v \ge \tfrac{1}{2} \\ v + \tfrac{1}{2}, & -\tfrac{1}{2} < v < \tfrac{1}{2} \\ 0, & v \le -\tfrac{1}{2} \end{cases} \]
Models of a Neuron
(b) Bipolar:
\[ \varphi(v) = \begin{cases} 1, & v \ge 1 \\ v, & -1 < v < 1 \\ -1, & v \le -1 \end{cases} \]
Models of a Neuron
3. Sigmoid Function:
(a) Unipolar:
\[ \varphi(v) = \frac{1}{1 + e^{-av}}, \qquad a > 0 \]
Models of a Neuron
(b) Bipolar:
\[ \varphi(v) = \frac{1 - e^{-av}}{1 + e^{-av}} = \tanh\!\left(\frac{av}{2}\right), \qquad a > 0 \]
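The three basic activation types can be sketched directly from the definitions above (unipolar forms, plus the bipolar sigmoid; a minimal sketch we added, not lecture code):

```python
import math

def threshold(v):
    # unipolar hard limiter: 1/2 + 1/2 * sgn(v)
    return 1.0 if v >= 0 else 0.0

def piecewise_linear(v):
    # unipolar: linear in the middle, saturating at 0 and 1
    if v >= 0.5:
        return 1.0
    if v <= -0.5:
        return 0.0
    return v + 0.5

def sigmoid(v, a=1.0):
    # unipolar: 1 / (1 + exp(-a v)), slope parameter a > 0
    return 1.0 / (1.0 + math.exp(-a * v))

def bipolar_sigmoid(v, a=1.0):
    # (1 - exp(-a v)) / (1 + exp(-a v)) = tanh(a v / 2)
    return math.tanh(a * v / 2.0)
```

Note that all four squash their input into a finite range, as required of a squashing function.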
Models of Artificial Neural Networks
DEFINITION OF Neural Network (Jacek M. Zurada, Artificial Neural Systems, West Publishing Company, 1992): A neural network is an interconnection of neurons such that neuron outputs are connected, through weights, to all other neurons including themselves; both lag-free and delay connections are allowed.
Models of Artificial Neural Networks
Neural Networks Viewed as Directed Graphs:
1. Block-Diagram Representation (BDR)
2. Signal-Flow Graph Representation (SFGR)
These are obtained when the BDR and SFGR for the individual neurons are used.
Models of Artificial Neural Networks
An alternative definition of Neural Network (Simon Haykin, Neural Networks, Macmillan College Publishing Company, 1994): A neural network is a directed graph (SFG) consisting of nodes with interconnecting synaptic and activation links, and which is characterized by four properties:
1. Each neuron is represented by a set of linear synaptic links, an externally applied threshold, and a nonlinear activation link. The threshold is represented by a synaptic link with an input signal fixed at a value of -1.
Models of Artificial Neural Networks
2. The synaptic links of a neuron weight their respective input signals.
3. The weighted sum of the input signals defines the total internal activity level of the neuron in question.
4. The activation link squashes the internal activity level of the neuron to produce an output that represents the output of the neuron.
Network Architectures
In general, we may identify four different classes of network architectures:
1. Single-Layer Feedforward Networks
2. Multilayer Feedforward Networks
3. Recurrent Networks
4. Lattice Structures
Network Architectures
1. Single-Layer Feedforward Networks
A layered neural network is a network of neurons organized in the form of layers. In the simplest form of a layered network, we just have an input layer of source nodes that projects onto an output layer of neurons (computation nodes), but not vice versa.
Network Architectures
In other words, this network is strictly of a feedforward type. Such a network is called a single-layer network, with the designation "single layer" referring to the output layer of computation nodes (neurons); we do not count the input layer of source nodes, because no computation is performed there. It is illustrated on the following slide for the case of four nodes in both the input and output layers.
Network Architectures .
Network Architectures
2. Multilayer Feedforward Networks
The second class of feedforward neural network distinguishes itself by the presence of one or more hidden layers, whose computation nodes are correspondingly called hidden neurons or hidden units. The function of the hidden neurons is to intervene between the external input and the network output.
Network Architectures .
Network Architectures
By adding one or more hidden layers, the network is enabled to extract higher-order statistics, for (in a rather loose sense) the network acquires a global perspective despite its local connectivity, by virtue of the extra set of synaptic connections and the extra dimension of neural interactions. The ability of hidden neurons to extract higher-order statistics is particularly valuable when the size of the input layer is large.
Network Architectures
The source nodes in the input layer of the network supply the respective elements of the activation pattern (input vector), which constitute the input signals applied to the neurons (computation nodes) in the second layer (i.e., the first hidden layer). The output signals of the second layer are used as inputs to the third layer, and so on for the rest of the network.
Network Architectures
Typically, the neurons in each layer of the network have as their inputs the output signals of the preceding layer only. The set of output signals of the neurons in the output (final) layer of the network constitutes the overall response of the network to the activation pattern supplied by the source nodes in the input (first) layer.
Network Architectures
This graph illustrates the layout of a multilayer feedforward neural network for the case of a single hidden layer. For brevity, this network is referred to as a 10-4-2 network, in that it has 10 source nodes, 4 hidden neurons, and 2 output neurons.
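A forward pass through such a 10-4-2 network can be sketched as follows (the random weights are chosen purely for illustration; the `layer` and `sigmoid` helpers are ours, not from the lecture):

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def layer(x, W, b, phi):
    # Each row of W holds one neuron's synaptic weights; b holds the thresholds.
    return [phi(sum(w * xi for w, xi in zip(row, x)) - bk)
            for row, bk in zip(W, b)]

random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(10)] for _ in range(4)]  # 10 -> 4
W2 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]   # 4 -> 2
b1, b2 = [0.0] * 4, [0.0] * 2

x = [1.0] * 10                      # activation pattern from the 10 source nodes
hidden = layer(x, W1, b1, sigmoid)  # the single hidden layer (4 neurons)
y = layer(hidden, W2, b2, sigmoid)  # overall response (2 output neurons)
print(len(hidden), len(y))          # -> 4 2
```

Each layer sees only the outputs of the preceding layer, matching the feedforward signal flow described above.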
Network Architectures
As another example, a feedforward network with p source nodes, h1 neurons in the first hidden layer, h2 neurons in the second hidden layer, and q neurons in the output layer is referred to as a p-h1-h2-q network.
Network Architectures
The neural network of this figure is said to be fully connected, in the sense that every node in each layer of the network is connected to every other node in the adjacent forward layer.
Network Architectures
If, however, some of the communication links (synaptic connections) are missing from the network, we say that the network is partially connected. A form of partially connected multilayer feedforward network of particular interest is a locally connected network, in which each neuron in the hidden layer is connected to a local (partial) set of source nodes that lies in its immediate neighborhood. An example of such a network with a single hidden layer is presented on the next slide.
Network Architectures
Such a set of localized nodes feeding a neuron is said to constitute the receptive field of the neuron. Likewise, each neuron in the output layer is connected to a local set of hidden neurons. (Figure: partially connected feedforward neural network.)
Network Architectures
3. Recurrent (Feedback or Dynamical) Networks
A recurrent neural network distinguishes itself from a feedforward neural network in that it has at least one feedback loop. For example, a recurrent network may consist of a single layer of neurons, with each neuron feeding its output signal back to the inputs of all the other neurons, as illustrated in the architectural graph of the figure on the right. In the structure depicted in this figure there are no self-feedback loops; self-feedback refers to a situation where the output of a neuron is fed back to its own input. (Figure: recurrent network with no self-feedback loops and no hidden neurons.)
Network Architectures
The recurrent network illustrated on the previous slide also has no hidden neurons. Here we illustrate another class of recurrent networks, one with hidden neurons. The feedback connections shown originate from the hidden neurons as well as from the output neurons. (Figure: recurrent network with hidden neurons.)
Network Architectures
The presence of feedback loops, be it as in the recurrent structure with or without hidden neurons, has a profound impact on the learning capability of the network and on its performance. Moreover, the feedback loops involve the use of particular branches composed of unit-delay elements (denoted by z^{-1}), which result in a nonlinear dynamical behavior by virtue of the nonlinear nature of the neurons.
Network Architectures
4. Lattice (Multicategory Perceptron) Structures
A lattice consists of a one-dimensional, two-dimensional, or higher-dimensional array of neurons with a corresponding set of source nodes that supply the input signals to the array; the dimension of the lattice refers to the number of dimensions of the space in which the graph lies. A lattice network is really a feedforward network with the output neurons arranged in rows and columns. (Figures: a one-dimensional lattice of 3 neurons; a two-dimensional lattice of 3-by-3 neurons.)
The Perceptron
The perceptron is the simplest form of a neural network used for the classification of a special type of patterns said to be linearly separable (i.e., patterns that lie on opposite sides of a hyperplane). Basically, it consists of a single neuron with adjustable synaptic weights and threshold, as shown in the figures.
The Perceptron
The algorithm used to adjust the free parameters of this neural network first appeared in a learning procedure developed by Rosenblatt (1958, 1962) for his perceptron brain model. Indeed, Rosenblatt proved that if the patterns (vectors) used to train the perceptron are drawn from two linearly separable classes, then the perceptron algorithm converges and positions the decision surface in the form of a hyperplane between the two classes. The proof of convergence of the algorithm is known as the perceptron convergence theorem.
The Perceptron
The single-layer perceptron depicted has a single neuron. Such a perceptron is limited to performing pattern classification with only two classes. By expanding the output (computation) layer of the perceptron to include more than one neuron, we may correspondingly perform classification with more than two classes; however, the classes would have to be linearly separable for the perceptron to work properly.
The Perceptron
From this model we find that the linear combiner output (i.e., the hard limiter input) is
\[ v = \sum_{j=1}^{p} w_{kj} x_j \]
The purpose of the perceptron is to classify the set of externally applied stimuli x_1, x_2, ..., x_p into one of two classes, C_1 or C_2. The decision rule for the classification is to assign the point represented by the inputs x_1, x_2, ..., x_p to class C_1 if the perceptron output y is +1, and to class C_2 if it is -1.
The Perceptron
To develop insight into the behavior of a pattern classifier, it is customary to plot a map of the decision regions in the p-dimensional signal space spanned by the p input variables x_1, x_2, ..., x_p. In the case of an elementary perceptron, there are two decision regions separated by a hyperplane defined by
\[ \sum_{j=1}^{p} w_{kj} x_j = 0 \]
The Perceptron
This is illustrated here for the case of two input variables x_1 and x_2, for which the decision boundary takes the form of a straight line called the decision line. A point (x_1, x_2) that lies above the decision line is assigned to class C_1, and a point (x_1, x_2) that lies below the decision line is assigned to class C_2. Note also that the effect of the threshold is merely to shift the decision line away from the origin. The synaptic weights w_1, w_2, ..., w_p of the perceptron can be fixed or adapted on an iteration-by-iteration basis. For the adaptation, we may use an error-correction rule known as the perceptron convergence algorithm.
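The error-correction rule named above can be sketched as a fixed-increment update, under the convention of a fixed input of -1 carrying the threshold (the function names and the sample data are ours, not from the lecture):

```python
def predict(w, x):
    # Augment the pattern with the fixed input -1 (threshold convention).
    xa = list(x) + [-1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xa)) >= 0 else -1

def train_perceptron(patterns, labels, eta=1.0, epochs=100):
    """Fixed-increment error-correction rule; labels are +1 / -1."""
    w = [0.0] * (len(patterns[0]) + 1)
    for _ in range(epochs):
        errors = 0
        for x, d in zip(patterns, labels):
            if predict(w, x) != d:           # misclassified: correct the weights
                xa = list(x) + [-1.0]
                w = [wi + eta * d * xi for wi, xi in zip(w, xa)]
                errors += 1
        if errors == 0:                      # converged on the training set
            break
    return w

# Two linearly separable classes in the plane:
X = [(2, 1), (3, 0), (0, 2), (-1, 3)]
d = [1, 1, -1, -1]
w = train_perceptron(X, d)
print([predict(w, x) for x in X])  # -> [1, 1, -1, -1]
```

Because the two classes are linearly separable, the convergence theorem guarantees the loop terminates with all patterns correctly classified.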
The Perceptron
We find it more convenient to work with the modified signal-flow graph given here, which is equivalent to that of the previous figure. In this second model, the threshold is treated as a synaptic weight connected to a fixed input equal to -1. We may thus define the (p+1)-by-1 (augmented) input vector and the corresponding (augmented) weight vector as
\[ x = [x_1 \; x_2 \; \ldots \; x_p \; {-1}]^T, \qquad w = [w_1 \; w_2 \; \ldots \; w_p \; \theta]^T \]
The Perceptron
Pattern Space
Any pattern can be represented by a point in the n-dimensional Euclidean space E^n, called the pattern space. Points in that space corresponding to members of the pattern set are n-tuple vectors x.
The Perceptron
Example 1: Consider the six patterns in the two-dimensional pattern space shown in the following figure, three belonging to class 1 and three to class 2. (Figure: the six patterns in the (x_1, x_2)-plane; the individual coordinates are only partially recoverable from this copy.)
The Perceptron
Design a perceptron such that the six patterns are classified according to their membership in the two classes indicated in the figure.
The Perceptron
One possible decision line is given by x_2 = 2x_1 - 2, which is drawn in the following figure. (Figure: the six patterns together with the decision line x_2 = 2x_1 - 2.)
The Perceptron
One decision surface for this line is obtained as the plane x_3 = 2x_1 - x_2 - 2:
x_3 = 0 (2x_1 - x_2 - 2 = 0) gives the points on the decision line;
x_3 > 0 (2x_1 - x_2 - 2 > 0) gives the part of the plane on the positive side of the decision line;
x_3 < 0 (2x_1 - x_2 - 2 < 0) gives the part of the plane on the negative side of the decision line.
Such a pattern classification can be performed by the following (discrete) perceptron, a dichotomizer (dichotomize: to divide or separate into two parts; from Greek dicha, "in two," and tomia, "to cut"):
The Perceptron
(Figure: inputs x_1, x_2 and a fixed input 1, with weights 2, -1, and -2, feed a hard limiter.) The output is
\[ y = \mathrm{sgn}(v) = \mathrm{sgn}(2x_1 - x_2 - 2) \]
SingleLayer Feedforward Neural Network
Example 2: Assume that a set of eight points, P0, P1, ..., P7, in three-dimensional space is available. The set consists of all vertices of a three-dimensional cube:
{P0(-1,-1,-1), P1(-1,-1,1), P2(-1,1,-1), P3(-1,1,1), P4(1,-1,-1), P5(1,-1,1), P6(1,1,-1), P7(1,1,1)}
Elements of this set need to be classified into two categories. The first category is defined as containing the points with two or more positive ones; the second category contains all the remaining points that do not belong to the first category.
SingleLayer Feedforward Neural Network
Classification of the points P3, P5, P6, and P7 into category 1 can thus be based on the summation of the coordinate values of each point. Notice that for each point Pi(x_1, x_2, x_3), where i = 0, ..., 7, membership in a category can be established by the following calculation:
if sgn(x_1 + x_2 + x_3) = +1, then category 1;
if sgn(x_1 + x_2 + x_3) = -1, then category 2.
SingleLayer Feedforward Neural Network
The neural network given below implements the above expression:
SingleLayer Feedforward Neural Network
The network above performs the three-dimensional Cartesian space partitioning as illustrated below:
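The sgn(x_1 + x_2 + x_3) rule can be verified over all eight cube vertices (a small check we added; `itertools.product` enumerates the vertices):

```python
from itertools import product

def sgn(v):
    return 1 if v >= 0 else -1

vertices = list(product([-1, 1], repeat=3))  # P0 ... P7

for P in vertices:
    category = 1 if sgn(sum(P)) == 1 else 2
    # Category 1 must coincide with "two or more coordinates equal to +1".
    assert (category == 1) == (P.count(1) >= 2)

print(sum(1 for P in vertices if P.count(1) >= 2))  # -> 4 points in category 1
```

The check passes because a vertex with two or more +1 coordinates has coordinate sum +1 or +3, while any other vertex sums to -1 or -3.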
SingleLayer Feedforward Neural Network
Discriminant Functions
In Example 1, x_3 = 2x_1 - x_2 - 2 can be viewed as a discriminant function. We may also write
\[ g(x_1, x_2) = 2x_1 - x_2 - 2 \quad \text{or} \quad g(x) = 2x_1 - x_2 - 2, \qquad x = [x_1 \; x_2]^T \]
SingleLayer Feedforward Neural Network
On the other hand, g(x_1, x_2) = 2x_1 - x_2 - 2 can also be viewed as the equation of a plane in 3-D Euclidean space, and g(x_1, x_2) = 0, i.e. 2x_1 - x_2 - 2 = 0, is the intersection line of that plane with the x_1x_2-plane.
SingleLayer Feedforward Neural Network
Obviously:
g(x) = 0 (2x_1 - x_2 - 2 = 0) gives the points on the decision line;
g(x) > 0 gives the points of the plane on the positive side of the decision line;
g(x) < 0 gives the points of the plane on the negative side of the decision line.
SingleLayer Feedforward Neural Network
Since on the decision line we have g(x_1, x_2) = 0, we can write
\[ dg(x_1, x_2) = \frac{\partial g(x_1, x_2)}{\partial x_1}\,dx_1 + \frac{\partial g(x_1, x_2)}{\partial x_2}\,dx_2 = 0 \]
where dx_1 and dx_2 are the increments given to x_1 and x_2 along the decision line.
SingleLayer Feedforward Neural Network
Now define
\[ \nabla g(x_1, x_2) = \begin{bmatrix} \partial g/\partial x_1 \\ \partial g/\partial x_2 \end{bmatrix}, \qquad dr = \begin{bmatrix} dx_1 \\ dx_2 \end{bmatrix} \]
where \nabla g and dr are known to be the gradient (normal) vector and the tangent vector of the decision line, respectively.
SingleLayer Feedforward Neural Network
The gradient vector points toward the positive side of the decision line. However, there are two normal vectors: one pointing toward the positive side, and the other toward the negative side. For the above example the gradient and the two normal vectors are
\[ \nabla g(x_1, x_2) = \begin{bmatrix} 2 \\ -1 \end{bmatrix}, \qquad \nu_1 = \begin{bmatrix} 2 \\ -1 \end{bmatrix}, \qquad \nu_2 = \begin{bmatrix} -2 \\ 1 \end{bmatrix} \]
SingleLayer Feedforward Neural Network
In fact, \nu_2 is obtained from -\nabla g(x_1, x_2). Note that \nu_1 and \nu_2 are the projections onto the x_1x_2-plane of the normal vectors of two intersecting planes whose intersection line is given by g(x_1, x_2) = 0.
SingleLayer Feedforward Neural Network
Although \nu_1 and \nu_2 are unique, there are infinitely many plane pairs whose intersection line is given by g(x_1, x_2) = 0. Plane pairs can be built by appropriately augmenting the 2-D normal vectors \nu_1 and \nu_2 to 3-D normal vectors, which will be the normal vectors of the two intersecting planes.
SingleLayer Feedforward Neural Network
The 2-D normal vectors are plane vectors given in the x_1x_2-plane. These can be augmented to 3-D by adding a third component, say 2, yielding
\[ n_1 = \begin{bmatrix} -2 \\ 1 \\ 2 \end{bmatrix}, \qquad n_2 = \begin{bmatrix} 2 \\ -1 \\ 2 \end{bmatrix} \]
SingleLayer Feedforward Neural Network
The details of building the augmented vectors are shown below. (Figure: the gradient \nabla g, the augmented normal vectors n_1 and n_2, and the decision line in the x_1x_2-plane.)
SingleLayer Feedforward Neural Network
Note that \nu_1 and \nu_2 are the normal vectors of the plane that is perpendicular to the x_1x_2-plane and intersects it at the decision line. On the other hand, the vectors n_1 and n_2 are the normal vectors of the planes obtained by rotating that perpendicular plane around the decision line by equal and opposite angles, respectively.
SingleLayer Feedforward Neural Network
We can now determine the equations of these planes by using the normal vector-point form of the plane equation, given as
\[ n^T (x - x_0) = 0 \]
where n is the normal vector of the plane, x is the vector connecting any point on the plane to the origin, and x_0 is the vector connecting a fixed point on the plane to the origin.
SingleLayer Feedforward Neural Network
This means that x - x_0 represents the vector connecting any point x on the plane to the fixed point x_0 on the same plane; that is, x - x_0 is a vector that lies in the plane. Now let us find the plane equations for the two normal vectors found above.
SingleLayer Feedforward Neural Network
Let x_0 be the point (1, 0, 0) on the decision line. We can write:
For n_1: -2(x_1 - 1) + x_2 + 2x_3 = 0, which gives x_3 = g_1(x) = \tfrac{1}{2}(2x_1 - x_2 - 2).
For n_2: 2(x_1 - 1) - x_2 + 2x_3 = 0, which gives x_3 = g_2(x) = \tfrac{1}{2}(-2x_1 + x_2 + 2).
SingleLayer Feedforward Neural Network
Because of the way g_1(x) and g_2(x) are built (note that g_1(x) - g_2(x) = 2x_1 - x_2 - 2 = g(x)), we can state the following:
g_1(x) - g_2(x) > 0 on the positive side of the decision line;
g_2(x) - g_1(x) > 0 on the negative side of the decision line.
SingleLayer Feedforward Neural Network
(Figure: the planes g_1 and g_2 intersecting along the decision line in the x_1x_2-plane.)
SingleLayer Feedforward Neural Network
Now we can compute g_1(x) and g_2(x) for the selected patterns in Example 1. For each of the three class 1 patterns, g_1 - g_2 > 0, and for each of the three class 2 patterns, g_2 - g_1 > 0. (The individual pattern coordinates of the original table are not recoverable from this copy.)
SingleLayer Feedforward Neural Network
Henceforth, such g_i(x) functions will be called Discriminant Functions. We can conclude that:
g_1(x) > g_2(x) for the patterns in class 1;
g_2(x) > g_1(x) for the patterns in class 2.
SingleLayer Feedforward Neural Network
Minimum Distance Classification
The classification of two clusters is carried out in such a way that the boundary of the two clusters is drawn as a line perpendicular to, and passing through the midpoint of, the line connecting the center points of the two clusters. Therefore the boundary line is the perpendicular bisector of the connecting line.
SingleLayer Feedforward Neural Network
(Figure: cluster centers P_i and P_j, the difference vector x_i - x_j, the midpoint P_0 = (x_i + x_j)/2, and the perpendicular-bisector boundary with its positive and negative sides.)
SingleLayer Feedforward Neural Network
Now we will derive the equation of the boundary line. Let the vectors x and x_0 represent any point on this line and the point P_0, respectively. Then the following must hold:
\[ (x_i - x_j)^T (x - x_0) = 0 \]
which can be written in the form
\[ (x_i - x_j)^T \left( x - \tfrac{1}{2}(x_i + x_j) \right) = 0 \]
SingleLayer Feedforward Neural Network
and
\[ (x_i - x_j)^T x - \tfrac{1}{2}(x_i + x_j)^T (x_i - x_j) = 0 \quad \text{or} \quad (x_i - x_j)^T x - \tfrac{1}{2}\left( \|x_i\|^2 - \|x_j\|^2 \right) = 0 \]
SingleLayer Feedforward Neural Network
Now defining
\[ g_{ij}(x) = (x_i - x_j)^T x - \tfrac{1}{2}\left( \|x_i\|^2 - \|x_j\|^2 \right) \]
We have already seen that the boundary (decision) line can be taken as the intersection of two planes g_i and g_j.
SingleLayer Feedforward Neural Network
Therefore g_{ij}(x) = g_i(x) - g_j(x), where we have called the g_i(x) discriminant functions and shown that they are associated with plane equations.
SingleLayer Feedforward Neural Network
Now using the two equations above we obtain
\[ (x_i - x_j)^T x - \tfrac{1}{2}\left( \|x_i\|^2 - \|x_j\|^2 \right) = g_i(x) - g_j(x) \]
which can be used to make the following identification:
\[ g_i(x) = x_i^T x - \tfrac{1}{2}\|x_i\|^2, \qquad g_j(x) = x_j^T x - \tfrac{1}{2}\|x_j\|^2 \]
SingleLayer Feedforward Neural Network
g_i(x) can also be expressed as
\[ g_i(x) = w_i^T x + w_{i,n+1} \]
Therefore we can make the identification
\[ w_i = x_i, \qquad w_{i,n+1} = -\tfrac{1}{2}\|x_i\|^2 \]
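The identification above turns each class prototype directly into a linear discriminant; a minimal sketch (the helper names are ours, and the prototype values anticipate Example 3 below):

```python
def discriminant(xi):
    """g_i(x) = xi^T x - 0.5 * ||xi||^2, stored as weights w_i = x_i plus a bias."""
    bias = -0.5 * sum(c * c for c in xi)
    return lambda x, w=tuple(xi), b=bias: sum(wc * xc for wc, xc in zip(w, x)) + b

centers = [(10, 2), (2, -5), (-5, 5)]       # class prototype (center) points
gs = [discriminant(c) for c in centers]

def classify(x):
    values = [g(x) for g in gs]
    return values.index(max(values)) + 1    # class of the largest discriminant

print(classify((9, 2)))  # -> 1 (closest prototype is (10, 2))
```

Maximizing g_i is equivalent to minimizing the Euclidean distance to the prototypes, since the x^T x term common to all distances cancels.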
SingleLayer Feedforward Neural Network
An alternative approach toward the construction of discriminant functions may be taken as follows. Assume that a minimum-distance classification is required to classify patterns into R categories, i = 1, 2, ..., R, each class being represented by its center point P_i. The Euclidean distance between an input pattern x and the point P_i is given by the norm of the vector x - x_i:
\[ \|x - x_i\| = \sqrt{(x - x_i)^T (x - x_i)} \]
SingleLayer Feedforward Neural Network
A minimum-distance classifier computes the distance from a pattern of unknown classification to each of the center points P_i. Then the category number of the point that yields the minimum distance is assigned to the unknown pattern.
Squaring the above equation yields
\[ \|x - x_i\|^2 = x^T x - 2x_i^T x + x_i^T x_i = x^T x - 2\left( x_i^T x - \tfrac{1}{2} x_i^T x_i \right) \]
SingleLayer Feedforward Neural Network
Since x^T x is independent of i, this term is constant with respect to the categories. Therefore, in order to minimize the distance \|x - x_i\|, we need to maximize
\[ g_i(x) = x_i^T x - \tfrac{1}{2} x_i^T x_i \]
which is called a discriminant function.
SingleLayer Feedforward Neural Network
Example 3: A linear minimum-distance classifier will be designed for the three points given as
\[ x_1 = \begin{bmatrix} 10 \\ 2 \end{bmatrix}, \qquad x_2 = \begin{bmatrix} 2 \\ -5 \end{bmatrix}, \qquad x_3 = \begin{bmatrix} -5 \\ 5 \end{bmatrix} \]
It is also assumed that the index of each point (pattern) corresponds to its class number. The three points and the connecting lines constitute a triangle, which is shown on the next slide:
SingleLayer Feedforward Neural Network
(Figure: the triangle with vertices P1(10, 2), P2(2, -5), and P3(-5, 5) in the x_1x_2-plane.)
SingleLayer Feedforward Neural Network
Now let us draw the circle passing through all three vertices of the triangle, the circumcircle. We can conclude that each boundary is a perpendicular bisector of the triangle. A perpendicular bisector of a triangle is a straight line passing through the midpoint of a side and being perpendicular to it, i.e. forming a right angle with it. The three perpendicular bisectors meet at a single point, the triangle's circumcenter; this point is the center of the circumcircle.
SingleLayer Feedforward Neural Network
(Figure: the circumcircle of the triangle P1(10, 2), P2(2, -5), P3(-5, 5), with the three perpendicular bisectors meeting at the circumcenter.)
Now using
SingleLayer Feedforward Neural Network
\[ x_1 = \begin{bmatrix} 10 \\ 2 \end{bmatrix}, \qquad x_2 = \begin{bmatrix} 2 \\ -5 \end{bmatrix}, \qquad x_3 = \begin{bmatrix} -5 \\ 5 \end{bmatrix} \]
and
\[ g_{ij}(x) = (x_i - x_j)^T x - \tfrac{1}{2}\left( \|x_i\|^2 - \|x_j\|^2 \right) \]
we obtain
SingleLayer Feedforward Neural Network
\[ g_{12}(x) = (x_1 - x_2)^T x - \tfrac{1}{2}\left( \|x_1\|^2 - \|x_2\|^2 \right) = \begin{bmatrix} 8 & 7 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \tfrac{1}{2}\left[ (100 + 4) - (4 + 25) \right] = 8x_1 + 7x_2 - 37.5 \]
SingleLayer Feedforward Neural Network
\[ g_{13}(x) = (x_1 - x_3)^T x - \tfrac{1}{2}\left( \|x_1\|^2 - \|x_3\|^2 \right) = \begin{bmatrix} 15 & -3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \tfrac{1}{2}\left[ (100 + 4) - (25 + 25) \right] = 15x_1 - 3x_2 - 27 \]
SingleLayer Feedforward Neural Network
\[ g_{23}(x) = (x_2 - x_3)^T x - \tfrac{1}{2}\left( \|x_2\|^2 - \|x_3\|^2 \right) = \begin{bmatrix} 7 & -10 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \tfrac{1}{2}\left[ (4 + 25) - (25 + 25) \right] = 7x_1 - 10x_2 + 10.5 \]
SingleLayer Feedforward Neural Network
Now using w_i = x_i and w_{i,n+1} = -\tfrac{1}{2}\|x_i\|^2, we obtain
\[ w_1 = \begin{bmatrix} 10 \\ 2 \\ -52 \end{bmatrix}, \qquad w_2 = \begin{bmatrix} 2 \\ -5 \\ -14.5 \end{bmatrix}, \qquad w_3 = \begin{bmatrix} -5 \\ 5 \\ -25 \end{bmatrix} \]
SingleLayer Feedforward Neural Network
and using g_i(x) = w_i^T x + w_{i,n+1} we obtain
\[ g_1(x) = 10x_1 + 2x_2 - 52, \qquad g_2(x) = 2x_1 - 5x_2 - 14.5, \qquad g_3(x) = -5x_1 + 5x_2 - 25 \]
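These three discriminants can be evaluated at the three prototypes to reproduce the table on the next slide (a check we added, not lecture code):

```python
def g1(x1, x2): return 10 * x1 + 2 * x2 - 52
def g2(x1, x2): return 2 * x1 - 5 * x2 - 14.5
def g3(x1, x2): return -5 * x1 + 5 * x2 - 25

patterns = {1: (10, 2), 2: (2, -5), 3: (-5, 5)}
for cls, (a, b) in patterns.items():
    values = [g1(a, b), g2(a, b), g3(a, b)]
    # The diagonal entry (own-class discriminant) is the largest in each column.
    assert values.index(max(values)) + 1 == cls
    print(cls, values)
# -> 1 [52, -4.5, -65]
#    2 [-42, 14.5, -60]
#    3 [-92, -49.5, 25]
```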
SingleLayer Feedforward Neural Network
A block diagram producing the three discriminant functions is shown below. (Figure: inputs x_1, x_2, and a fixed input 1 feed three linear units with weights (10, 2, -52), (2, -5, -14.5), and (-5, 5, -25), producing 10x_1 + 2x_2 - 52, 2x_1 - 5x_2 - 14.5, and -5x_1 + 5x_2 - 25.)
SingleLayer Feedforward Neural Network
The discriminant values for the three patterns P1(10, 2), P2(2, -5), and P3(-5, 5) are shown in the table below:

Discriminant | Class 1 [10 2]^T | Class 2 [2 -5]^T | Class 3 [-5 5]^T
g_1(x) = 10x_1 + 2x_2 - 52 | 52 | -42 | -92
g_2(x) = 2x_1 - 5x_2 - 14.5 | -4.5 | 14.5 | -49.5
g_3(x) = -5x_1 + 5x_2 - 25 | -65 | -60 | 25
SingleLayer Feedforward Neural Network
As required by the definition of the discriminant function, the responses on the diagonal are the largest in each column. Therefore, using a maximum selector at the output will provide the required function from the network. It will be shown later that the same is true for any three points P1, P2, P3 taken from the three decision regions H1, H2, H3, provided that the decision regions are determined as shown above.
SingleLayer Feedforward Neural Network
Using the same network with TLUs (bipolar activation functions) will result in the outputs given in the table below:

Discriminant | Class 1 [10 2]^T | Class 2 [2 -5]^T | Class 3 [-5 5]^T
sgn(g_1(x)) | 1 | -1 | -1
sgn(g_2(x)) | -1 | 1 | -1
sgn(g_3(x)) | -1 | -1 | 1
SingleLayer Feedforward Neural Network
The diagonal entries are +1 and the off-diagonal entries are -1. However, as the next example will demonstrate, this is not true for any three points P1, P2, P3 taken from the three decision regions H1, H2, H3.
SingleLayer Feedforward Neural Network
The response of the same network to the patterns Q1(5, 0), Q2(0, 1), and Q3(-4, 0) is shown in the table below:

Discriminant | Class 1 [5 0]^T | Class 2 [0 1]^T | Class 3 [-4 0]^T
g_1(x) = 10x_1 + 2x_2 - 52 | -2 | -50 | -92
g_2(x) = 2x_1 - 5x_2 - 14.5 | -4.5 | -19.5 | -22.5
g_3(x) = -5x_1 + 5x_2 - 25 | -50 | -20 | -5
SingleLayer Feedforward Neural Network
The responses on the diagonal are still the largest in each column. However, using the same network with TLUs (bipolar activation functions) will result in the outputs given in the table on the next slide:
SingleLayer Feedforward Neural Network

Discriminant | Class 1 [5 0]^T | Class 2 [0 1]^T | Class 3 [-4 0]^T
sgn(g_1(x)) | -1 | -1 | -1
sgn(g_2(x)) | -1 | -1 | -1
sgn(g_3(x)) | -1 | -1 | -1
SingleLayer Feedforward Neural Network
It is therefore impossible to use TLUs once the decision lines are calculated using the minimum-distance classification procedure; the only way out is using a maximum selector. The explanation of why the responses on the diagonal are the largest in each column will now be made in detail.
SingleLayer Feedforward Neural Network
The discriminant functions determine the planes
\[ x_3 = g_1(x) = 10x_1 + 2x_2 - 52, \qquad x_3 = g_2(x) = 2x_1 - 5x_2 - 14.5, \qquad x_3 = g_3(x) = -5x_1 + 5x_2 - 25 \]
SingleLayer Feedforward Neural Network
These planes are shown on the next slide. It is easily seen that:
For any point in H1: g_1(x) > g_2(x) and g_1(x) > g_3(x).
For any point in H2: g_2(x) > g_1(x) and g_2(x) > g_3(x).
For any point in H3: g_3(x) > g_1(x) and g_3(x) > g_2(x).
SingleLayer Feedforward Neural Network
(Figure: the three planes x_3 = g_i(x) plotted over the range -10 <= x_1, x_2 <= 10.)
SingleLayer Feedforward Neural Network
The decision regions H1, H2, H3 are projections of the planes g_1, g_2, and g_3, respectively, onto the x_1x_2-plane, and the decision lines are the projections of the intersection lines of the planes g_i onto the x_1x_2-plane, which are shown on the next slide.
SingleLayer Feedforward Neural Network
(Figure: the decision regions H1, H2, H3 in the x_1x_2-plane, with the boundaries g_12(x) = 0, g_13(x) = 0, and g_23(x) = 0 meeting at P123(2.337, 2.686); in H1, g_1(x) > g_2(x) and g_1(x) > g_3(x), and similarly for H2 and H3; the patterns P1(10, 2), P2(2, -5), and P3(-5, 5) are marked.)
SingleLayer Feedforward Neural Network
A MATLAB plot of the projections of the intersection lines of the planes g_i is shown on the next slide.
SingleLayer Feedforward Neural Network
(Figure: MATLAB plot of the three decision lines over the range -30 <= x_1, x_2 <= 30.)
SingleLayer Feedforward Neural Network
The projections of the intersection lines of the planes g_i onto the x_1x_2-plane are shown to be given by the following line equations:
\[ g_{12}(x) = 8x_1 + 7x_2 - 37.5 = 0, \qquad g_{13}(x) = 15x_1 - 3x_2 - 27 = 0, \qquad g_{23}(x) = 7x_1 - 10x_2 + 10.5 = 0 \]
The previous slide shows the segments that can be seen from the top.
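The common intersection point of the boundary lines can be found from any two of them, e.g. g_12 = 0 and g_13 = 0, by Cramer's rule (a numerical check we added):

```python
# 8*x1 + 7*x2 = 37.5 and 15*x1 - 3*x2 = 27
a11, a12, b1 = 8.0, 7.0, 37.5
a21, a22, b2 = 15.0, -3.0, 27.0

det = a11 * a22 - a12 * a21
x1 = (b1 * a22 - a12 * b2) / det
x2 = (a11 * b2 - b1 * a21) / det
print(round(x1, 3), round(x2, 3))           # -> 2.337 2.686
# The third boundary g_23 = 0 passes through the same point:
print(abs(7 * x1 - 10 * x2 + 10.5) < 1e-9)  # -> True
```

That all three perpendicular bisectors meet at one point is exactly the circumcenter property noted earlier.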
SingleLayer Feedforward Neural Network
The continuation of the line g12=0 remains underneath the plane g3. The continuation of the line g23=0 remains underneath the plane g1. The continuation of the line g13=0 remains underneath the plane g2.
SingleLayer Feedforward Neural Network
A classifier using a maximum selector is shown on the next slide. The maximum selector selects the maximum discriminant and responds with the number of the discriminant having the largest value.
SingleLayer Feedforward Neural Network
(Figure: classifier using the maximum selector. The inputs x_1, x_2, and a fixed input 1 feed three linear units with weight/threshold triples (10, 2, -52), (2, -5, -14.5), and (-5, 5, -25), producing g_1(x), g_2(x), and g_3(x); the maximum selector then responds with i = 1, 2, or 3, the number of the discriminant having the largest value.)
SingleLayer Feedforward Neural Network
The classifier can be redrawn as follows:
SingleLayer Feedforward Neural Network
(Figure: the same classifier redrawn with each discriminant g_i(x) computed by a separate linear unit that has its own copy of the inputs x_1, x_2, and 1, all three outputs again feeding the maximum selector.)
SingleLayer Feedforward Neural Network
In the above we have designed a classifier which was based on the minimumdistance classification for known clusters and derived the network with three perceptrons from the discriminant functions which were interpreted as plane equations. Instead, now let us consider the network on the next slide which is obtained as a result of training a network with three perceptrons using the same input patterns P1(10,2), P2(2,5) and P3(5,5) as in the previous network .
SingleLayer Feedforward Neural Network
(Figure: the trained network. The inputs x_1, x_2, and a fixed input 1 feed three TLUs computing 5x_1 + 3x_2 - 5 (TLU #1), -x_2 - 2 (TLU #2), and -9x_1 + x_2 (TLU #3).)
SingleLayer Feedforward Neural Network
In fact gi(x)=0 define the intersection of gi planes with x1x2 plane. Therefore the TLU divides the gi planes into two regions: (1)the upperhalf plane which is above x1x2 plane and (1)the lowerhalf plane which is below x1x2 plane.
The decision lines are obtained by setting gi(x) = 0:

5x1 + 3x2 - 5 = 0
-x2 - 2 = 0
-9x1 + x2 = 0

These are given on the next slide. The shaded areas are indecision regions, which will become clear in the following discussion.
[Figure: the decision lines 5x1 + 3x2 - 5 = 0, -x2 - 2 = 0 and -9x1 + x2 = 0 in the x1-x2 plane, with the training patterns P1(10, 2), P2(2, -5), P3(-5, 5) and the test patterns Q1(0, 9), Q2(4, -4), Q3(-1, -3) marked, each labeled with the values of the three discriminants at that point.]
The discriminant values g1(x), g2(x), g3(x) for the same three patterns P1(10, 2), P2(2, -5) and P3(-5, 5) are shown in the table below:

Input            g1(x) = 5x1+3x2-5   g2(x) = -x2-2   g3(x) = -9x1+x2
Class 1 [10 2]t         51                -4              -88
Class 2 [2 -5]t        -10                 3              -23
Class 3 [-5 5]t        -15                -7               50
The outputs of the network with three discrete perceptrons are shown in the table below:

Input            sgn(g1(x))   sgn(g2(x))   sgn(g3(x))
Class 1 [10 2]t       1            -1           -1
Class 2 [2 -5]t      -1             1           -1
Class 3 [-5 5]t      -1            -1            1
The table on the previous slide shows that the new discriminant functions

g1(x) = 5x1 + 3x2 - 5
g2(x) = -x2 - 2
g3(x) = -9x1 + x2

classify the patterns P1(10, 2), P2(2, -5) and P3(-5, 5) in the same way as the discriminant functions

g1(x) = 10x1 + 2x2 - 52
g2(x) = 2x1 - 5x2 - 14.5
g3(x) = -5x1 + 5x2 - 25
Conclusion: The network obtained through the perceptron learning algorithm and the network obtained using the minimum-distance classification procedure have classified the three points P1(10, 2), P2(2, -5) and P3(-5, 5) in exactly the same way, i.e.,

P1(10, 2) → class 1
P2(2, -5) → class 2
P3(-5, 5) → class 3
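The agreement between the two discriminant sets can be checked numerically. The sketch below (plain Python; the coefficients are the ones read off the slides above) evaluates both sets on the three prototypes and confirms that the winning discriminant agrees:

```python
# Hedged sketch: both discriminant sets from the slides, evaluated on the
# three prototype patterns; the winning index agrees for every pattern.

def argmax_class(gs, x1, x2):
    """Return the 1-based index of the largest discriminant at (x1, x2)."""
    vals = [g(x1, x2) for g in gs]
    return vals.index(max(vals)) + 1

# Minimum-distance discriminants derived from the prototypes
g_min = [
    lambda x1, x2: 10 * x1 + 2 * x2 - 52,
    lambda x1, x2: 2 * x1 - 5 * x2 - 14.5,
    lambda x1, x2: -5 * x1 + 5 * x2 - 25,
]

# Discriminants of the trained three-perceptron network
g_trn = [
    lambda x1, x2: 5 * x1 + 3 * x2 - 5,
    lambda x1, x2: -x2 - 2,
    lambda x1, x2: -9 * x1 + x2,
]

prototypes = [(10, 2), (2, -5), (-5, 5)]
for expected, (x1, x2) in enumerate(prototypes, start=1):
    assert argmax_class(g_min, x1, x2) == expected
    assert argmax_class(g_trn, x1, x2) == expected
```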
Now consider the patterns Q1(0, 9), Q2(4, -4) and Q3(-1, -3), which fall into the shaded areas. The discriminant values for these patterns are shown in the table on the next slide:
Input      g1(x) = 5x1+3x2-5   g2(x) = -x2-2   g3(x) = -9x1+x2
[0 9]t            22                -11               9
[4 -4]t            3                  2             -40
[-1 -3]t         -19                  1               6
Since

g1(0, 9) > g3(0, 9) > g2(0, 9)
g1(4, -4) > g2(4, -4) > g3(4, -4)
g3(-1, -3) > g2(-1, -3) > g1(-1, -3)

if we use a maximum selector instead of the three TLUs, the network can decide that

Q1(0, 9) → class 1
Q2(4, -4) → class 1
Q3(-1, -3) → class 3
On the other hand, if we use TLUs we obtain the outputs in the following table:

Input      sgn(g1(x))   sgn(g2(x))   sgn(g3(x))
[0 9]t          1            -1            1
[4 -4]t         1             1           -1
[-1 -3]t       -1             1            1
In order to make a classification we should have a row with one 1 and two -1s. Therefore, according to the table obtained, none of the three patterns Q1(0, 9), Q2(4, -4) and Q3(-1, -3) can be classified into any class. For this reason, in the network with TLUs the shaded areas are called indecision regions.
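The TLU behaviour on these patterns can be sketched as follows (plain Python; the discriminants are the trained ones above, and a pattern is accepted only when exactly one TLU fires +1):

```python
# Hedged sketch: a three-TLU classifier that reports an indecision
# whenever the sign pattern does not contain exactly one +1.

def sgn(v):
    return 1 if v >= 0 else -1

def tlu_classify(x1, x2):
    """Return the class index 1..3, or None for an indecision region."""
    outputs = [
        sgn(5 * x1 + 3 * x2 - 5),   # TLU #1
        sgn(-x2 - 2),               # TLU #2
        sgn(-9 * x1 + x2),          # TLU #3
    ]
    if outputs.count(1) == 1:
        return outputs.index(1) + 1
    return None  # more or fewer than one +1: no decision

# The prototypes are classified; the Q patterns are not.
assert tlu_classify(10, 2) == 1 and tlu_classify(2, -5) == 2
assert tlu_classify(-5, 5) == 3
assert tlu_classify(0, 9) is None and tlu_classify(4, -4) is None
assert tlu_classify(-1, -3) is None
```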
Now let us consider the planes defined by

g1 = 5x1 + 3x2 - 5
g2 = -x2 - 2
g3 = -9x1 + x2

which are plotted on the next slide:
[Figure: 3-D plot of the three planes gi over the region -10 ≤ x1, x2 ≤ 10.]
The projections of the intersection lines of the planes gi(x) onto the x1-x2 plane are given by:
g12: 5x1 + 4x2 - 3 = 0
g23: 9x1 - 2x2 - 2 = 0
g13: 14x1 + 2x2 - 5 = 0
The segments that can be seen from the top are plotted on the next slide.
[Figure: the segments of the lines g12 = 0, g23 = 0 and g13 = 0 that are visible from the top, plotted in the x1-x2 plane.]
The continuation of the line g12=0 remains underneath the plane g3. The continuation of the line g23=0 remains underneath the plane g1. The continuation of the line g13=0 remains underneath the plane g2.
[Figure: a general single-layer feedforward network. The input nodes x1, x2, ..., xj, ..., xJ feed the neurons 1, 2, ..., k, ..., K through the weights wkj; neuron k forms the activation vk and produces the output yk, so the output nodes carry y1, y2, ..., yk, ..., yK.]
v1 = w11x1 + w12x2 + ... + w1jxj + ... + w1JxJ,   y1 = f(v1)
v2 = w21x1 + w22x2 + ... + w2jxj + ... + w2JxJ,   y2 = f(v2)
...
vk = wk1x1 + wk2x2 + ... + wkjxj + ... + wkJxJ,   yk = f(vk)
...
vK = wK1x1 + wK2x2 + ... + wKjxj + ... + wKJxJ,   yK = f(vK)
In matrix form,

  [v1]   [w11 w12 ... w1J] [x1]
  [v2] = [w21 w22 ... w2J] [x2]
  [ :]   [ :           : ] [ :]
  [vK]   [wK1 wK2 ... wKJ] [xJ]

and

  [y1]   [f(v1)]
  [y2] = [f(v2)]
  [ :]   [  :  ]
  [yK]   [f(vK)]

or, compactly,

v = Wx
y = Γ(v)
Here Γ is the diagonal nonlinear operator

  [y1]   [f(.)  0   ...   0 ] [v1]
  [y2] = [ 0   f(.) ...   0 ] [v2]
  [ :]   [ :              : ] [ :]
  [yK]   [ 0    0   ... f(.)] [vK]
y = Γ[Wx]
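The mapping y = Γ[Wx] is easy to sketch in code. The sketch below (plain Python, with sgn assumed as the activation f) reproduces the trained three-perceptron network of the running example:

```python
# Hedged sketch: a single-layer feedforward network y = Gamma[Wx],
# with sgn as the activation and the weight matrix from the slides.

def sgn(v):
    return 1 if v >= 0 else -1

def single_layer(W, x, f=sgn):
    """Compute y = Gamma[Wx] for a K x J weight matrix W."""
    return [f(sum(wkj * xj for wkj, xj in zip(row, x))) for row in W]

# Weight matrix of the trained network, augmented input [x1, x2, 1]
W = [[5, 3, -5],
     [0, -1, -2],
     [-9, 1, 0]]

print(single_layer(W, [10, 2, 1]))   # class-1 prototype -> [1, -1, -1]
print(single_layer(W, [2, -5, 1]))   # class-2 prototype -> [-1, 1, -1]
```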
Example 1:
x = [x1 x2 1]t
[v1]   [ 5   3  -5] [x1]   [5x1 + 3x2 - 5]
[v2] = [ 0  -1  -2] [x2] = [-x2 - 2      ]
[v3]   [-9   1   0] [1 ]   [-9x1 + x2    ]
y1 = sgn(5x1 + 3x2 - 5x3)
y2 = sgn(-x2 - 2x3)
y3 = sgn(-9x1 + x2)

with x3 = 1.
Two-Layer Feedforward Neural Network
Example 1: Design a neural network such that the network maps the shaded region of the plane x1, x2 into y = 1, and maps its complement into y = -1, where y is the output of the neural network. In summary, the network will provide a mapping of the entire x1, x2 plane onto one of the two points ±1 on the real number axis.
Solution: The inputs to the neural network will be x1, x2 and the threshold value 1. Thus the input vector is given as:

x = [x1 x2 1]t
The boundaries of the shaded region are given by the equations:

x1 + 1 = 0
x1 - 2 = 0
x2 = 0
x2 - 3 = 0
The shaded region satisfies the inequalities:

x1 > -1,  x1 < 2,  x2 > 0,  x2 < 3

or

x1 + 1 > 0,  -x1 + 2 > 0,  x2 > 0,  -x2 + 3 > 0
These inequalities may be implemented using four neurons:
The equations for the first layer are obtained as:

  [v1]   [ 1  0  1] [x1]
  [v2] = [-1  0  2] [x2]
  [v3]   [ 0  1  0] [1 ]
  [v4]   [ 0 -1  3]

and, since a discrete perceptron (binary threshold or hard-limiter activation function) is used,

y1 = sgn(x1 + 1),  y2 = sgn(-x1 + 2),  y3 = sgn(x2),  y4 = sgn(-x2 + 3)
Let us discuss the mapping performed by the first layer. Note that each of the neurons 1 through 4 divides the plane x1-x2 into two half-planes. The half-planes where the neurons' responses are positive (+1) have been marked with arrows pointing toward the positive-response half-plane. The response of the second layer can then be easily obtained as

y = sgn(y1 + y2 + y3 + y4 - 3.5)
The resultant two-layer neural network is shown in the figure.
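The network derived above can be sketched directly (plain Python; the weights are the ones computed in the solution, and the shaded region is the rectangle -1 < x1 < 2, 0 < x2 < 3):

```python
# Hedged sketch: two-layer network mapping the rectangular region
# -1 < x1 < 2, 0 < x2 < 3 to +1 and its complement to -1.

def sgn(v):
    return 1 if v >= 0 else -1

def two_layer(x1, x2):
    # First layer: one TLU per boundary of the region.
    y = [sgn(x1 + 1), sgn(-x1 + 2), sgn(x2), sgn(-x2 + 3)]
    # Second layer: fires +1 only when all four TLUs respond +1.
    return sgn(sum(y) - 3.5)

print(two_layer(0.5, 1.5))   # inside the region  -> 1
print(two_layer(5.0, 1.5))   # outside the region -> -1
```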
The Perceptron Training Algorithm

For the development of the perceptron learning algorithm for a single-layer perceptron, we find it more convenient to work with the modified signal-flow graph model given here, which is equivalent to that of the previous figure. In this second model, the threshold is treated as a synaptic weight connected to a fixed input equal to 1. We may thus define the (n + 1)-by-1 augmented input vector and the corresponding weight vector as:

x = [x1 x2 ... xn 1]t
w = [w1 w2 ... wn wn+1]t
These vectors are respectively called the augmented input vector and the augmented weight vector. Suppose the input variables of the single-layer perceptron originate from two linearly separable classes. Let X1 be the subset of training vectors x1(1), x1(2), ... that belong to class C1, and let X2 be the subset of training vectors x2(1), x2(2), ... that belong to class C2. The union of X1 and X2 is the complete training set X. For fixed n, the equation wtx = 0, plotted in the space with coordinates x1, x2, ..., xn, defines a hyperplane as the decision surface between the two classes of inputs, which fall on opposite sides of this hyperplane.
Given the sets of vectors X1 and X2 to train the classifier, the training process involves the adjustment of the weight vector w in such a way that the two classes C1 and C2 are separated. The two classes are said to be linearly separable if a realizable setting of the weight vector w exists. Conversely, if the two classes C1 and C2 are known to be linearly separable, then there exists a weight vector w such that we may state:
wtx > 0 for every input vector x belonging to class C1
wtx < 0 for every input vector x belonging to class C2
Given the subsets of training vectors X1 and X2, the training problem for the elementary perceptron is then to find a weight vector w such that the two inequalities above are satisfied. However, until this is achieved, in the intermediate steps we will have

wtx < 0 for some input vectors x belonging to class C1
wtx > 0 for some input vectors x belonging to class C2
In the former case wtx will therefore be increased until wtx > 0 is achieved, and in the latter case wtx will be decreased until wtx < 0 is reached. Here we begin to examine neural network classifiers that derive their weights during the learning cycle.
• The sample pattern vectors x1, x2, ..., xp, called the training sequence, are presented to the machine along with the correct response.
• The response is provided by the teacher and specifies the classification information for each input vector.
• The classifier modifies its parameters by means of iterative, supervised learning.
The network learns from experience by comparing the targeted correct response with the actual response. The classifier structure is usually adjusted after each incorrect response, based on the error value generated.
Let us now look again at the dichotomizer introduced and defined earlier. We will develop a supervised training procedure for this two-class linear classifier. Assuming that the desired response is provided, the error signal is computed, and the error information can be used to adapt the weights of the discrete perceptron.
First we examine the geometrical conditions in the augmented weight space. This will make it possible to devise a meaningful training procedure for the dichotomizer under consideration. The decision surface equation in the (n + 1)-dimensional augmented pattern space is

wtx = 0
When the above equation is considered in the pattern space, it is written for fixed weights w(1), w(2), ..., w(k). Therefore the variables of the function f(wt(i)x) are x1, x2, ..., xn+1, the components of the pattern vector.
[Figure: the decision line f(wt(i)x) = 0 in the pattern space x1-x2.] The normal vector w(i) (weight vector) points toward the side of the pattern space for which wt(i)x > 0, called the positive side.
When the above equation is considered in the weight space, it is written for fixed patterns x(1), x(2), ..., x(p). Therefore the variables of the function f(wtx(i)) are w1, w2, ..., wn+1, the components of the weight vector.
[Figure: the decision line f(wtx(i)) = 0 in the weight space w1-w2.] The normal vector x(i) (pattern vector) points toward the side of the weight space for which wtx(i) > 0, called the positive side.
For f(w1, w2, ..., wn+1) = wtx(i) we have

∂f/∂w1 = x1(i),  ∂f/∂w2 = x2(i),  ...,  ∂f/∂wn+1 = xn+1(i)

i.e.

∇f(wtx(i)) = x(i)

In further discussion it will be understood that the normal vector always points toward the side of the space, or semispace, for which wtx > 0, called the positive side of the hyperplane.
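The identity ∇w(wtx(i)) = x(i) can be checked by central finite differences; the sketch below (plain Python, with an arbitrary illustrative pattern and weight point) does exactly that:

```python
# Hedged sketch: the gradient of f(w) = w.x with respect to w is x,
# verified here by central finite differences at an arbitrary point.

def f(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

x = [1.0, -2.0, 0.5]          # example pattern (illustrative values)
w = [0.3, 0.7, -1.1]          # example weight point (illustrative values)
h = 1e-6
grad = []
for j in range(len(w)):
    wp = list(w); wp[j] += h
    wm = list(w); wm[j] -= h
    grad.append((f(wp, x) - f(wm, x)) / (2 * h))

print(all(abs(g - xj) < 1e-6 for g, xj in zip(grad, x)))   # -> True
```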
[Figure: decision hyperplanes in the augmented weight space for a five-pattern set from two classes.]
Note that each vector x(i) points toward the positive side of the corresponding decision hyperplane wtx(i) = 0. By labeling each decision boundary in the augmented weight space with an arrow pointing into the positive half-plane, we can easily find a region in the weight space that satisfies the linearly separable classification.
To find the solution for the weights, we look for the intersection of the positive decision regions due to the prototypes of class 1 and of the negative decision regions due to the prototypes of class 2.
Inspection of the figure reveals that the intersection of the sets of weights yielding all five correct classifications of the depicted patterns is the shaded region of the second quadrant shown in the figure above.
Let us now attempt to arrive iteratively at a weight vector w located in the shaded weight solution area. To accomplish this, the weights need to be adjusted from an initial value located anywhere in the weight space.
This assumption is due to our ignorance of the weight solution region, as well as of weight initialization. The adjustment discussed, or network training, is based on an error-correction scheme.
At this point we introduce the Perceptron Learning (Training) Rule (Algorithm). The perceptron learning rule is of central importance for supervised learning of neural networks. In this method the weights are initialized at arbitrary values.
A neuron is considered to be an adaptive element. Its weights are modifiable depending on the input signal it receives, its output value, and the associated teacher (supervisor) response.
The weight vector is changed according to the following:

w(i + 1) = w(i) + Δw(i),  where  Δw(i) = c r x(i)

and
• d(i) is the teacher's (supervisor's) signal,
• r = r(w(i), x(i), d(i)) is the learning signal,
• c is a positive number called the learning constant.

The correction is made in the direction of x(i) or -x(i), depending on the sign of r.
Here we have used

∇w(wtx(i)) = [∂(wtx(i))/∂w1 ... ∂(wtx(i))/∂wn+1]t = [x1(i) ... xn+1(i)]t = x(i)

This reveals that the change in the weight vector is in the direction of steepest ascent (or descent) of wtx(i).
Perceptron Learning (Training) Rule (Algorithm). In this case the learning signal is defined as:

r(i) = d(i) - y(i)

where d(i) is the desired output signal and y(i) is the actual output signal for the input pattern x(i), given by:

y(i) = sgn(wt(i)x(i))

The weight adjustment is then:

Δw(i) = c [d(i) - sgn(wt(i)x(i))] x(i)
d = 1, i.e. class 1 is input:
1) y = sgn(wtx) = 1, i.e. the input is correctly classified:
   r = d - y = 1 - 1 = 0, no correction.
2) y = sgn(wtx) = -1, i.e. the input is misclassified:
   r = d - y = 1 - (-1) = 2; the correction is in the direction of steepest ascent and is given as Δw(i) = 2c x(i).

d = -1, i.e. class 2 is input:
1) y = sgn(wtx) = -1, i.e. the input is correctly classified:
   r = d - y = -1 - (-1) = 0, no correction.
2) y = sgn(wtx) = 1, i.e. the input is misclassified:
   r = d - y = -1 - 1 = -2; the correction is in the direction of steepest descent and is given as Δw(i) = -2c x(i).
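All four cases collapse into a single update. The sketch below (plain Python; c = 0.5 is assumed so that each correction is exactly ±x, matching the worked example that follows) applies the discrete perceptron rule:

```python
# Hedged sketch of the discrete perceptron rule
# w <- w + c * (d - sgn(w.x)) * x; with c = 0.5 each correction is +/- x.

def sgn(v):
    return 1 if v >= 0 else -1

def perceptron_step(w, x, d, c=0.5):
    """One training step; returns the (possibly unchanged) weight vector."""
    y = sgn(sum(wi * xi for wi, xi in zip(w, x)))
    r = d - y                      # 0, +2 or -2
    return [wi + c * r * xi for wi, xi in zip(w, x)]

# A misclassified class-1 pattern moves the weights toward x.
w = perceptron_step([-2.5, 1.75], [1, 1], d=1)
print(w)   # -> [-1.5, 2.75]
```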
EXAMPLE: The trained classifier should provide the following classification of four patterns x with known class membership d:

x(1) = 1, x(3) = 3:  d1 = d3 = 1 (class C1)
x(2) = -0.5, x(4) = -2:  d2 = d4 = -1 (class C2)
The augmented input vectors are given as:

x(1) = [1 1]t,  x(2) = [-0.5 1]t,  x(3) = [3 1]t,  x(4) = [-2 1]t
Let us choose an arbitrary augmented weight vector

w(1) = [-2.5 1.75]t

With x(1) being the input, we obtain

wt(1)x(1) = -2.5 + 1.75 = -0.75 < 0

and, with the binary activation function (discrete perceptron),

sgn(wt(1)x(1)) = -1

Hence x(1) is classified as being in class C2. However, this is not true. Therefore a correction has to be made.
The question to be asked at this point is: how do we make this correction? The answer depends on the training algorithm used. However, one thing is certain: since sgn{wt(1)x(1)} = -1, the correction should be made in such a way that wtx increases. In order to achieve this, we must first find the direction in which the increase (or, for that matter, the decrease) takes place. To show this, let us consider the surface given by:
z = f(w1, w2, ..., wn+1)

We can write:

df(w1, w2, ..., wn+1) = (∂f/∂w1)dw1 + (∂f/∂w2)dw2 + ... + (∂f/∂wn+1)dwn+1

Let us now restrict ourselves to the case of 3 dimensions, namely z = f(w1, w2), or more succinctly z = f(x, y).
Now consider the surface z = f(x, y). If the level curves f(x, y) = constant are interpreted as contour lines of the landscape, i.e. of the surface, then along these curves z = f(x, y) = constant.
Consequently, along a level curve we obtain

dz = df(x, y) = (∂f/∂x)dx + (∂f/∂y)dy = 0

where dx and dy are the increments given to x and y on the level curve.
Now defining

∇f(x, y) = [∂f/∂x  ∂f/∂y]t  and  dr = [dx  dy]t

where ∇f and dr are known as the gradient vector and the tangent vector, respectively,
we can write

df(x, y) = (∇f)t dr = 0

This means that the gradient vector and the tangent vector are orthogonal vectors. Moreover, it can be shown that the gradient vector points in the direction of steepest ascent of the function f(x, y); furthermore, its magnitude is the rate of climb in that direction.
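This orthogonality is easy to verify numerically. The sketch below (plain Python) evaluates the gradient of the example surface f(x, y) = (x-50)^2 + (y-50)^2 - 32^2 at a point on one of its circular level curves and dots it with the curve's tangent:

```python
import math

# Hedged sketch: on the circle (x-50)^2 + (y-50)^2 = r^2 the gradient
# (2(x-50), 2(y-50)) is orthogonal to the tangent (-(y-50), (x-50)).

def grad(x, y):
    return (2 * (x - 50), 2 * (y - 50))

def tangent(x, y):
    return (-(y - 50), x - 50)

x, y = 50 + 32 * math.cos(0.7), 50 + 32 * math.sin(0.7)
g, t = grad(x, y), tangent(x, y)
dot = g[0] * t[0] + g[1] * t[1]
print(abs(dot) < 1e-9)   # -> True
```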
Now consider the surface

z = f(x, y) = (x - 50)^2 + (y - 50)^2 - 32^2

The following MATLAB program plots this surface:
close all
clear all
for x=1:1:100
    for y=1:1:100
        f(x,y) = (x-50)^2 + (y-50)^2 - 1024;
    end
end
mesh(f);
title('f(x,y)=(x-50)^2+(y-50)^2-1024');
colorbar;
figure;
imshow(f,[],'notruesize');
colormap(jet);
title('f(x,y)=(x-50)^2+(y-50)^2-1024');
[Figure: mesh plot and image plot of z = f(x,y) = (x-50)^2 + (y-50)^2 - 32^2 produced by the program, with colorbar.]
The level curves are obtained from

z = f(x, y) = (x - 50)^2 + (y - 50)^2 - 32^2 = Ci

where the Ci are constants. [Figure: concentric circular level curves labeled C1 = 4096, C2 = 9216, C3 = 16384.]
∇f(x, y) = [∂f/∂x  ∂f/∂y]t = [2(x - 50)  2(y - 50)]t  and  dr = [dx  dy]t

Considering the four quadrants of the circle:

Q.1: 2(x - 50) > 0, 2(y - 50) > 0
Q.2: 2(x - 50) < 0, 2(y - 50) > 0
Q.3: 2(x - 50) < 0, 2(y - 50) < 0
Q.4: 2(x - 50) > 0, 2(y - 50) < 0
the gradient vector points in the directions given below: [Figure: outward-pointing gradient arrows in the quadrants Q.1 through Q.4 of the circle.]
The fact that the gradient vector is orthogonal to the tangent vector proves that it is in the direction of steepest ascent or steepest descent. The directions found for the example show that the gradient vector points in the direction of ascent of the function f(x, y). Combining the two facts, we can conclude that it points in the direction of steepest ascent.
The augmented patterns are

x(1) = [1 1]t,  x(2) = [-0.5 1]t,  x(3) = [3 1]t,  x(4) = [-2 1]t

In the weight space the following straight lines represent the decision lines:

wtx(1) = 0:  w1 + w2 = 0,  i.e.  w2 = -w1
wtx(2) = 0:  -0.5w1 + w2 = 0,  i.e.  w2 = 0.5w1
wtx(3) = 0:  3w1 + w2 = 0,  i.e.  w2 = -3w1
wtx(4) = 0:  -2w1 + w2 = 0,  i.e.  w2 = 2w1
[Figure: the four decision lines, numbered 1 through 4, in the weight space, together with the initial weight vector.]
The corresponding gradient vectors are computed as follows:
for x(1): ∇w(wtx(1)) = [∂(wtx(1))/∂w1  ∂(wtx(1))/∂w2]t = [1 1]t = x(1)
for x(2): ∇w(wtx(2)) = [-0.5 1]t = x(2)
for x(3): ∇w(wtx(3)) = [3 1]t = x(3)
for x(4): ∇w(wtx(4)) = [-2 1]t = x(4)
[Figure: decision lines and gradient vectors in the weight space. Each line wtx(i) = 0 separates the half-planes wtx(i) > 0 and wtx(i) < 0, and the initial weight vector w(1) = [-2.5 1.75]t is marked.]
Now we can concentrate on the particular training (or learning) algorithm (or rule). This is a supervised learning algorithm: at each step the correction is made according to the directive given by the supervisor, as shown in the following figure.
The Perceptron Training Algorithm yi x di Weight learning rule: di is provided only in the case of supervised learning .
Now consider

r = di - sgn(wtx)

Since di = ±1 and sgn(wtx) = ±1, r can take on one of the three values: -2, 0, 2.
In fact:

for di = 1, sgn(wtx) = -1:  r = 2
for di = 1, sgn(wtx) = 1:   r = 0
for di = -1, sgn(wtx) = 1:  r = -2
for di = -1, sgn(wtx) = -1: r = 0

Therefore we can define the correction rule in terms of the correction amount at the nth step as follows:

Δwi(n) = η(n)(di(n) - sgn(wt(n)x(n))) ∇w(wt(n)x(n))
Since ∇w(wt(n)x(n)) = x(n), this becomes

Δwi(n) = η(n)(di(n) - sgn(wt(n)x(n))) x(n)
In order for the correct classification of the entire training set x(1), x(2), x(3) and x(4), with the respective class memberships

d(1) = 1, d(2) = -1, d(3) = 1, d(4) = -1

the following four inequalities must hold:

wt(N)x(1) > 0,  wt(N)x(2) < 0,  wt(N)x(3) > 0,  wt(N)x(4) < 0

where w(N) is the final weight vector that provides correct classification for the entire training set.
This means that after N - 1 training steps the weight vector w(N) ends up in the solution area, which is the shaded area in the following figure.
75 x(4) x(2) x(1) x(3) w1 wtx(1)>0 2 wtx(4)>0 wtx(4)<0 wtx(1)<0 4 3 1 .The Perceptron Training Algorithm wtx(3)<0 wtx(3)>0 Weight Space wtx(2)>0 wtx(2)<0 w2 Initial weight vector 2.5 w(1) 1.
The training has so far been shown in the weight space, where the decision lines are defined by x(1), x(2), x(3) and x(4). In the following we show the correction steps of the weight vector as well as the corresponding decision surfaces in the pattern space, as this enables the classification to be easily seen. The decision lines determined by the perceptron at each step are defined in the pattern space by w(1), w(2), w(3) and w(4).
In the pattern space, wt(1)x = 0 determines the decision line defined by the initial weight vector:

-2.5x1 + 1.75x2 = 0,  i.e.  x2 = 1.429x1
The corresponding gradient vector is computed as follows:

∇x(wt(1)x) = [∂(wt(1)x)/∂x1  ∂(wt(1)x)/∂x2]t = [-2.5 1.75]t = w(1)

which is the initial weight vector; the gradient vector lies on the side of the decision line -2.5x1 + 1.75x2 = 0 where wt(1)x > 0.
x(1) and x(3) have class 1, i.e. d1 = d3 = 1, and x(2) and x(4) have class 2, i.e. d2 = d4 = -1. However, x(1), x(2), x(3) and x(4) are all wrongly classified by w(1).
[Figures: weight space and pattern space at the start of training. The initial weight vector w(1) = [-2.5 1.75]t is orthogonal to the initial decision line -2.5x1 + 1.75x2 = 0 (x2 = 1.429x1) in the pattern space; in the weight space each decision line wtx(i) = 0 is orthogonal to the corresponding input vector x(i). In Step 1 the first input vector x(1) = [1 1]t is presented, with line 1 (w2 = -w1) as the corresponding weight-space decision line.]
5 1 1.5 w( 1 ) 1.429 x1 x2 1 y( 1 ) sgn( wt ( 1 )x( 1 )) sgn( 2.545 x1 x2 .75 1.5 1.Step 1 (Update 1): Pattern x(1) is input Weight Space Pattern Space Initial weight First input Initial decision line vector vector 2.75 Updated decision line x wt ( 2 )x 1.75 x2 0 x2 0.5 1.75 1 1.5 2.75 ) 1 1 d ( 1 ) y( 1 ) 1 ( 1 ) 2 Updated weight vector 2.5 w(2) w(1) x(1) 1 2.75 x2 0 x2 1.5 x1 2.75 1 2.5 x1 1.75 1 x( 1) 1 The Perceptron Training Algorithm x wt ( 1 )x 2.
[Figure: after Update 1, the weight vector w(2) = [-1.5 2.75]t and the new decision line x2 = 0.545x1 (line 2).]
Step 2 (Update 2): Pattern x(2) is input.

y(2) = sgn(wt(2)x(2)) = sgn(0.75 + 2.75) = 1,  d(2) - y(2) = -1 - 1 = -2

Updated weight vector:

w(3) = w(2) - x(2) = [-1.5 2.75]t - [-0.5 1]t = [-1 1.75]t

Updated decision line:

wt(3)x = -x1 + 1.75x2 = 0,  i.e.  x2 = 0.571x1
[Figure: after Update 2, the weight vector w(3) and the decision line x2 = 0.571x1 (line 3); the weight-space line wtx(2) = 0 is w2 = 0.5w1.]
Step 3 (Update 3): Pattern x(3) = [3 1]t is input.

y(3) = sgn(wt(3)x(3)) = sgn(-3 + 1.75) = -1,  d(3) - y(3) = 1 - (-1) = 2

w(4) = w(3) + x(3) = [-1 1.75]t + [3 1]t = [2 2.75]t

Updated decision line:

wt(4)x = 2x1 + 2.75x2 = 0,  i.e.  x2 = -0.73x1
[Figure: after Update 3, the weight vector w(4) and the decision line x2 = -0.73x1 (line 4); the weight-space line wtx(3) = 0 is w2 = -3w1.]
Step 4: Pattern x(4) = [-2 1]t is input.

y(4) = sgn(wt(4)x(4)) = sgn(-4 + 2.75) = -1,  d(4) - y(4) = -1 - (-1) = 0

No update: w(5) = w(4) = [2 2.75]t, and the decision line x2 = -0.73x1 is unchanged.
[Figure: Step 4, pattern x(4); w(5) = w(4), so line 4 remains the decision line; the weight-space line wtx(4) = 0 is w2 = 2w1.]
Step 5: Pattern x(1) is input.

y(5) = sgn(wt(5)x(1)) = sgn(wt(4)x(1)) = sgn(2 + 2.75) = 1,  d(1) - y(5) = 1 - 1 = 0

No update: w(6) = w(5) = w(4), same decision line.
[Figure: Step 5, pattern x(1); w(6) = w(5) = w(4), so line 4 remains the decision line.]
Step 6 (Update 6): Pattern x(2) is input.

y(6) = sgn(wt(6)x(2)) = sgn(-1 + 2.75) = 1,  d(2) - y(6) = -1 - 1 = -2

w(7) = w(6) - x(2) = [2 2.75]t - [-0.5 1]t = [2.5 1.75]t

Updated decision line:

wt(7)x = 2.5x1 + 1.75x2 = 0,  i.e.  x2 = -1.43x1
[Figure: after Update 6, the weight vector w(7) = [2.5 1.75]t and the decision line x2 = -1.43x1 (line 7).]
[Figure: Step 7, pattern x(3) is input; the weight-space line wtx(3) = 0 with its positive side marked.]
Step 7: Pattern x(3) is input.

y(7) = sgn(wt(7)x(3)) = sgn(7.5 + 1.75) = 1,  d(3) - y(7) = 1 - 1 = 0

No update: w(8) = w(7) = [2.5 1.75]t
Step 8: Pattern x(4) is input.

y(8) = sgn(wt(8)x(4)) = sgn(wt(7)x(4)) = sgn(-5 + 1.75) = -1,  d(4) - y(8) = -1 - (-1) = 0

No update: w(9) = w(8) = w(7) = [2.5 1.75]t
Step 9: Pattern x(1) is input.

y(9) = sgn(wt(9)x(1)) = sgn(wt(7)x(1)) = sgn(2.5 + 1.75) = 1,  d(1) - y(9) = 1 - 1 = 0

No update: w(10) = w(9) = w(8) = w(7)
Step 10 (Update 10): Pattern x(2) is input.

y(10) = sgn(wt(10)x(2)) = sgn(-1.25 + 1.75) = 1,  d(2) - y(10) = -1 - 1 = -2

w(11) = w(10) - x(2) = [2.5 1.75]t - [-0.5 1]t = [3 0.75]t
The initial weight vector w(1) and the weight vectors w(2), ..., w(11) obtained during the training algorithm are given below:

w(1) = [-2.5 1.75]t
w(2) = [-1.5 2.75]t
w(3) = [-1 1.75]t
w(4) = w(5) = w(6) = [2 2.75]t
w(7) = w(8) = w(9) = w(10) = [2.5 1.75]t
w(11) = [3 0.75]t

As can be seen from these vectors, out of the ten vectors w(2), ..., w(11) only five are different.
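The whole training run can be reproduced in a few lines. The sketch below (plain Python; c = 0.5 is assumed, and the four augmented patterns are cycled through) regenerates the weight sequence listed above:

```python
# Hedged sketch: discrete perceptron training on the four augmented
# patterns of the example, reproducing the weight sequence w(1)..w(11).

def sgn(v):
    return 1 if v >= 0 else -1

patterns = [([1, 1], 1), ([-0.5, 1], -1), ([3, 1], 1), ([-2, 1], -1)]
w = [-2.5, 1.75]                      # w(1), the initial weight vector
history = [list(w)]

for step in range(10):                # steps 1..10 cycle over the patterns
    x, d = patterns[step % 4]
    y = sgn(w[0] * x[0] + w[1] * x[1])
    w = [w[0] + 0.5 * (d - y) * x[0], w[1] + 0.5 * (d - y) * x[1]]
    history.append(list(w))

print(history[1])    # w(2)  -> [-1.5, 2.75]
print(history[10])   # w(11) -> [3.0, 0.75]
```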
These five distinct vectors are shown in the MATLAB plot below: [Figure: the five distinct weight vectors plotted in the weight plane.]
Example: The trained classifier is required to classify the vertices of the unit cube such that the yellow vertices have class membership d = 1 and the blue vertices have class membership d = 2. [Figure: the cube in (x1, x2, x3) space with vertices (0,0,0), (1,0,0), (0,1,0), (0,0,1), (1,1,0), (1,0,1), (0,1,1), (1,1,1) and the plane x3 = 0.25.]
SUMMARY OF THE DISCRETE PERCEPTRON TRAINING ALGORITHM

Given are the P training pairs {x1, d1, x2, d2, ..., xP, dP}, where xi is (n+1) x 1 and di is 1 x 1, i = 1, 2, ..., P. In the following, k denotes the training step and p denotes the step counter within the training cycle.

Step 1: c > 0 is chosen.
Step 2: The weights w, (n+1) x 1, are initialized at random small values. Counters and error are initialized: k ← 1, p ← 1, E ← 0.
The Perceptron Training Algorithm Step3: The training cycle begins here. d p d . Input is presented and output is computed: x p x. 1 E (d y )2 2 . y sgn( w x) t Step4: Weights are updated: 1 x p x. w → w c(d y ) x 2 Step5: Cycle error is computed.
. Output weights. For E=0. If E>0 then enter the new training cycle by going to Step 3. otherwise go to Step 7.The Perceptron Training Algorithm Step 6: If p<P then p p 1. Step 7: The training cycle is completed. k and E. terminate the training session. n n 1 and go to Step 3.
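The steps above can be sketched in Python. This is a sketch under stated assumptions: the four augmented patterns and targets are the example values used later in these notes, while the zero initial weights, c = 1 and the cycle cap are illustrative choices.

```python
def sgn(v):
    return 1.0 if v >= 0 else -1.0

def train_discrete_perceptron(patterns, targets, c=1.0, max_cycles=100):
    """Discrete perceptron training as in the summary above."""
    w = [0.0] * len(patterns[0])          # Step 2: initial weights (zeros here)
    E = 0.0
    for _ in range(max_cycles):           # each pass is one training cycle
        E = 0.0                           # cycle error reset
        for x, d in zip(patterns, targets):
            y = sgn(sum(wi * xi for wi, xi in zip(w, x)))              # Step 3
            w = [wi + 0.5 * c * (d - y) * xi for wi, xi in zip(w, x)]  # Step 4
            E += 0.5 * (d - y) ** 2       # Step 5
        if E == 0.0:                      # Step 7: stop on an error-free cycle
            break
    return w, E

# Augmented example patterns [x, 1] with classes d (assumed values):
X = [[1.0, 1.0], [0.5, 1.0], [3.0, 1.0], [-2.0, 1.0]]
d = [1.0, -1.0, 1.0, -1.0]
w, E = train_discrete_perceptron(X, d)
```

Since the two classes are linearly separable, the perceptron convergence theorem guarantees the loop terminates with a zero-error cycle.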
Single-Layer Continuous Perceptron

Here the activation function is a continuous function of the weights instead of the signum. There are two main objectives of this:
1. To define a continuous function of the weights as the error function, so as to obtain finer control over the weights as well as over the whole training procedure.
2. To enable the computation of the error gradient, in order to be continuously in a position to know the direction in which the error decreases.
Training Rule for a Single-Layer Continuous Perceptron: The Delta Training Rule

The Delta Training Rule is based on the minimisation of the error function

E(n) = (1/2)(d(n) − y(n))^2,

where n is a positive integer representing the training step number, i.e., the step number in the minimisation process, d(n) is the desired output signal, and y(n) = f(w^t(n) x(n)) is the actual output.

The error function (error surface) is a function of the weights, E(w1, w2, …) = E(w), which is minimised using an iterative minimisation method that computes the new values of the weights according to

w(n+1) = w(n) + Δw(n),

where Δw(n) is the increment given to the present weight vector w(n) to obtain the new weight vector w(n+1).

Let us now use the steepest descent method for the minimisation of the error function E(w), where it is required that the weight changes be in the negative gradient direction. Therefore we take

Δw(n) = −η ∇E(w(n)),

where ∇E(w(n)) is the gradient vector and η is called the learning constant. Using this in the equation above, we obtain

w(n+1) = w(n) − η ∇E(w(n)).

∇E(w(n)) = ∇E(n) is the gradient of the error surface at the n'th training step. The independent variables for minimisation at each training step are the components wi of the weight vector. Therefore the error to be minimised is

E(n) = (1/2)(d(n) − f(w^t(n) x(n)))^2.
The error minimisation requires the computation of the gradient of the error function:

∇E(w)|_{w=w(n)} = ∇[(1/2)(d(n) − f(w^t x(n)))^2]|_{w=w(n)}.

The gradient vector is defined as

∇E(w) = [∂E/∂w1, ∂E/∂w2, …, ∂E/∂w_{N+1}]^t.

Using the expression above and defining v(w) = w^t x, we obtain

∇E(w)|_{w=w(n)} = −(d(n) − f(v(w))) (df(v(w))/dv) [∂v(w)/∂w1, …, ∂v(w)/∂w_{N+1}]^t |_{w=w(n)}.

Since ∂v(w)/∂wi = xi and f(v) = y, we can write

∂E(w)/∂wi |_{w=w(n)} = −(d(n) − y(n)) (df(v)/dv)|_{w=w(n)} xi(n)

and

∇E(w)|_{w=w(n)} = −(d(n) − y(n)) (df(v)/dv)|_{w=w(n)} x(n).
If the bipolar continuous activation function is used, then we have

f(v) = (1 − e^{−v}) / (1 + e^{−v})   and   df(v)/dv = 2e^{−v} / (1 + e^{−v})^2.

In fact,

df(v)/dv = 2e^{−v}/(1 + e^{−v})^2 = (1/2)[1 − ((1 − e^{−v})/(1 + e^{−v}))^2] = (1/2)(1 − f^2(v)).

Therefore

∇E(w)|_{w=w(n)} = −(1/2)(d(n) − y(n))(1 − y^2(n)) x(n).
Conclusion: The delta training rule for the bipolar continuous perceptron is given as

w(n+1) = w(n) + (η/2)(d(n) − y(n))(1 − y^2(n)) x(n).
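A minimal numeric check of this rule can be sketched in Python (the initial weights, pattern and η below are illustrative assumptions, not values from the notes): a single update step moves the weights down the error surface, so the squared error for the presented pattern decreases.

```python
import math

def f_bipolar(v):
    # bipolar continuous activation f(v) = (1 - e^{-v}) / (1 + e^{-v})
    return (1.0 - math.exp(-v)) / (1.0 + math.exp(-v))

def delta_update(w, x, d, eta):
    """One delta-rule step: w(n+1) = w(n) + (eta/2)(d - y)(1 - y^2) x."""
    v = sum(wi * xi for wi, xi in zip(w, x))
    y = f_bipolar(v)
    return [wi + 0.5 * eta * (d - y) * (1.0 - y * y) * xi
            for wi, xi in zip(w, x)]

def error(w, x, d):
    # E = (1/2)(d - y)^2 for one training pair
    y = f_bipolar(sum(wi * xi for wi, xi in zip(w, x)))
    return 0.5 * (d - y) ** 2

w0 = [0.2, -0.1]          # illustrative initial weights
x, d = [1.0, 1.0], 1.0    # one illustrative training pair
w1 = delta_update(w0, x, d, eta=0.5)
```

Because the step is exactly −η times the gradient of this pattern's error, error(w1, x, d) comes out smaller than error(w0, x, d).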
If the unipolar continuous activation function is used, then we have

f(v) = 1 / (1 + e^{−v})   and   df(v)/dv = e^{−v} / (1 + e^{−v})^2.

We can write

df(v)/dv = e^{−v}/(1 + e^{−v})^2 = (1/(1 + e^{−v}))(1 − 1/(1 + e^{−v})) = f(v)(1 − f(v)).
Example: We will carry out the same training as in the previous example, but this time using a continuous bipolar perceptron. The error at step n is given by

E(n) = (1/2)(d(n) − (1 − e^{−v(n)})/(1 + e^{−v(n)}))^2 = (1/2)(d(n) + 1 − 2/(1 + e^{−v(n)}))^2.

For the first pattern x(1) = [1 1]^t, d(1) = 1, so v = w1 + w2 and the error at step 1 is

E(1) = (1/2)(1 − (1 − e^{−(w1+w2)})/(1 + e^{−(w1+w2)}))^2 = (1/2)(2/(1 + e^{(w1+w2)}))^2 = 2/(1 + e^{(w1+w2)})^2.

For the second pattern x(2) = [0.5 1]^t, d(2) = −1, so v = 0.5w1 + w2 and the error at step 2 is

E(2) = (1/2)(−1 − (1 − e^{−(0.5w1+w2)})/(1 + e^{−(0.5w1+w2)}))^2 = 2/(1 + e^{−(0.5w1+w2)})^2.

For the third pattern x(3) = [3 1]^t, d(3) = 1, so v = 3w1 + w2 and the error at step 3 is

E(3) = 2/(1 + e^{(3w1+w2)})^2.

For the fourth pattern x(4) = [−2 1]^t, d(4) = −1, so v = −2w1 + w2 and the error at step 4 is

E(4) = 2/(1 + e^{−(−2w1+w2)})^2 = 2/(1 + e^{(2w1−w2)})^2.
The four error surfaces and the total error can be plotted in MATLAB (the minus signs lost in the slide text have been restored to match the error expressions above):

clear all; close all;
[w1, w2] = meshgrid(-4:.1:4, -4:.1:4);
Z1 = exp(w1 + w2);       E1 = 2./(1 + Z1).^2;
Z2 = exp(-.5*w1 - w2);   E2 = 2./(1 + Z2).^2;
Z3 = exp(3*w1 + w2);     E3 = 2./(1 + Z3).^2;
Z4 = exp(2*w1 - w2);     E4 = 2./(1 + Z4).^2;
subplot(2,2,1); mesh(E1); title('Error surface for xt(1)=[1 1] and y=f(wt*x(1))');
xlabel('w1'); ylabel('w2'); zlabel('E1(w1,w2)');
subplot(2,2,2); mesh(E2); title('Error surface for xt(2)=[.5 1] and y=f(wt*x(2))');
xlabel('w1'); ylabel('w2'); zlabel('E2(w1,w2)');
subplot(2,2,3); mesh(E3); title('Error surface for xt(3)=[3 1] and y=f(wt*x(3))');
xlabel('w1'); ylabel('w2'); zlabel('E3(w1,w2)');
subplot(2,2,4); mesh(E4); title('Error surface for xt(4)=[-2 1] and y=f(wt*x(4))');
xlabel('w1'); ylabel('w2'); zlabel('E4(w1,w2)');
E = E1 + E2 + E3 + E4;
figure; mesh(E); title('Total Error E(w1,w2)=E1(w1,w2)+E2(w1,w2)+E3(w1,w2)+E4(w1,w2)');
xlabel('w1'); ylabel('w2'); zlabel('E(w1,w2)');
figure; imshow(E, []); colormap(jet);
title('Total Error (IMSHOW)'); xlabel('w1'); ylabel('w2');

The error surfaces for the above four cases are shown in the next slide:
[Figure: mesh plots of the error surfaces E1(w1,w2), E2(w1,w2), E3(w1,w2) and E4(w1,w2) for the four patterns.]
The total error is defined by

E(w1, w2) = E1(w1, w2) + E2(w1, w2) + E3(w1, w2) + E4(w1, w2).

The total error surface is shown in the next slide.

[Figure: the total error surface E(w1, w2).]
A contour map of the total error is depicted below:
The classifier training has been simulated for η = 0.5 for four arbitrarily chosen initial weight vectors. The patterns are x(1) = 1, x(2) = 0.5, x(3) = 3, x(4) = −2, with d1 = d3 = 1 (class C1) and d2 = d4 = −1 (class C2).
The resulting trajectories of 150 simulated training steps are shown in the following figure (each tenth step is shown).
In each case the weights converge during training toward the center of the solution region obtained for the discrete perceptron case, shown on the next slide. This region coincides with the dark blue region in the contour map of the total error depicted before, which is also shown on the next slide.
[Figures: the solution region of the discrete perceptron case and the contour map of the total error, with the training trajectories converging toward its center.]
SUMMARY OF CONTINUOUS PERCEPTRON TRAINING ALGORITHM

Given are the P training pairs {x1,d1, x2,d2, …, xP,dP}, where xi is (N+1) x 1 and di is 1 x 1, i = 1, 2, …, P. In the following, n denotes the training step and p denotes the step counter within the training cycle.

Step 1: η > 0, λ = 1 and Emax > 0 are chosen.
Step 2: Weights are initialized at random small values; w is (N+1) x 1. Counters and error are initialized: n ← 1, p ← 1, E ← 0.
Step 3: The training cycle begins here. Input is presented and output is computed: x ← xp, d ← dp, y = f(w^t x).
Step 4: Weights are updated: w ← w + (η/2)(d − y)(1 − y^2)x.
Step 5: Cycle error is computed: E ← (1/2)(d − y)^2 + E.
Step 6: If p < P, then p ← p + 1, n ← n + 1 and go to Step 3; otherwise go to Step 7.
Step 7: The training cycle is completed. For E < Emax, terminate the training session and output the weights, n and E. If E > Emax, then E ← 0, p ← 1 and enter the new training cycle by going to Step 3.
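The summary above can be sketched as a training loop in Python for the four-pattern example treated earlier (bipolar activation, η = 0.5 as in the simulation; the zero initialization, Emax and the cycle cap are illustrative assumptions):

```python
import math

def f(v):
    # bipolar continuous activation
    return (1.0 - math.exp(-v)) / (1.0 + math.exp(-v))

X = [[1.0, 1.0], [0.5, 1.0], [3.0, 1.0], [-2.0, 1.0]]  # augmented patterns
D = [1.0, -1.0, 1.0, -1.0]
eta, Emax = 0.5, 0.01
w = [0.0, 0.0]                          # Step 2

E = 2.0
for cycle in range(2000):               # training cycles
    E = 0.0                             # cycle error reset
    for x, d in zip(X, D):
        v = sum(wi * xi for wi, xi in zip(w, x))
        y = f(v)                        # Step 3
        w = [wi + 0.5 * eta * (d - y) * (1.0 - y * y) * xi   # Step 4
             for wi, xi in zip(w, x)]
        E += 0.5 * (d - y) ** 2         # Step 5
    if E < Emax:                        # Step 7
        break
```

After training, the cycle error is small and the sign of w^t x agrees with the desired class for every pattern, i.e. the weights have entered the discrete-perceptron solution region.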
Delta Training Rule for a Multi-Perceptron Layer

[Figure: a single layer of K neurons. The inputs x1, x2, …, xj, …, xJ (the j'th column of nodes) are connected through the weights wkj to the activations v1, v2, …, vk, …, vK, and the neurons produce the outputs y1, y2, …, yk, …, yK (the k'th column of nodes).]

The above can be redrawn as:
[Figure: the multi-perceptron layer redrawn in vector-matrix form.]
v1 = w11 x1 + w12 x2 + … + w1j xj + … + w1J xJ,   y1 = f(v1)
v2 = w21 x1 + w22 x2 + … + w2j xj + … + w2J xJ,   y2 = f(v2)
⋮
vk = wk1 x1 + wk2 x2 + … + wkj xj + … + wkJ xJ,   yk = f(vk)
⋮
vK = wK1 x1 + wK2 x2 + … + wKj xj + … + wKJ xJ,   yK = f(vK)
In vector-matrix form, with v = [v1 v2 … vK]^t, x = [x1 x2 … xJ]^t and W the K x J matrix with entries wkj,

v = Wx   and   y = Γ(v) = [f(v1) f(v2) … f(vK)]^t,

so that

y = Γ[Wx],

where Γ(·) is the nonlinear (diagonal) operator that applies f(·) to each component of its argument.
The desired and actual output vectors at the n'th training step are given as

d(n) = [d1(n) d2(n) … dK(n)]^t,   y(n) = [y1(n) y2(n) … yK(n)]^t.

The error expression for a single perceptron was given as E(n) = (1/2)(d(n) − y(n))^2, which can be generalised to include all squared errors at the outputs k = 1, 2, …, K:

E(n) = (1/2) Σ_{k=1}^{K} (dk(n) − yk(n))^2 = (1/2) ||d(n) − y(n)||^2,

where n represents the n'th step, which corresponds to a specific input pattern that produces the output error.
The updated weight value from input j to neuron k at step n is given by

wkj(n+1) = wkj(n) + Δwkj(n).

According to the delta training rule for the continuous perceptron,

Δwkj(n) = −η ∂E/∂wkj |_{wkj = wkj(n)},   for k = 1, 2, …, K and j = 1, 2, …, J,

where ∂E/∂wkj = (∂E/∂vk)(∂vk/∂wkj). Using vk = wk1 x1 + wk2 x2 + … + wkj xj + … + wkJ xJ, we have ∂vk/∂wkj = xj.

The error signal term produced by the k'th neuron is defined as

δyk = −∂E/∂vk.

Using this yields Δwkj = η δyk xj.
On the other hand, we can write ∂E/∂vk = (∂E/∂yk)(∂yk/∂vk). Since

E(n) = (1/2) Σ_{k=1}^{K} (dk(n) − yk(n))^2,

we get ∂E/∂yk = −(dk − yk), and using yk = f(vk) we have ∂yk/∂vk = df(vk)/dvk, which yields

δyk = (dk − yk) df(vk)/dvk.

This is used to obtain

Δwkj(n) = η (dk − yk) (df(vk)/dvk) xj.

For the bipolar continuous activation function we already know that

df(vk)/dvk = (1/2)(1 − f^2(vk)) = (1/2)(1 − yk^2).

Hence

Δwkj(n) = (η/2)(dk(n) − yk(n))(1 − yk^2(n)) xj(n)

and

wkj(n+1) = wkj(n) + (η/2)(dk(n) − yk(n))(1 − yk^2(n)) xj(n).
Defining the error signal terms δyk(n) = (1/2)(dk(n) − yk(n))(1 − yk^2(n)) and the vectors

δy = [δy1(n) δy2(n) … δyK(n)]^t,   x(n) = [x1(n) x2(n) … xJ(n)]^t,

all K x J weights can be updated at once:

W(n+1) = W(n) + η δy x^t.
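A small numeric sketch of this matrix update in Python (the 2 x 3 weight matrix, the input and the targets below are illustrative assumptions): one step W(n+1) = W(n) + η δy x^t reduces the output error for the presented pattern.

```python
import math

def f(v):
    # bipolar continuous activation
    return (1.0 - math.exp(-v)) / (1.0 + math.exp(-v))

def layer_error(W, x, d):
    # E = (1/2) sum_k (d_k - y_k)^2
    y = [f(sum(W[k][j] * x[j] for j in range(len(x)))) for k in range(len(W))]
    return 0.5 * sum((dk - yk) ** 2 for dk, yk in zip(d, y))

def layer_update(W, x, d, eta):
    """One step of W(n+1) = W(n) + eta * delta_y * x^t for a layer of K neurons."""
    y = [f(sum(W[k][j] * x[j] for j in range(len(x)))) for k in range(len(W))]
    delta = [0.5 * (d[k] - y[k]) * (1.0 - y[k] ** 2) for k in range(len(W))]
    return [[W[k][j] + eta * delta[k] * x[j] for j in range(len(x))]
            for k in range(len(W))]

W0 = [[0.1, -0.2, 0.05],      # illustrative 2x3 weight matrix (K=2, J=3)
      [-0.1, 0.15, 0.2]]
x = [1.0, 0.5, -1.0]          # augmented input (illustrative)
d = [1.0, -1.0]
W1 = layer_update(W0, x, d, eta=0.5)
```

Because each row update is the single-perceptron delta rule applied independently, the per-pattern error decreases just as in the single-output case.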
Generalised Delta Training Rule for the Multi-Layer Perceptron

[Figure: a two-layer network. The inputs z1, …, zi, …, zI (the i'th column of nodes) feed the hidden layer through the weights tji, producing the activations u1, …, uJ and the hidden outputs x1, …, xJ (the j'th column of nodes); these feed the output layer through the weights wkj, producing the activations v1, …, vK and the outputs y1, …, yK (the k'th column of nodes).]

The weight adjustment for the hidden layer according to the gradient-descent method will be

Δtji(n) = −η ∂E/∂tji |_{tji = tji(n)},   for j = 1, 2, …, J and i = 1, 2, …, I,

where ∂E/∂tji = (∂E/∂uj)(∂uj/∂tji).

Here δxj = −∂E/∂uj, for j = 1, 2, …, J, is the error signal term of the hidden layer with output x; this term is produced by the j'th neuron of the hidden layer. On the other hand, using

uj = tj1 z1 + tj2 z2 + … + tjI zI,

we can calculate ∂uj/∂tji = zi. Therefore

−∂E/∂tji = δxj zi   and   Δtji = η δxj zi.
Since xj = f(uj), we can write

δxj = −∂E/∂uj = −(∂E/∂xj)(∂xj/∂uj),

with

−∂E/∂xj = −∂/∂xj [(1/2) Σ_{k=1}^{K} (dk − f(vk))^2]   and   ∂xj/∂uj = df(uj)/duj.

Expanding the first factor,

−∂E/∂xj = Σ_{k=1}^{K} (dk − f(vk)) ∂f(vk)/∂xj = Σ_{k=1}^{K} (dk − yk)(df(vk)/dvk)(∂vk/∂xj).

Now using vk = wk1 x1 + wk2 x2 + … + wkj xj + … + wkJ xJ, we have ∂vk/∂xj = wkj. Using this equality and

δyk = (dk − yk) df(vk)/dvk

in the expression above, we obtain

−∂E/∂xj = Σ_{k=1}^{K} δyk wkj.

Now using this and δxj = (−∂E/∂xj)(∂xj/∂uj), we obtain

δxj = (df(uj)/duj) Σ_{k=1}^{K} δyk wkj,

and with Δtji = η δxj zi we get

tji(n+1) = tji(n) + η (df(uj)/duj) (Σ_{k=1}^{K} δyk wkj) zi,   for j = 1, 2, …, J and i = 1, 2, …, I.
Now define the j'th column of the K x J matrix W = [wkj] as wj, and use δy = [δy1 … δyK]^t; then we can write

Σ_{k=1}^{K} δyk wkj = wj^t δy.

In the case of the bipolar activation function we obtain, for the hidden layer,

df(uj)/duj = (1/2)(1 − f^2(uj)) = (1/2)(1 − xj^2) = f'xj.

Now construct a vector whose entries are the above terms for j = 1, 2, …, J, i.e.,

f'x = [f'x1, f'x2, …, f'xJ]^t = [(1/2)(1 − x1^2), (1/2)(1 − x2^2), …, (1/2)(1 − xJ^2)]^t.
Defining z = [z1 z2 … zI]^t, we then have

Δtji = η f'xj (wj^t δy) zi.

Now defining the hidden-layer error signal vector δx with entries δxj = f'xj (wj^t δy), and the J x I hidden weight matrix

T = [tji],

we finally obtain

T(n+1) = T(n) + η δx z^t.

This updating formula is called the Generalised Delta Rule for adjusting the hidden-layer weights. A similar formula was given for updating the output-layer weights:

W(n+1) = W(n) + η δy x^t.
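The two update formulas can be sketched together as one back-propagation step in Python (the small weight matrices, input and target below are illustrative assumptions, with η = 0.5). The hidden error signal δx re-uses the output error signal δy weighted by the output weights, exactly as derived above.

```python
import math

def f(v):
    # bipolar continuous activation
    return (1.0 - math.exp(-v)) / (1.0 + math.exp(-v))

def forward(T, W, z):
    x = [f(sum(T[j][i] * z[i] for i in range(len(z)))) for j in range(len(T))]
    y = [f(sum(W[k][j] * x[j] for j in range(len(x)))) for k in range(len(W))]
    return x, y

def backprop_step(T, W, z, d, eta):
    """One generalised-delta step: W += eta*dy*x^t and T += eta*dx*z^t."""
    x, y = forward(T, W, z)
    dy = [0.5 * (d[k] - y[k]) * (1.0 - y[k] ** 2) for k in range(len(W))]
    # dx_j = f'(u_j) * (w_j^t dy), with f'(u_j) = (1 - x_j^2)/2
    dx = [0.5 * (1.0 - x[j] ** 2) * sum(dy[k] * W[k][j] for k in range(len(W)))
          for j in range(len(T))]
    W_new = [[W[k][j] + eta * dy[k] * x[j] for j in range(len(x))]
             for k in range(len(W))]
    T_new = [[T[j][i] + eta * dx[j] * z[i] for i in range(len(z))]
             for j in range(len(T))]
    return T_new, W_new

def total_error(T, W, z, d):
    _, y = forward(T, W, z)
    return 0.5 * sum((dk - yk) ** 2 for dk, yk in zip(d, y))

T0 = [[0.2, -0.1], [-0.3, 0.1]]   # hidden weights (J=2, I=2), illustrative
W0 = [[0.1, 0.2]]                 # output weights (K=1, J=2), illustrative
z, d = [1.0, -1.0], [1.0]
T1, W1 = backprop_step(T0, W0, z, d, eta=0.5)
```

One joint step on both layers is a gradient-descent step on E, so the total output error for the presented pattern decreases.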
Here the main difference is in computing the error signals δy and δx. The entries of δy are given as

δyk = (dk − yk) df(vk)/dvk,

which only contain terms belonging to the output layer. However, this is not the case with δx: its entries

δxj = f'xj (wj^t δy)

are weighted sums of the error signals δyk produced by the following layer. Here we can draw the following conclusion: the Generalised Delta Learning Rule propagates the error back by one layer, and this holds for every layer.
Summary of the Error Back-Propagation Training Algorithm (EBPTA)

Given are P training pairs {z1,d1, z2,d2, …, zP,dP}, where zi is (I x 1), di is (K x 1), and i = 1, 2, …, P. Note that the I'th component of each zi is of value −1, since the input vectors have been augmented. Size J − 1 of the hidden layer having outputs y is selected. Note that the J'th component of y is of value −1, since the hidden-layer outputs have also been augmented; y is (J x 1) and o is (K x 1).

Step 1: η > 0 and Emax > 0 are chosen. Weights W and V are initialized at small random values; W is (K x J), V is (J x I). Counters and error are initialized: q ← 1, p ← 1, E ← 0.
Step 2: The training step starts here (see Note 1 at the end of the list). Input is presented and the layers' outputs are computed [f(net) as in (2.3a) is used]:
z ← zp, d ← dp;
yj ← f(vj^t z), for j = 1, 2, …, J, where vj, a column vector, is the j'th row of V;
ok ← f(wk^t y), for k = 1, 2, …, K, where wk, a column vector, is the k'th row of W.
Step 3: The error value is computed: E ← (1/2)(dk − ok)^2 + E, for k = 1, 2, …, K.
Step 4: The error signal vectors δo and δy of both layers are computed; δo is (K x 1) and δy is (J x 1). (See Note 2 at the end of the list.) The error signal terms of the output layer in this step are
δok = (1/2)(dk − ok)(1 − ok^2), for k = 1, 2, …, K,
and the error signal terms of the hidden layer in this step are
δyj = (1/2)(1 − yj^2) Σ_{k=1}^{K} δok wkj, for j = 1, 2, …, J.
Step 5: Output-layer weights are adjusted: wkj ← wkj + η δok yj, for k = 1, 2, …, K and j = 1, 2, …, J.
Step 6: Hidden-layer weights are adjusted: vji ← vji + η δyj zi, for j = 1, 2, …, J and i = 1, 2, …, I.
Step 7: If p < P, then p ← p + 1, q ← q + 1 and go to Step 2; otherwise, go to Step 8.
Step 8: The training cycle is completed. For E < Emax, terminate the training session and output the weights W, V, q and E. If E > Emax, then E ← 0, p ← 1, and initiate the new training cycle by going to Step 2.

NOTE 1: For best results, patterns should be chosen at random from the training set (justification follows in Section 4.5).

NOTE 2: If formula (2.4a) is used in Step 2, then the error signal terms in Step 4 are computed as follows:
δok = (dk − ok)(1 − ok) ok, for k = 1, 2, …, K
δyj = yj(1 − yj) Σ_{k=1}^{K} δok wkj, for j = 1, 2, …, J.
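The EBPTA steps can be sketched as a complete training loop in Python. The XOR task in bipolar form, the random initialization range, η = 0.2 and the cycle count are all illustrative assumptions; the augmented components are fixed at −1 as in the summary.

```python
import math, random

def f(v):
    # bipolar continuous activation, as in formula (2.3a)-style networks
    return (1.0 - math.exp(-v)) / (1.0 + math.exp(-v))

random.seed(1)
# XOR in bipolar form; inputs augmented with -1 (assumed encoding)
Z = [[-1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [1.0, -1.0, -1.0], [1.0, 1.0, -1.0]]
D = [[-1.0], [1.0], [1.0], [-1.0]]
J, K, I = 2, 1, 3                     # hidden neurons, outputs, input size
V = [[random.uniform(-0.5, 0.5) for _ in range(I)] for _ in range(J)]
W = [[random.uniform(-0.5, 0.5) for _ in range(J + 1)] for _ in range(K)]
eta = 0.2

def outputs(z):
    y = [f(sum(V[j][i] * z[i] for i in range(I))) for j in range(J)] + [-1.0]
    o = [f(sum(W[k][j] * y[j] for j in range(J + 1))) for k in range(K)]
    return y, o

def cycle_error():
    return sum(0.5 * sum((dk - ok) ** 2 for dk, ok in zip(d, outputs(z)[1]))
               for z, d in zip(Z, D))

E_start = cycle_error()
for _ in range(2000):                 # training cycles (Steps 2-7)
    for z, d in zip(Z, D):
        y, o = outputs(z)
        do = [0.5 * (d[k] - o[k]) * (1 - o[k] ** 2) for k in range(K)]  # Step 4
        dy = [0.5 * (1 - y[j] ** 2) * sum(do[k] * W[k][j] for k in range(K))
              for j in range(J)]
        for k in range(K):                                              # Step 5
            for j in range(J + 1):
                W[k][j] += eta * do[k] * y[j]
        for j in range(J):                                              # Step 6
            for i in range(I):
                V[j][i] += eta * dy[j] * z[i]
E_end = cycle_error()
```

The loop presents patterns in a fixed order for simplicity; as Note 1 above points out, random presentation usually works better.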
The Hopfield Network

We know that the Hopfield network is a recurrent (feedback, or dynamical) neural network. Let yi, i = 1, 2, …, n, be the outputs of the network, and let the energy function E satisfy the following:

dE/dt = −Σ_{i=1}^{n} αi (dyi/dt)^2 ≤ 0,   where αi > 0.

The above inequality reveals that the energy decreases with time and that dE/dt becomes zero if and only if dyi/dt = 0 for all i, i.e., when the outputs yi(t) reach their stable equilibrium states yi(t) = constant.
Now let us assume that

αi = Ci df^{-1}(yi)/dyi,   where Ci > 0.

For the bipolar activation function

y = f(x) = (1 − e^{−ax}) / (1 + e^{−ax}),

the inverse function is given by

x = f^{-1}(y) = (1/a) ln((1 + y)/(1 − y)).

[Figure: the bipolar activation function and its inverse.]

The derivative of the inverse of the bipolar function is

dx/dy = (2/a) · 1/(1 − y^2).
We can conclude that df^{-1}(yi)/dyi > 0 for −1 < yi < 1. Therefore

dE/dt = −Σ_{i=1}^{n} Ci (df^{-1}(yi)/dyi)(dyi/dt)^2 ≤ 0.

Considering

dxi/dt = df^{-1}(yi)/dt = (df^{-1}(yi)/dyi)(dyi/dt),

we obtain

dE/dt = −Σ_{i=1}^{n} Ci (df^{-1}(yi)/dyi)(dyi/dt)(dyi/dt) = −Σ_{i=1}^{n} Ci (dxi/dt)(dyi/dt).

Now defining x = [x1 x2 … xN]^t, y = [y1 y2 … yN]^t and C = diag(Ci) yields

dE/dt = −(dy/dt)^t C (dx/dt).

Since dE/dt = (∇E(y))^t dy/dt, we can write

C dx/dt = −∇E(y).

This reveals that the capacitor current vector is parallel to the negative gradient vector.
[Figure: the i'th neuron of the Hopfield network, with inputs y1, y2, …, yN through conductances wi1, wi2, …, wiN, leakage conductance gi, capacitor Ci and external current Ii.]

The node equation of the i'th neuron is

Ci dxi/dt = Σ_{j=1}^{N} wij (yj − xi) − gi xi + Ii,   with yi = f(xi) and xi = f^{-1}(yi),

which can be written as

Ci dxi/dt = Σ_{j=1}^{N} wij yj − (Σ_{j=1}^{N} wij + gi) xi + Ii.

Now define

Gi = Σ_{j=1}^{N} wij + gi,   C = diag(Ci),   G = diag(Gi),   i = 1, 2, …, N,

together with the N x N matrix W = [wij], x = [x1 x2 … xN]^t and I = [I1 I2 … IN]^t.
We obtain

Ci dxi/dt = Σ_{j=1}^{N} wij yj − Gi xi + Ii,

and consequently

C dx(t)/dt = Wy − Gx + I.

Since C dx/dt = −∇E(y), we obtain

∇E(y) = −Wy + Gx − I.

In the case of the bipolar activation function we know that x = f^{-1}(y) = (1/a) ln((1 + y)/(1 − y)). Therefore the state vector is given as

x = (1/a) [ln((1 + y1)/(1 − y1)), ln((1 + y2)/(1 − y2)), …, ln((1 + yN)/(1 − yN))]^t.
We already know that

dE/dt = −Σ_{i=1}^{N} Ci (dxi/dt)(dyi/dt);

therefore

dE/dt = −Σ_{i=1}^{N} (Σ_{j=1}^{N} wij yj − Gi xi + Ii)(dyi/dt)
      = −Σ_{i=1}^{N} Σ_{j=1}^{N} wij yj (dyi/dt) + Σ_{i=1}^{N} Gi xi (dyi/dt) − Σ_{i=1}^{N} Ii (dyi/dt).

Now consider

d/dt (y^t W y) = (dy/dt)^t W y + y^t W (dy/dt).

If W = W^t, then (dy/dt)^t W y = y^t W^t (dy/dt) = y^t W (dy/dt). Therefore

d/dt (y^t W y) = 2 y^t W (dy/dt)   and   y^t W (dy/dt) = (1/2) d/dt (y^t W y).

Now consider the first term of the expression for dE/dt. We can write

Σ_{i=1}^{N} Σ_{j=1}^{N} wij yj (dyi/dt) = (dy/dt)^t W y,

and, using the equality above,

Σ_{i=1}^{N} Σ_{j=1}^{N} wij yj (dyi/dt) = (1/2) d/dt (y^t W y).
Now consider the second term in the same equation: xi (dyi/dt) = f^{-1}(yi)(dyi/dt). Since

d/dyi ∫_0^{yi} f^{-1}(y) dy = f^{-1}(yi),

we can write

d/dt ∫_0^{yi} f^{-1}(y) dy = (d/dyi ∫_0^{yi} f^{-1}(y) dy)(dyi/dt) = f^{-1}(yi)(dyi/dt).

Collecting the terms,

dE/dt = −d/dt [(1/2) y^t W y − Σ_{i=1}^{N} Gi ∫_0^{yi} f^{-1}(y) dy + Σ_{i=1}^{N} Ii yi],

so the energy function is

E(y) = −(1/2) y^t W y + Σ_{i=1}^{N} Gi ∫_0^{yi} f^{-1}(y) dy − Σ_{i=1}^{N} Ii yi.
In order to obtain the state equations in terms of the outputs yi, consider once again

Ci dxi/dt = Σ_{j=1}^{N} wij yj − Gi xi + Ii.

Using dxi/dyi = (2/a) · 1/(1 − yi^2), we obtain

(2Ci/(a(1 − yi^2))) dyi/dt = Σ_{j=1}^{N} wij yj − Gi xi + Ii

and

dyi/dt = (a(1 − yi^2)/(2Ci)) [Σ_{j=1}^{N} wij yj − Gi xi + Ii].

In vector form,

dy/dt = diag(a(1 − yi^2)/(2Ci)) [Wy − GΓ^{-1}(y) + I].
[Figure: a two-neuron Hopfield circuit with coupling conductances g11, g12, g21, g22, leakage conductances g1, g2 and capacitors C1, C2.]

The node equations are

C1 dx1/dt = −g1 x1 + g11 (y1 − x1) + g12 (y2 − x1)
C2 dx2/dt = −g2 x2 + g22 (y2 − x2) + g21 (y1 − x2),

which in matrix form read

[C1 0; 0 C2][dx1/dt; dx2/dt] = [g11 g12; g21 g22][y1; y2] − [g1+g11+g12 0; 0 g2+g22+g21][x1; x2].

This yields

C = [C1 0; 0 C2],   W = [g11 g12; g21 g22],   G = [g1+g11+g12 0; 0 g2+g22+g21].
With I = 0, the energy function becomes

E(y1, y2) = −(1/2)[y1 y2][g11 g12; g21 g22][y1; y2] + G1 ∫_0^{y1} f^{-1}(y) dy + G2 ∫_0^{y2} f^{-1}(y) dy
          = −(1/2)(g11 y1^2 + g22 y2^2 + (g12 + g21) y1 y2) + (G1/a) ∫_0^{y1} ln((1+y)/(1−y)) dy + (G2/a) ∫_0^{y2} ln((1+y)/(1−y)) dy,

and the components of the gradient are

∂E/∂y1 = −(g11 y1 + g12 y2) + (G1/a) ln((1 + y1)/(1 − y1))
∂E/∂y2 = −(g22 y2 + g21 y1) + (G2/a) ln((1 + y2)/(1 − y2)).

Now consider the integral

I = ∫_0^{yi} ln((1 + y)/(1 − y)) dy = ∫_0^{yi} ln(1 + y) dy − ∫_0^{yi} ln(1 − y) dy.
For the first part, I1 = ∫_0^{yi} ln(1 + y) dy, let u = ln(1 + y) and dv = dy, so that du = dy/(1 + y) and v = 1 + y. Hence

I1 = ∫_0^{yi} u dv = uv]_0^{yi} − ∫_0^{yi} v du = (1 + yi) ln(1 + yi) − yi.

Similarly, for I2 = −∫_0^{yi} ln(1 − y) dy we obtain

I2 = (1 − yi) ln(1 − yi) + yi,

so that

I = I1 + I2 = (1 + yi) ln(1 + yi) + (1 − yi) ln(1 − yi).
Substituting the integrals, the energy function becomes

E(y1, y2) = −(1/2)(g11 y1^2 + g22 y2^2 + (g12 + g21) y1 y2)
          + (G1/a)[(1 + y1) ln(1 + y1) + (1 − y1) ln(1 − y1)]
          + (G2/a)[(1 + y2) ln(1 + y2) + (1 − y2) ln(1 − y2)].
[Figures: the energy surface E(y1, y2) of the two-neuron network.]
Using dy/dt = diag(a(1 − yi^2)/(2Ci)) [Wy − GΓ^{-1}(y) + I] with I = 0, the state equations are obtained as

[dy1/dt; dy2/dt] = [a(1 − y1^2)/(2C1) 0; 0 a(1 − y2^2)/(2C2)] ([g11 g12; g21 g22][y1; y2] − (1/a)[G1 0; 0 G2][ln((1+y1)/(1−y1)); ln((1+y2)/(1−y2))]),

i.e.,

dy1/dt = (1/(2C1))(1 − y1^2)(a g11 y1 + a g12 y2 − G1 ln((1 + y1)/(1 − y1)))
dy2/dt = (1/(2C2))(1 − y2^2)(a g22 y2 + a g21 y1 − G2 ln((1 + y2)/(1 − y2))).
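These two state equations can be integrated with a simple forward-Euler sketch in Python (all parameter values below, including a = 2, the conductances, the step size and the initial state, are illustrative assumptions; I = 0). Along the trajectory, the energy E(y1, y2) derived above should not increase.

```python
import math

a, C1, C2 = 2.0, 1.0, 1.0
g11, g12, g21, g22 = 0.0, 1.0, 1.0, 0.0   # coupling conductances (illustrative)
g1 = g2 = 0.5                             # leakage conductances (illustrative)
G1 = g1 + g11 + g12
G2 = g2 + g21 + g22

def energy(y1, y2):
    # E(y1, y2) from the expression derived above, with I = 0
    quad = -0.5 * (g11 * y1**2 + g22 * y2**2 + (g12 + g21) * y1 * y2)
    s = lambda y: (1 + y) * math.log(1 + y) + (1 - y) * math.log(1 - y)
    return quad + (G1 / a) * s(y1) + (G2 / a) * s(y2)

def step(y1, y2, dt):
    # one forward-Euler step of the two state equations
    inv = lambda y: math.log((1 + y) / (1 - y))
    dy1 = (1 - y1**2) / (2 * C1) * (a * g11 * y1 + a * g12 * y2 - G1 * inv(y1))
    dy2 = (1 - y2**2) / (2 * C2) * (a * g22 * y2 + a * g21 * y1 - G2 * inv(y2))
    return y1 + dt * dy1, y2 + dt * dy2

y1, y2 = 0.6, -0.1                        # illustrative initial outputs
E0 = energy(y1, y2)
for _ in range(2000):
    y1, y2 = step(y1, y2, dt=0.01)
E1 = energy(y1, y2)
```

With a small enough step size, the discretized trajectory stays inside (−1, 1) and its energy decreases toward an equilibrium, mirroring dE/dt ≤ 0 in continuous time.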
Discrete-Time Hopfield Networks

Consider the state equation of the gradient-type Hopfield network:

C dx(t)/dt = Wy − Gx + I,

which we can write as

C dx(t)/dt = Wy − GΓ^{-1}(y) + I.

As the plot of the inverse bipolar activation function shows, the second term in the above equation is approximately zero for high-gain neurons. Hence

C dx(t)/dt = Wy + I.

Now consider

dxi/dt = df^{-1}(yi)/dt = (df^{-1}(yi)/dyi)(dyi/dt).

Using the same plot, we can conclude that df^{-1}(yi)/dyi ≈ 0 for high-gain neurons. Hence

dx(t)/dt = 0,

and it follows that

0 = Wy + I.

Now let us solve this equation using Jacobi's algorithm. To this end define:
W' = W − D = L + U,

where L, U and D = diag(wii) are the strictly lower triangular, strictly upper triangular and diagonal matrices appearing in the following decomposition of W:

W = [w11 w12 … w1N; w21 w22 … w2N; … ; wN1 wN2 … wNN]
L = [0 0 … 0; w21 0 … 0; w31 w32 0 … 0; … ; wN1 wN2 … wN,N−1 0]
U = [0 w12 … w1N; 0 0 … w2N; … ; 0 0 … 0].

Now, defining D = diag(wii), we obtain

−Dy = W'y + I   and   y = −D^{-1} W' y − D^{-1} I.

Defining W̄ = −D^{-1} W' and Ī = −D^{-1} I gives

y = W̄ y + Ī.

Now replace the vector y on the right-hand side by an initial vector y(0). If the vector y obtained on the left-hand side equals y(0), then y(0) is the solution of the system. If not, then call the vector obtained on the left-hand side y(1):

y(1) = W̄ y(0) + Ī,

and in general we can write

y(k+1) = W̄ y(k) + Ī.
The method will always converge if the matrix W is strictly or irreducibly diagonally dominant. Strict row diagonal dominance means that, for each row, the absolute value of the diagonal term is greater than the sum of the absolute values of the other terms:

|wii| > Σ_{j≠i} |wij|.

The Jacobi method sometimes converges even if this condition is not satisfied. It is necessary, however, that the diagonal terms of the matrix be greater in magnitude than the other terms.
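The iteration y(k+1) = W̄ y(k) + Ī can be sketched directly in Python. The matrix below is an illustrative strictly row diagonally dominant example, so the iteration converges to the solution of Wy + I = 0.

```python
# Solve W*y + I = 0 by Jacobi iteration; W is strictly row diagonally
# dominant (illustrative values), so convergence is guaranteed.
W = [[4.0, 1.0, -1.0],
     [2.0, -5.0, 1.0],
     [1.0, 2.0, 6.0]]
I = [1.0, -2.0, 3.0]
N = len(W)

y = [0.0] * N                       # initial vector y(0)
for _ in range(100):                # y(k+1) = -D^{-1} (W' y(k) + I)
    y = [-(sum(W[i][j] * y[j] for j in range(N) if j != i) + I[i]) / W[i][i]
         for i in range(N)]

# residual of W*y + I = 0 after the iterations
residual = max(abs(sum(W[i][j] * y[j] for j in range(N)) + I[i]) for i in range(N))
```

Note that all components of y(k+1) are computed from the previous iterate y(k); this is what distinguishes Jacobi from the Gauss-Seidel variant discussed next.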
Solution by the Gauss-Seidel Method

In Jacobi's method the updating of the unknowns is made only after all N unknowns have been computed from the previous iterate. We will see in the following that this is not necessary, i.e., the updating can be made individually for each unknown, and this updated value can be used in the next equation. This is shown in the following equations:

x1(n+1) = (1/a11)[−a12 x2(n) − a13 x3(n) − … − a1N xN(n) + b1]
x2(n+1) = (1/a22)[−a21 x1(n+1) − a23 x3(n) − … − a2N xN(n) + b2]
x3(n+1) = (1/a33)[−a31 x1(n+1) − a32 x2(n+1) − a34 x4(n) − … − a3N xN(n) + b3]
⋮
xN(n+1) = (1/aNN)[−aN1 x1(n+1) − aN2 x2(n+1) − … − aN,N−1 xN−1(n+1) + bN]

In vector-matrix form, we can write

D x(n+1) = −L x(n+1) − U x(n) + b
(D + L) x(n+1) = −U x(n) + b
x(n+1) = (D + L)^{-1} (−U x(n) + b).
This matrix expression is mainly used to analyze the method. When implementing Gauss-Seidel, an explicit entry-by-entry approach is used:

xi(n+1) = (1/aii)(bi − Σ_{j<i} aij xj(n+1) − Σ_{j>i} aij xj(n)).
The Gauss-Seidel method is defined on matrices with nonzero diagonals, but convergence is only guaranteed if the matrix is either:
1. diagonally dominant, or
2. symmetric and positive definite.
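The entry-by-entry update can be sketched as follows (the symmetric positive-definite matrix A and the right-hand side b are illustrative assumptions; each newly computed value is used immediately, as in the equations above):

```python
# Gauss-Seidel for A x = b; A is symmetric positive definite (illustrative),
# so the iteration is guaranteed to converge.
A = [[4.0, 1.0, 1.0],
     [1.0, 3.0, 0.0],
     [1.0, 0.0, 2.0]]
b = [6.0, 4.0, 3.0]
N = len(A)

x = [0.0] * N
for _ in range(100):
    for i in range(N):
        s = sum(A[i][j] * x[j] for j in range(N) if j != i)
        x[i] = (b[i] - s) / A[i][i]   # updated entries used immediately

# residual of A x = b after the iterations
residual = max(abs(sum(A[i][j] * x[j] for j in range(N)) - b[i]) for i in range(N))
```

Because updated entries feed into the same sweep, Gauss-Seidel typically converges in fewer iterations than Jacobi on the same system.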