
Soft Computing

Dynamic Neural Networks


Dynamic Neural Networks
• Neural network structures can be broadly classified into feedforward or recurrent architectures.
• Feedforward networks are networks in which the output of a given node
is always injected into the following layer.
• Recurrent networks, on the other hand, have some outputs directed back as
inputs to node(s) in the same or preceding layer.
• The presence of cycles in a recurrent network gives it the ability to dynamically encode, store, and retrieve context information, hence the name dynamic neural networks.
• Much like a sequential circuit, the state of the network at a given instant
depends on the state at the previous instant.
• While possessing several attractive capabilities, recurrent neural networks remain among the most challenging networks to deal with, given the variety of their structures and the sometimes complex and computationally intensive training process.
• The architecture is similar to that of a feedforward network, except that there are connections going from nodes in the output layer back to nodes in the input layer.
• There are also some self-loops in the hidden layer.
• The “feedback” connections are specific to these networks and
allow them to tackle problems involving dynamic processes and
spatio-temporal patterns.
Recurrent neural network architecture
• These networks differ from feedforward architectures in the sense that there is at least one “feedback loop”.

• Thus, in these networks, there can exist one layer with feedback connections.

• There can also be neurons with self-feedback links, that is, the output of a neuron is fed back into itself as input.
Recurrent neural network architecture

• For an arbitrary unit in a recurrent network, the activation at a given time t is a function of the current external inputs and of the activations fed back from the previous time step t − 1.
• The operation of a recurrent node can be summarized as follows.
• When inputs are presented at the input layer at a given time step, the unit computes the activation output just like a feedforward node does. However, its net input includes additional information reflecting the state of the network.
• For the subsequent inputs, the state will essentially be a function
of the previous and current inputs.
• As a result, the behaviour of the network depends not only on the
current excitation, but also on past history.
• The wealth of information that can be stored in a recurrent
network can prove to be very useful.
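As an illustration of this behaviour, the following is a minimal sketch (not taken from the slides; the weights, the tanh activation, and the input sequence are arbitrary assumptions) of a single recurrent unit whose net input combines the current input with its own previous activation:

```python
import numpy as np

# Minimal sketch: a single recurrent unit whose net input combines the current
# external input with the unit's own previous activation, so its output
# depends on the past history of inputs.
def recurrent_unit(inputs, w_in, w_rec, activation=np.tanh):
    """Run one recurrent unit over a sequence of scalar inputs."""
    state = 0.0                          # activation at time t - 1 (initially zero)
    outputs = []
    for u in inputs:                     # present inputs one time step at a time
        net = w_in * u + w_rec * state   # net input = current input + fed-back state
        state = activation(net)          # new activation becomes the stored state
        outputs.append(state)
    return outputs

# The same input value produces different outputs at different time steps,
# because the unit's state carries information about earlier inputs.
print(recurrent_unit([1.0, 1.0, 1.0], w_in=0.5, w_rec=0.8))
```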
• Recurrent neural networks can be classified into two categories according to their weight connections.
• The first group has symmetrical weight connections.
• Hopfield networks are an example of this category.

The Hopfield neural network: symmetrical synaptic connections, i.e., wij = wji


• The second category is characterized by asymmetrical weight connections.
• The two major types of recurrent networks in this category are partially recurrent networks and fully recurrent networks.
• A partially recurrent network is basically a multilayer feedforward network, but with feedback links coming from either the hidden-layer nodes or the output nodes. The layer to which the feedback links are connected is an artificially added layer called the context layer.
• The two well-known examples of simple recurrent networks are the Elman network and Jordan’s sequential network.

Architecture of an Elman network; architecture of Jordan’s sequential network
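Below is a minimal sketch of an Elman-style partially recurrent network (the layer sizes, random weights, and tanh/linear activations are assumptions for illustration): the hidden-layer activations are copied into the context layer and fed back as extra inputs at the next time step.

```python
import numpy as np

# Elman-style sketch: hidden activations are copied into a "context layer"
# and combined with the next external input, giving the feedback path of a
# partially recurrent network.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 5, 2

W_xh = rng.normal(size=(n_hidden, n_in))       # input   -> hidden weights
W_ch = rng.normal(size=(n_hidden, n_hidden))   # context -> hidden weights
W_hy = rng.normal(size=(n_out, n_hidden))      # hidden  -> output weights

def elman_forward(sequence):
    context = np.zeros(n_hidden)               # context layer starts empty
    outputs = []
    for x in sequence:
        hidden = np.tanh(W_xh @ x + W_ch @ context)  # hidden sees input + context
        context = hidden.copy()                      # copy hidden state into context
        outputs.append(W_hy @ hidden)                # linear output layer
    return outputs

# Three time steps of a random input sequence.
seq = [rng.normal(size=n_in) for _ in range(3)]
print(elman_forward(seq))
```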


The Hopfield Network

• It was the pioneering work of Hopfield in the early 1980s that led the way
for designing neural networks with feedback paths.
• The work of Hopfield is seen by many as the starting point for the
implementation of associative memory by using a special structure of
recurrent neural networks.
• The associative memory concept means simply that the network is able to
recognize newly presented (noisy or incomplete) patterns using an already
stored “complete” version of that pattern.
• The new pattern is “attracted” to the stable pattern already stored in the
network memories.
Topology of The Hopfield Network
• A number of processing units configured in one single layer (besides
the input and the output layers) with symmetrical synaptic
connections, i.e., wij = wji.
• In the original work of Hopfield, the output of each unit can take a
binary value (either 0 or 1) or a bipolar value (either −1 or 1).
• This value is fed back to all the input units of the network except to
the one corresponding to that output.
• Assume that the state of the network, of dimension n (n neurons), takes bipolar values.
• The activation rule for each neuron is then provided by the following:

oi = +1 if Σj wij oj > θi,   oi = −1 if Σj wij oj < θi

(oi is left unchanged when the net input equals the threshold θi).

Topology of Hopfield Network


• To accomplish the auto-associative behaviour of the network, in line with Hopfield’s physical view of it as a dynamical system, he used an energy function for the network given by:

E = −(1/2) Σi Σj wij oi oj + Σi θi oi

• To ensure stability of the network, E is defined in such a way as to decrease monotonically with variation of the output states until a minimum is attained.

• The above expression shows that the energy function E of the network continues to decrease until it settles by reaching a local minimum.
• This also translates into a monotonically decreasing behaviour of the energy.
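The following sketch (with an arbitrary small symmetric weight matrix, zero thresholds, and an arbitrary starting state) evaluates this energy function and shows that a single asynchronous update can only keep E constant or lower it:

```python
import numpy as np

# Sketch of the Hopfield energy E = -(1/2) sum_ij wij oi oj + sum_i theta_i oi
# and of one asynchronous update, which never increases E.
def energy(W, o, theta):
    return -0.5 * o @ W @ o + theta @ o

# Arbitrary illustrative network: symmetric weights, zero diagonal, zero thresholds.
W = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
theta = np.zeros(3)
o = np.array([1., 1., -1.])              # some initial bipolar state

print("E before:", energy(W, o, theta))
i = 2                                    # update neuron i asynchronously
net = W[i] @ o - theta[i]
if net != 0:
    o[i] = 1.0 if net > 0 else -1.0      # bipolar activation rule
print("E after :", energy(W, o, theta))  # never larger than "E before"
```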
Learning Algorithm

• The learning algorithm for the Hopfield network is based on the Hebbian
learning rule.
• This is one of the earliest procedures for carrying out unsupervised learning.
• The Hebbian learning rule, also known as the outer product rule of storage, as applied to a set of q presented patterns pk (k = 1, . . . , q), each with dimension n (n denotes the number of neuron units in the Hopfield network), is expressed as:

wij = (1/n) Σk pki pkj  for i ≠ j,   and wii = 0,

where pki denotes the i-th component of pattern pk.
Learning Algorithm (cont.)

• The weight matrix W = {wij} could also be expressed in terms of the outer product of the vectors pk as:

W = (1/n) Σk pk pkT − (q/n) I,

where I is the n × n identity matrix (subtracting (q/n) I sets the diagonal terms wii to zero).

• The ratio (1/n) is used for computational convenience.
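A short sketch of this storage rule follows (the two bipolar patterns are arbitrary examples, and the helper name store_patterns is not from the slides):

```python
import numpy as np

# Sketch of the outer-product (Hebbian) storage rule:
# W = (1/n) * sum_k p_k p_k^T - (q/n) * I, which zeroes the diagonal.
def store_patterns(patterns):
    patterns = np.asarray(patterns, dtype=float)   # shape (q, n), bipolar entries
    q, n = patterns.shape
    W = patterns.T @ patterns / n                  # sum of outer products / n
    np.fill_diagonal(W, 0.0)                       # equivalent to subtracting (q/n) I
    return W

# Two arbitrary bipolar patterns of dimension 4.
P = [[1, 1, 1, -1],
     [-1, 1, -1, 1]]
W = store_patterns(P)
print(W)                                           # symmetric, zero diagonal
```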


• The different learning stages of a Hopfield network are summarized as
follows:
• Step 1 (storage): The first stage in the learning algorithm is to store the
patterns through establishing the connection weights.
• Each of the q patterns presented is a vector of bipolar elements (+1 or −1).
These patterns are also called fundamental memories, given that the
connection weights have now become a fixed entity within the network.
• This reflects the stable status of the network to which newly presented
patterns should get “attracted.”
• Step 2 (initialization): The second stage is initialization and it consists in
presenting to the network an unknown pattern u with the same dimension as
the fundamental patterns. Every component of the network outputs at the
initial iteration cycle is set as
o(0) = u
• Step 3 (retrieval 1): Each component oi of the output vector o is updated from cycle l to cycle l + 1 according to the iteration formula:

oi(l + 1) = sgn( Σj wij oj(l) − θi ),

where oi is left unchanged when the argument of sgn is zero.

• This process is known as asynchronous updating: a single neuron is selected randomly and is activated according to the above equation.
• The process continues until no more changes are made and convergence occurs. At this stage the iteration is stopped, and the output vector obtained best matches the fundamental pattern closest to it.

• Step 4 (retrieval 2): Continue the process for other presented unknown
patterns by starting again from Step 2.
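Putting Steps 1–4 together, here is a compact, self-contained sketch of storage followed by asynchronous retrieval (zero thresholds and a single stored pattern are assumed; the noisy probe is an arbitrary example):

```python
import numpy as np

# Sketch of Steps 2-4: present an unknown bipolar pattern and update one
# randomly chosen neuron at a time until the state stops changing.
# Thresholds theta_i are assumed to be zero.
def retrieve(W, u, rng=np.random.default_rng(0), max_sweeps=100):
    o = np.array(u, dtype=float)                   # Step 2: o(0) = u
    n = len(o)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(n):               # asynchronous, random order
            net = W[i] @ o
            new = o[i] if net == 0 else (1.0 if net > 0 else -1.0)
            if new != o[i]:
                o[i], changed = new, True
        if not changed:                            # Step 3: stop at convergence
            break
    return o

# Step 1: store one fundamental memory, then retrieve it from a noisy probe.
p = np.array([1., 1., 1., -1.])
W = np.outer(p, p); np.fill_diagonal(W, 0.0)       # Hebbian storage (1/n dropped)
print(retrieve(W, [1., -1., 1., -1.]))             # converges to p (or its complement)
```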
Example
We need to store a fundamental pattern (memory) given by the vector B =
[1 1 1 −1]T in a four-node bipolar Hopfield network. We will show that all
possible 16 states of the network will be attracted to the fundamental
memory given by B or to its complement L = [−1 −1 −1 1]T. Let us
presume that all threshold parameters θi are equal to zero. The weight
matrix is given by the equation below, where the ratio (1/n = 1/4) was
discarded:

W = B BT − I =
[  0   1   1  −1
   1   0   1  −1
   1   1   0  −1
  −1  −1  −1   0 ]
• Using the energy equation we compute the energy value for every state (each
state is coded with entries oi, where oi = 1 or −1 for each i = 1, 2, 3, 4).
• Looking at the energy levels, we find that two potential attractors
emerge: the original fundamental pattern coded by [1, 1, 1, −1]T and
its complement given by [−1, −1, −1, 1]T. Both of them have the
lowest energy value given by “−6”.
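The energy table itself is not reproduced here, but the sketch below rebuilds the example weight matrix and evaluates the energy of every one of the 16 bipolar states, confirming that the two minima at “−6” are exactly B and its complement L (every other state has energy 0 or 2):

```python
import itertools
import numpy as np

# Reconstruct the example: B = [1, 1, 1, -1]^T, W = B B^T - I, thresholds = 0,
# and evaluate E = -(1/2) o^T W o for every one of the 16 bipolar states.
B = np.array([1., 1., 1., -1.])
W = np.outer(B, B) - np.eye(4)          # 1/n ratio discarded, zero diagonal

for state in itertools.product([1., -1.], repeat=4):
    o = np.array(state)
    E = -0.5 * o @ W @ o
    print(state, "energy =", E)
# Only [1, 1, 1, -1] and [-1, -1, -1, 1] reach the minimum energy -6;
# every other state has energy 0 or 2.
```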
• At the retrieval stage, we use the iteration formula of Step 3,

oi(l + 1) = sgn( Σj wij oj(l) − θi ).

• In this example we presume that all θi = 0, and we update the state asynchronously using the above equation.
• Updating the state asynchronously means that for every state presented we activate one neuron at a time, which leads to the state transitions summarized in the accompanying table.
• We notice that all states either remain at their current energy level or move to lower energy levels, as shown in the accompanying table and figure.
• Let us analyse briefly some of the transitions:
• (1) When the input state is [1, 1, 1, 1]T, which is the state A with energy “0”,
we find that only the fourth transition contributes to a change in status. As
such, the state A transits to [1, 1, 1, −1]T, which is the state B, representing
the fundamental pattern with the lowest energy value “−6”.
• (2) When the input state is [1, 1, −1, 1]T, which is state D with energy value
“2”, it could transit a few times to end up at state L after being updated
asynchronously. In fact we find that all of its four bits o1, o2, o3, o4 can be
updated. Updating the first bit o1, the state becomes [−1, 1, −1, 1]T, which is
the state M with energy level “0”. Updating bit o2, the state transits to
[−1, −1, −1, 1]T, which is the state L having the energy level “−6”. Updating bit
o3, the state remains at [−1, −1, −1, 1]T. Updating bit o4, the state remains
unchanged as L. Throughout this process, the energy has decreased from
level “2” to level “0” to level “−6” (this sequence is traced in the sketch after this list).
• (3) When the input state is the fundamental pattern B represented by
[1, 1, 1, −1]T, there is no change of the energy level and no transition
occurs to any other state. It is in its stable state because this state has
the lowest energy.
• (4) When the input state is [−1, −1, −1, 1]T, which happens to be the
complement of the fundamental pattern B, given by the state L, we
find that its energy level is the same as B and hence it is another stable
state.
• This is usually the case: the complement of a fundamental pattern is a
fundamental pattern itself. This is because, with zero thresholds, flipping
the sign of every output leaves the energy unchanged, so a pattern and its
complement sit at the same (lowest) energy level.
• This means that the Hopfield network has the ability to remember the
fundamental memory and its complement.
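As referenced in transition (2) above, the following sketch traces that particular asynchronous schedule, starting from state D and updating bits o1 through o4 in turn (the fixed update order is chosen purely to reproduce the transitions described; asynchronous updating in general picks neurons at random):

```python
import numpy as np

# Trace the transitions described in item (2): start from D = [1, 1, -1, 1]
# and update bits o1, o2, o3, o4 in that fixed order, printing the state and
# its energy after each update.
B = np.array([1., 1., 1., -1.])
W = np.outer(B, B) - np.eye(4)           # example weight matrix, zero diagonal

def energy(o):
    return -0.5 * o @ W @ o

o = np.array([1., 1., -1., 1.])          # state D, energy 2
print("start", o, energy(o))
for i in range(4):                       # update o1, o2, o3, o4 in turn
    net = W[i] @ o
    if net != 0:
        o[i] = 1.0 if net > 0 else -1.0
    print("after o%d" % (i + 1), o, energy(o))
# Printed energies follow 2 -> 0 -> -6 -> -6 -> -6, ending at state L.
```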
Applications of Hopfield Networks
• Given their auto-associative memories capabilities, Hopfield networks
have been used extensively for information retrieval and for pattern and
speech recognition.

• They have been used as well to solve optimization problems.

• In fact, by making the energy function of a Hopfield network equivalent to a certain performance index or an objective function of a given process to be minimized, the configuration at which the network finally settles indeed represents the solution to the original optimization problem.

• The Hopfield network was also used for solving combinatorial optimization problems such as the travelling salesman problem.
Limitations of Hopfield Networks

• As with other types of networks, the Hopfield network, despite many of its features and retrieval capabilities, has its own limitations.

• They are mostly related to the limited stable-state storage capacity of the network and its possible convergence to an incorrect stable state when non-fundamental patterns are presented.

• Hopfield estimated roughly that a network with n processing units should allow for 0.15n stable states. Many studies have been carried out recently to increase the capacity of the network without increasing much the number of processing units.
• The search for the most effective training algorithm for recurrent
networks is still an ongoing research problem.

• This is particularly challenging due to the architectural diversity of recurrent networks.

• As a result, numerous training approaches have been proposed and studied in the literature in the last decade, and the majority of them have discussed ways to speed up the training process.

• Based on their structural and convergence characteristics, recurrent networks often find practical applications in areas such as nonlinear modelling and control, and pattern and sequence processing.
