Neural Associative Memories

Neural associative memories (NAM) are neural network models consisting of neuron-like and synapse-like elements. At any given point in time the state of the neural network is given by the vector of neural activities, called the activity pattern. Neurons update their activity values based on the inputs they receive (over the synapses). In the simplest neural network models the input-output function of a neuron is the identity function or a threshold operation. Note that in the latter case the neural activity state is binary: active or inactive. Information to be processed by the neural network is represented by activity patterns (for instance, the representation of a tree can be an activity pattern whose active neurons display a tree's picture). Thus, activity patterns are the representations of the elements processed in the network. A representation is called sparse if the ratio between active and inactive neurons is small.
The synapses in a neural network are the links between neurons or between neurons and fibers carrying external input. A synapse transmits presynaptic activity to the postsynaptic site. In the simplest neural network models each synapse has a weight value, and the postsynaptic signal is simply the product of presynaptic activity and weight. The input of a neuron is then the sum of the postsynaptic signals of all synapses converging on the neuron. Thus, information is processed in a neural network by activity spread. How activity spreads, and thereby which algorithm is implemented in the network, depends on how the synaptic structure, i.e., the matrix of synaptic weights in the network, is shaped by learning. In neural associative memories, learning provides the storage of a (large) set of activity patterns, the memory patterns.
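
As a concrete illustration of activity spread, the following minimal sketch (in Python/NumPy, with made-up weight values, activity pattern, and threshold) computes each neuron's input as the weighted sum of presynaptic activities and applies a threshold operation:

    import numpy as np

    # Toy network of 4 neurons; W[i, j] is the weight of the synapse from
    # presynaptic neuron j to postsynaptic neuron i (all values made up).
    W = np.array([[0.0, 0.5, 0.0, 1.0],
                  [0.5, 0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0, 0.5],
                  [1.0, 0.0, 0.5, 0.0]])
    x = np.array([1, 0, 1, 0])            # current binary activity pattern

    # Each postsynaptic signal is presynaptic activity times the weight;
    # a neuron's input is the sum of these signals over its synapses.
    neuron_input = W @ x
    theta = 0.75                           # common firing threshold (arbitrary)
    x_next = (neuron_input >= theta).astype(int)   # thresholding yields the new pattern
    print(neuron_input, x_next)
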
Different memory functions are defined by the way learned patterns can be selectively accessed by an input pattern. The function of pattern recognition is to classify input patterns into two classes: familiar patterns and the rest. Pattern association describes the function of associating certain input patterns with certain memory patterns, i.e., each stored memory consists of a pair of an input pattern and a desired output pattern.
The advantage of neural associative memories over other pattern storage algorithms such as lookup tables or hash codes is that memory access can be fault tolerant with respect to variation of the input pattern. For pattern association this means that the stored output pattern can be produced by input patterns that are (with respect to a metric such as the Hamming distance) closest to the input pattern presented during learning. Fault tolerance is key to the function of content-addressed pattern retrieval or pattern completion. Here the pattern pairs used during learning have to consist of two identical patterns, which is also called autoassociative learning. In this case, if the input patterns are noisy versions of the learned patterns, pattern association can be seen as pattern completion, i.e., the noisy version is replaced by the noiseless version of the pattern.

One discerns two different retrieval mechanisms corresponding to feedforward or feedback architectures of the neural network: In one-step retrieval each network neuron evaluates its input just once. In iterative retrieval activity flows through feedback connections and neurons evaluate their inputs several times during the retrieval process.
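
The two retrieval modes can be sketched as follows; this is a minimal illustration assuming bipolar activities and a simple sign-type threshold, and the function names are hypothetical:

    import numpy as np

    def one_step_retrieval(W, x):
        # Feedforward retrieval: every neuron evaluates its input exactly once.
        return np.where(W @ x >= 0, 1, -1)

    def iterative_retrieval(W, x, max_steps=20):
        # Feedback retrieval: the output is fed back until the pattern stops changing.
        for _ in range(max_steps):
            x_new = np.where(W @ x >= 0, 1, -1)
            if np.array_equal(x_new, x):   # fixed point reached
                break
            x = x_new
        return x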

Training or learning in the network means that neural activity patterns can influence the synaptic structure. The rules governing this influence are called synaptic learning rules. There are different models of associative memory with respect to the learning procedure. Learning can either be one-shot, where each learning pattern is presented once, or recurrent. In recurrent training the pattern retrieval is checked and the patterns are presented multiple times until a desired level of retrieval quality is achieved. The simplest form is a local learning rule, where the synaptic weight change depends only on pre- and postsynaptic activity, i.e., the signals locally available at the synapse.

In associative memories many associations can be stored at the same time. There are different schemes of superposition of the memory traces formed by the different associations. The superposition can be a simple linear addition of the synaptic changes required for each association (as in the Hopfield model) or nonlinear. The Willshaw model, for instance, employs a clipped superposition (resulting in binary weights) that is nonlinear. The performance of neural associative memories is usually measured by a quantity called information capacity, that is, the information content that can be learned and retrieved, divided by the number of synapses required.
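
The two superposition schemes can be contrasted in a short sketch; it assumes Hebbian outer-product traces of sparse binary patterns and ignores details such as the treatment of the diagonal (the pattern size and sparsity are arbitrary illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)
    m, p = 64, 10                                       # 64 neurons, 10 memory patterns
    patterns = (rng.random((p, m)) < 0.1).astype(int)   # sparse binary memory patterns

    # Linear (additive) superposition of the Hebbian outer-product traces.
    W_linear = sum(np.outer(x, x) for x in patterns)

    # Clipped superposition as in the Willshaw model: weights stay binary; a synapse
    # is switched on once its pre- and postsynaptic neurons have been active together.
    W_willshaw = (W_linear > 0).astype(int)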

Associative Memories

A content-addressable memory is a type of memory that allows the recall of data based on the degree of similarity between the input pattern and the patterns stored in memory. It refers to a memory organization in which the memory is accessed by its content, as opposed to an explicit address as in a traditional computer memory system. Therefore, this type of memory allows the recall of information based on partial knowledge of its contents. Suppose we are given a memory holding the names of several people, as shown in the figure below.

[Figure: A content-addressable memory in action]

If the given memory is content-addressable, using the erroneous string "Crhistpher Columbos" as a key is sufficient to retrieve the correct name "Christopher Columbus." In this sense, this type of memory is robust and fault-tolerant, as it exhibits some form of error-correction capability.

An associative memory is a content-addressable structure that maps specific input representations to specific output representations. It is a system that "associates" two patterns (X, Y) such that when one is encountered, the other can be recalled. Typically, X ∈ {-1, +1}^m and Y ∈ {-1, +1}^n, where m and n are the lengths of the vectors X and Y, respectively. The components of the vectors can be thought of as pixels when the two patterns are considered as bitmap images. There are two classes of associative memory: autoassociative and heteroassociative. An autoassociative memory is used to retrieve a previously stored pattern that most closely resembles the current pattern, i.e., X = Y. In a heteroassociative memory, on the other hand, the retrieved pattern is, in general, different from the input pattern not only in content but possibly also in type and format, i.e., X ≠ Y. Artificial neural networks can be used as associative memories. One of the simplest artificial neural associative memories is the linear associator. The Hopfield model and bidirectional associative memory (BAM) models are some of the other popular artificial neural network models used as associative memories.

Linear Associator

The linear associator is one of the simplest and first studied associative memory models. Below is the network architecture of the linear associator.

[Figure: Linear associator network architecture]

The linear associator is a feedforward type network where the output is produced in a single feedforward computation. In the figure, all m input units are connected to all n output units via the connection weight matrix W = [wij]m x n, where wij denotes the synaptic strength of the unidirectional connection from the ith input unit to the jth output unit. It is the connection weight matrix that stores the p different associated pattern pairs {(Xk, Yk) | k = 1, 2, ..., p}, where Xk ∈ {-1, +1}^m and Yk ∈ {-1, +1}^n, in a distributed representation. Building an associative memory is nothing but constructing the connection weight matrix W such that when an input pattern is presented, the stored pattern associated with that input pattern is retrieved.

The process of constructing the connection weight matrix is called encoding. During encoding, the weight values of the correlation matrix Wk for a particular associated pattern pair (Xk, Yk) are computed as:

(wij)k = (xi)k (yj)k

where (xi)k represents the ith component of pattern Xk and (yj)k represents the jth component of pattern Yk, for i = 1, 2, ..., m and j = 1, 2, ..., n. Constructing the connection weight matrix W is then accomplished by summing up the individual correlation matrices:

W = α (W1 + W2 + ... + Wp)

where α is the proportionality or normalizing constant. α is usually set to 1/p to prevent the synaptic values from growing too large when a large number of associated pattern pairs have to be memorized. The connection weight matrix construction above simultaneously stores or remembers p different associated pattern pairs in a distributed manner.

After encoding or memorization, the network can be used for retrieval. The process of retrieving a stored pattern given an input pattern is called decoding.
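
A minimal sketch of the encoding step, assuming the p pattern pairs are stored as rows of NumPy arrays (the function name encode is hypothetical):

    import numpy as np

    def encode(X, Y, alpha=None):
        # X: (p, m) bipolar input patterns, Y: (p, n) bipolar output patterns.
        # Each correlation matrix Wk is the outer product of Xk and Yk;
        # W is their sum, scaled by the normalizing constant alpha (default 1/p).
        p, m = X.shape
        n = Y.shape[1]
        alpha = 1.0 / p if alpha is None else alpha
        W = np.zeros((m, n))
        for Xk, Yk in zip(X, Y):
            W += np.outer(Xk, Yk)          # (wij)k = (xi)k (yj)k
        return alpha * W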

Given a stimulus input pattern X, decoding or recollection is accomplished by computing the net input to the output units:

inputj = x1 w1j + x2 w2j + ... + xm wmj

where inputj stands for the weighted sum of the inputs, or activation value, of node j for j = 1, 2, ..., n. The output of those units is then determined using the bipolar output function:

Yj = +1 if inputj ≥ θj, and -1 otherwise

where θj is the threshold value of output neuron j. From the foregoing discussion, it can be seen that the output units behave like the linear threshold units of McCulloch and Pitts (1943) and the perceptrons of Rosenblatt (1958), which compute a weighted sum of the input and produce a -1 or +1 depending on whether the weighted sum is below or above a certain threshold value.

Traditional measures of associative memory performance are its memory capacity and content-addressability. Memory capacity refers to the maximum number of associated pattern pairs that can be stored and correctly retrieved, while content-addressability is the ability of the network to retrieve the correct stored pattern. Obviously, the two performance measures are related to each other. It is known that using Hebb's learning rule in building the connection weight matrix of an associative memory yields a significantly low memory capacity. Due to the limitation brought about by using Hebb's learning rule, several modifications and variants have been proposed to maximize the memory capacity; some examples can be found in (Hassoun, 1993; Hertz et al., 1991; Patterson, 1996; Ritter, 1992). Perfect retrieval is possible if the input patterns are mutually orthogonal, i.e., if Xa · Xb = 0 for all a ≠ b. If the stored input patterns are not mutually orthogonal, non-perfect retrieval can happen due to crosstalk among the patterns. However, accurate retrieval can still be realized even for non-orthogonal input patterns if the crosstalk term is small.

Typically, the input pattern may contain errors and noise, or may be an incomplete version of some previously encoded pattern. Nevertheless, when presented with such a corrupted input pattern, the network will retrieve the stored pattern that is closest to the actual input pattern. Therefore, the linear associator (and associative memories in general) is robust and fault tolerant, i.e., the presence of noise or errors results only in a mere decrease rather than a total degradation in the performance of the network. The robustness and fault tolerance of associative memories are byproducts of having a number of processing elements performing highly parallel and distributed computations.
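
A corresponding sketch of the decoding step, using a tiny made-up set of two orthogonal pattern pairs; the last line cues the network with a corrupted pattern to illustrate the fault tolerance discussed above:

    import numpy as np

    def decode(W, x, theta=0.0):
        # Net input to each output unit (inputj = sum_i xi wij),
        # followed by the bipolar output function with threshold theta.
        net = x @ W
        return np.where(net >= theta, 1, -1)

    # Two mutually orthogonal input patterns and their associated outputs (made up).
    X = np.array([[1,  1, -1, -1],
                  [1, -1,  1, -1]])
    Y = np.array([[ 1, -1],
                  [-1,  1]])
    W = (1.0 / len(X)) * sum(np.outer(a, b) for a, b in zip(X, Y))

    print(decode(W, X[0]))                       # exact cue retrieves Y[0]
    print(decode(W, np.array([1, 1, -1, 1])))    # cue with one flipped bit, still Y[0]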

The degree of correlation among the input patterns thus sets the limit on the memory capacity and content-addressability of an associative memory. Another limitation of associative memories is the presence of spurious memories, i.e., meaningless memories or associated pattern pairs other than the original fundamental memories stored. The presence of spurious memories degrades the content-addressability feature of an associative memory.

Associative memory can be autoassociative or heteroassociative. An autoassociative memory retrieves the same pattern Y given an input pattern X, i.e., Y = X, while a heteroassociative memory retrieves a stored pattern Y given an input pattern X such that Y ≠ X. The difference between autoassociative and heteroassociative memories thus lies in the retrieved pattern. An autoassociative memory stores the associated pattern pairs {(Xk, Xk) | k = 1, 2, ..., p}. A heteroassociative memory can be viewed as a generalization of an autoassociative memory whose connection weight matrix is equivalent to that obtained by storing the concatenated pattern pairs {(Xk | Yk, Xk | Yk) | k = 1, 2, ..., p}. Another characteristic of associative memories is that the storage of the associated pattern pairs {(Xk, Yk) | k = 1, 2, ..., p} also results in the storage of the complementary pattern pairs {(Xkc, Ykc) | k = 1, 2, ..., p}, where c stands for the complement of a vector, i.e., Xkc = -Xk.
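
A quick numerical check of this complement-storage property, with small made-up pattern pairs and a zero threshold (under these assumptions the complement of a stored cue retrieves the complement of its associate):

    import numpy as np

    X = np.array([[1,  1, -1, -1],
                  [1, -1,  1, -1]])
    Y = np.array([[1, -1,  1],
                  [1,  1, -1]])
    W = (1.0 / len(X)) * sum(np.outer(a, b) for a, b in zip(X, Y))

    def recall(x):
        return np.where(x @ W >= 0, 1, -1)

    print(recall(X[0]))     # -> [ 1 -1  1], i.e. Y[0]
    print(recall(-X[0]))    # -> [-1  1 -1], i.e. the complement -Y[0]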

Another implementation of associative memory that differs significantly from the models discussed so far is the recurrent type network, where the outputs of the units are fed back to the input units. Recurrent type networks produce their output through recursive computations until they become stable. The popular Hopfield model is of this type.

Hopfield Model

The Hopfield model was proposed by John Hopfield of the California Institute of Technology during the early 1980s. He showed how an ensemble of simple processing units can have fairly complex collective computational abilities and behavior. The publication of his work in 1982 significantly contributed to the renewed interest in research on artificial neural networks. The dynamics of the Hopfield model differ from those of the linear associator model in that it computes its output recursively in time until the system becomes stable. Below is a Hopfield model with six units, where each node is connected to every other node in the network.

[Figure: Hopfield model with six units]

Unlike the linear associator model, which consists of two layers of processing units, one serving as the input layer and the other as the output layer, the Hopfield model consists of a single layer of processing elements where each unit is connected to every other unit in the network other than itself. The connection weight matrix W of this type of network is square and symmetric, i.e., wij = wji for i, j = 1, 2, ..., m. The units in the Hopfield model act as both input and output units. Just like in the linear associator, a single associated pattern pair is stored by computing the weight matrix as Wk = Xk^T Yk, where Yk = Xk, and the individual matrices are superimposed to store p different patterns. Since the Hopfield model is an autoassociative memory model, patterns, rather than associated pattern pairs, are stored in memory.

After encoding, the network can be used for decoding. Decoding in the Hopfield model is achieved by a collective and recursive relaxation search for a stored pattern given an initial stimulus pattern. Given an input pattern X, decoding is accomplished by computing the net input to the units and determining the output of those units using the output function to produce the pattern X'. The pattern X' is then fed back to the units as an input pattern to produce the pattern X''. The pattern X'' is again fed back to the units to produce the pattern X'''. The process is repeated until the network stabilizes on a stored pattern, at which point further computations do not change the output of the units. Unlike in the linear associator, each unit has an extra external input Ii. This extra input leads to a modification in the computation of the net input to the units:

inputj = x1 w1j + x2 w2j + ... + xm wmj + Ij, for j = 1, 2, ..., m.
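
A minimal sketch of Hopfield encoding and recursive decoding, assuming bipolar patterns, zero self-connections, and the convention that a unit whose net input equals its threshold keeps its previous output (as in the discrete model described below); the function names are hypothetical:

    import numpy as np

    def hopfield_weights(patterns):
        # Autoassociative encoding: superimpose the outer products Xk^T Xk
        # and zero the diagonal so that no unit feeds back onto itself.
        W = sum(np.outer(x, x) for x in patterns) / len(patterns)
        np.fill_diagonal(W, 0.0)
        return W

    def hopfield_recall(W, x, I=None, theta=0.0, max_steps=50):
        # Recursive decoding: feed the output back until it no longer changes.
        # A unit whose net input equals the threshold keeps its previous output.
        x = x.copy()
        I = np.zeros(len(x)) if I is None else I
        for _ in range(max_steps):
            net = W @ x + I
            x_new = np.where(net > theta, 1, np.where(net < theta, -1, x))
            if np.array_equal(x_new, x):          # stable state reached
                return x_new
            x = x_new
        return x

    # Tiny usage example with two made-up bipolar patterns of six units.
    P = np.array([[ 1, -1,  1, -1,  1, -1],
                  [ 1,  1, -1, -1,  1,  1]])
    W = hopfield_weights(P)
    noisy = np.array([1, -1, 1, -1, 1, 1])        # first pattern with one bit flipped
    print(hopfield_recall(W, noisy))              # converges back to P[0]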

If the input pattern X is an incomplete pattern or if it contains some distortions, the stored pattern to which the network stabilizes is typically the one that is most similar to X without the distortions. This feature is called pattern completion and is very useful in many image processing applications.

During decoding, there are several schemes that can be used to update the output of the units: synchronous (or parallel, as it is termed in some of the literature), asynchronous (or sequential), or a combination of the two (hybrid). Using the synchronous updating scheme, the outputs of the units are updated as a group prior to feeding the output back to the network. Using the asynchronous updating scheme, the outputs of the units are updated in some order (e.g., random or sequential) and the output is fed back to the network after each unit update. Using the hybrid synchronous-asynchronous updating scheme, subgroups of units are updated synchronously while the units within each subgroup are updated asynchronously. The choice of the updating scheme has an effect on the convergence of the network.

Hopfield (1982) demonstrated that the maximum number of patterns that can be stored in a Hopfield model of m nodes before the error in the retrieved pattern becomes severe is around 0.15m. The memory capacity of the Hopfield model can be increased, as shown by Andrecut. Various widely available texts and technical papers, e.g., by Hassoun, Hertz et al., McEliece, and Volk, can be consulted for a mathematical discussion of the memory capacity of the Hopfield model. In spite of this seemingly limited memory capacity, several applications of the Hopfield model can be listed (Chaudhuri, Dai, deMenezes, Garcia, Hopfield, Poli, Ritter, Soper, Suh), and discussions of its application to combinatorial optimization problems can be found in (Siu, Tsai).

Discrete Hopfield Model

In the discrete Hopfield model, the units use a slightly modified bipolar output function in which the state of a unit remains the same if its current net input equals its threshold value:

xi(t+1) = +1 if inputi(t) > θi, xi(t) if inputi(t) = θi, and -1 if inputi(t) < θi, for i = 1, 2, ..., m

where t denotes the discrete time. An interesting property of recurrent type networks is that their state can be described by an energy function; the energy function is used to prove the stability of recurrent type networks. For the discrete Hopfield model with wii = 0 and wij = wji, using the asynchronous updating scheme, the energy function E according to Hopfield (1982) is defined as:

E = -1/2 Σi Σj wij xi xj - Σi Ii xi + Σi θi xi

where the local minima of the energy function correspond to the energies of the stored patterns.
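
A sketch of this energy function together with one asynchronous update sweep, using the notation above (weights W, external inputs I, thresholds θ); the helper names are hypothetical and the visiting order is left to the caller:

    import numpy as np

    def energy(W, x, I, theta):
        # E = -1/2 sum_ij wij xi xj - sum_i Ii xi + sum_i theta_i xi
        return -0.5 * x @ W @ x - I @ x + theta @ x

    def asynchronous_sweep(W, x, I, theta, order=None):
        # One asynchronous pass: units are visited one at a time and each new
        # output is used immediately when the next unit is updated.
        x = x.copy()
        order = range(len(x)) if order is None else order
        for i in order:
            net = W[i] @ x + I[i]
            if net > theta[i]:
                x[i] = 1
            elif net < theta[i]:
                x[i] = -1
            # if net == theta[i], the unit keeps its previous state
        return x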

The energy of the discrete Hopfield model is bounded from below (for all Xk, k = 1, 2, ..., p). Hopfield (1982) has shown that the energy of the discrete Hopfield model decreases or remains the same after each unit update. Since the energy is bounded from below, the network will eventually converge to a local minimum corresponding to a stored pattern. The stored pattern to which the network converges depends on the input pattern and the connection weight matrix.

Continuous Hopfield Model

The continuous Hopfield model is a generalization of the discrete case. In the continuous Hopfield model, the units use a continuous output function such as the sigmoid or hyperbolic tangent function. Here, each unit has an associated capacitance Ci and resistance ri that model the capacitance and resistance of a real neuron's cell membrane, respectively. The state equation of each unit is now:

Ci dui/dt = Σj wij xj - ui/ri + Ii

where ui is the internal state of unit i and xj = f(uj) is the output of unit j. Just like in the discrete case, there is an energy function characterizing the continuous Hopfield model. The energy function due to Hopfield (1984) is given by:

E = -1/2 Σi Σj wij xi xj + Σi (1/ri) ∫[0, xi] f^-1(x) dx - Σi Ii xi

where f is the output function of the units.
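
A forward-Euler sketch of these continuous dynamics, assuming a hyperbolic tangent output function and unit capacitance and resistance; all parameter values are illustrative:

    import numpy as np

    def simulate_continuous_hopfield(W, I, steps=2000, dt=0.01, C=1.0, r=1.0):
        # Forward-Euler integration of  C dui/dt = sum_j wij xj - ui/r + Ii
        # with the hyperbolic tangent as the continuous output function.
        u = np.zeros(W.shape[0])          # internal states
        for _ in range(steps):
            x = np.tanh(u)                # continuous unit outputs
            u = u + dt * (W @ x - u / r + I) / C
        return np.tanh(u)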

It can be shown that dE/dt ≤ 0 when wij = wji, i.e., the energy of the continuous Hopfield model decreases or remains the same over time. A minimum of the energy of the continuous Hopfield model also exists, by a computation analogous to that for the discrete Hopfield model. Therefore, the network will eventually converge to a local minimum that corresponds to a stored pattern.