
ADAM: AN ASSOCIATIVE NEURAL ARCHITECTURE FOR INVARIANT PATTERN CLASSIFICATION.

James Austin
Department of Computer Science, University of York, UK

From: First IEE International Conference on Artificial Neural Networks, IEE Conference Publication 313.

ABSTRACT

A new method for rotation and scale invariant pattern recognition is described. The technique is based upon a simple neural network model allowing massively parallel processing. The method forms an extension to the ADAM system, which is a neural network based associative memory. The basics of the ADAM associative memory are described along with the extension which provides an invariant capability.

Introduction

Many neural network methodologies have been developed, most of which can be seen as pattern recognition systems. One of the fundamental problems in nearly all neural networks of this type is their inability to inherently exhibit transformation independent recognition: that is, the ability to recognise a pattern invariantly over a number of transformations when it has only been taught into the network at one position. Typically these transformations are rotations, scalings and translations of a pattern, although other non-linear transformations may also be considered. To achieve this the networks are either:

1) Specifically designed to cope with set transformations.
2) Taught to be invariant.

In case 1, hardwiring of neural network connections may result in invariant ability. However, the ability to recognise patterns which have undergone a transformation that was not pre-determined is then not possible. To cope with a wide variety of transformations, a solution often chosen is to train the network on example patterns undergoing the particular transformation. The aim is to produce a network which, after teaching, will recognise any pattern undergoing this transformation when only trained on one view of the pattern. This approach has been taken by many workers, such as Bienenstock and also Wilson. Hinton, for example, has proposed a multi-layer network which used the back propagation adaption rule to learn shift invariance in a 12 bit 1D image. However, his particular approach, and others which are similar, suffer from a number of fundamental problems:

a) Time to learn invariance. Allowing a network to adapt using a back propagation type learning algorithm is typically very slow (15000 iterations of the training set for Hinton's example).
b) Indeterminate success. Only limited success is achieved, even after large amounts of training.
c) Lack of predictive ability. Typically these networks exhibit indeterminate behaviour; it is not possible to predict whether and how long learning will take, or how many neural units are needed.

The features above are typical of systems which use learning methods based on gradient descent. These issues are fundamental obstacles to producing a practical invariant system capable of learning a specific invariance. The method described here approaches the problem from both of the directions (1) and (2) outlined above, overcoming all the limitations in the process. The aim of the work was to produce a method that could be set up to cope with a transformation in a 2D plane and be implemented in dedicated hardware to operate at high speed.

Structure of the Method

To achieve high speed and simple implementation, the approach uses a simple adaption rule and only binary weights. To function effectively it also incorporates a novel coding method, N point coding, and an optimized front end processor. The N point coding method was first exploited in the original ADAM system, where it was used to optimize storage. The overall structure of the present implementation is given in Fig 1.

Fig 1

The system operates by inputting a grey scale two-dimensional image to be recognised. From this it produces on its output: a class description which identifies the shape in the image; an indication of what transformation the object in the image has gone through to be mapped optimally onto the stored prototype; and a confidence figure indicating how well the input object maps onto the stored prototype. Fig 1 shows the three main processing stages: input pre-processing; a transformational sub-system; and finally the original ADAM system. All units have been specifically designed to interface and operate with each other in an optimal manner. A companion report has already described the theoretical basis of the method in detail, including the derivation of formulae describing the behaviour of the transformational sub-system. This report concentrates on different aspects of the system, particularly how the transformational sub-system interacts with the ADAM memory, and how the system may be visualized as a massively parallel neural network implementation. The paper first describes the ADAM memory system, and then shows how the transformational sub-system, in conjunction with the front end sampling, has enabled transformational invariance.

The ADAM system

The Advanced Distributed Associative Memory (ADAM) was designed to recall complete descriptions of patterns from noisy and cluttered examples. Furthermore, it was aimed at a simple high speed implementation, possible in parallel hardware. A restriction of the system has been its inability to recognise patterns invariantly. A diagrammatic description of the network that ADAM is based upon is given in Fig 2. In simple terms it is a correlation matrix memory. However, it is well known that these memories suffer from low storage capacity. To overcome this limitation ADAM incorporates: 1) a front end processor, which aims to partially orthogonalise the input patterns, and 2) the N point coding method, for efficient storage and effective recall. At first sight the network appears to be a standard 3 layer network with two layers of weights and one layer of hidden units. The major difference in ADAM is that the middle layer is not totally hidden from the teacher. These middle layer units are under the direct influence of the training process, such that a class pattern is forced onto the units during teaching. All other adaption rules for multi-layer networks (MLNs) do not allow direct access to the middle layer units of a network.

Fig 2 A neural implementation of the ADAM memory

ADAM, by dropping this restriction, has achieved very fast learning rates using a simple learning rule. To shorten the description, initially consider just the first two layers of units, i.e. the input layer and the middle class pattern layer.

Fig 3 A schematic of the first stage of the ADAM system

Fig 3 shows the first two layers in more detail, with the weights clearly shown. The first stage of the process is to take the input image and transform it using some non-linear function. In ADAM, N tuple coding is used. This type of coding takes a tuple of elements from the input pattern and assigns a state to the tuple depending on the values of the tuple elements. The simplest state assignment function is the binary N tuple function, shown in Table 1, which assigns one of 2^N states to a binary tuple with N elements. Grey scale N tupling may also be used on continuous pixel data. The sampling of the input pattern is typically random and non-repeating. The output of each decoder feeds the inputs of a correlation matrix.

Tuple elements   State
0 0 0            0
0 0 1            1
0 1 0            2
0 1 1            3
1 0 0            4
1 0 1            5
1 1 0            6
1 1 1            7

Table 1 Binary N tuple function for N = 3
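To make the state assignment concrete, the following is a minimal sketch of binary N tuple coding as described above. The random, non-repeating sampling and the 1-of-2^N decoder output are modelled directly; the function names and data layout are illustrative assumptions rather than the original implementation.

```python
import random

def make_tuple_map(num_pixels, n, seed=0):
    """Randomly group pixel indices into non-repeating n-tuples (leftover pixels ignored)."""
    rng = random.Random(seed)
    idx = list(range(num_pixels))
    rng.shuffle(idx)
    return [idx[i:i + n] for i in range(0, num_pixels - num_pixels % n, n)]

def n_tuple_code(image, tuple_map, n):
    """Binary N tuple coding: each tuple is decoded to a 1-of-2**n state vector."""
    out = []
    for tup in tuple_map:
        state = 0
        for pixel in tup:                       # most significant bit first, as in Table 1
            state = (state << 1) | (1 if image[pixel] else 0)
        decoder = [0] * (2 ** n)
        decoder[state] = 1                      # one state line set per tuple
        out.extend(decoder)
    return out

# Toy usage: a 9-pixel (3 x 3) binary image sampled with 3-tuples (N = 3)
image = [0, 1, 0,
         1, 1, 0,
         0, 0, 0]
tmap = make_tuple_map(len(image), 3)
coded = n_tuple_code(image, tmap, 3)   # 3 tuples x 8 state lines = 24 bits
```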

The N tuple front end processing has a similar effect to the use of higher order terms in functional link networks. As Austin and Pao have shown, the addition of higher order terms at the front end of an MLN results in greatly reduced learning times and allows simpler learning rules. In Pao's case, the perceptron learning procedure was seen to be effective in reducing learning times when used in combination with higher order input terms (compared to a pure perceptron system). In the present case the higher order terms have been implemented using N point sampling and N tuple coding, and a very simple learning rule is used. However, as Austin has shown, the use of N tuple sampling with MLNs incorporating back propagation can also substantially increase learning times. The use of this simple input pre-processing and the following learning rule greatly simplifies the system's implementation in software as well as hardware.

The learning function used in the correlation matrix in Fig 3 is as follows. If I is the input pattern after N point processing, C is the class pattern placed on the output units of the correlation matrix and X_ji is one link in the correlation matrix, the adaption rule is:

if I_i = C_j = 1 then X_ji(t) = 1, else X_ji(t) = X_ji(t-1), for all i and j.

For recognition the operation is:

PCR_j = Σ_{all i} X_ji · I_i, for all j,

where PCR = Pattern Class Response. The selection of C during teaching is designed to allow simple recovery of C during recognition from the PCR. Other correlation matrix methods have always failed because of the difficulty of thresholding the PCR output to recover the original taught pattern. This is achieved in ADAM by using class patterns which are binary and made up of a set number (N) of bits set to 1. For each new pattern taught, a new class pattern is generated with N points set to one. By using this type of class pattern, recovery of the class pattern from the PCR during recognition is simple: all that is required is to select the N highest elements of the PCR and set them to 1, with all other elements set to zero. The result of this N point thresholding process is the class pattern that was used when teaching the original pattern.

It is interesting to consider the final stage of the network shown in Fig 2 at this point. In the ADAM memory this final stage allows complete recall of the non-corrupted pattern from the class pattern recovered as explained above. This stage of the memory is a simple correlation matrix which has been taught, in the same way as the first stage, to recall the complete pattern on application of the N point thresholded PCR. N point thresholding is not used on the output of this network because the number of elements set to one in the stored image is variable. In this case, because we can assume that the class pattern has been correctly recovered, the raw output can be thresholded in the normal way at N, where N is the number of bits set to one in the class. This illustrates how the two stages of correlation matrix are tightly coupled in the ADAM system.

It is important to realise that the class pattern is a code that is used to identify the input pattern. If the class pattern has k elements and N bits set to 1, the number of codes that can be represented is

k! / (N! (k - N)!)
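As a concrete illustration, the following is a minimal sketch of the first-stage teach and recall operations and the N point thresholding just described, written for binary vectors. The function names and the dense list-of-lists weight matrix are illustrative assumptions, not the original implementation.

```python
def teach(weights, input_vec, class_vec):
    """Binary correlation matrix learning: set X_ji wherever I_i = C_j = 1."""
    for j, c in enumerate(class_vec):
        if c == 1:
            for i, x in enumerate(input_vec):
                if x == 1:
                    weights[j][i] = 1
    return weights

def recall(weights, input_vec):
    """Pattern Class Response: PCR_j = sum over i of X_ji * I_i."""
    return [sum(w * x for w, x in zip(row, input_vec)) for row in weights]

def n_point_threshold(vec, n):
    """Set the n highest elements to 1 and all other elements to 0."""
    order = sorted(range(len(vec)), key=lambda i: vec[i], reverse=True)
    out = [0] * len(vec)
    for i in order[:n]:
        out[i] = 1
    return out

# Toy usage: an 8-bit input associated with a class pattern of k = 6 bits, N = 2 set
k, N = 6, 2
weights = [[0] * 8 for _ in range(k)]
taught_input = [1, 0, 1, 1, 0, 0, 1, 0]
class_code   = [1, 0, 0, 0, 1, 0]             # an N point class pattern
teach(weights, taught_input, class_code)

pcr = recall(weights, taught_input)            # respond to a (possibly noisy) input
recovered_class = n_point_threshold(pcr, N)    # should equal class_code
```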

Thus, if the storage matrix which associates the input pattern with the class is p x k in size (where p is the number of elements set to one in the input pattern), an absolute maximum of k!/(N!(k - N)!) patterns can be coded for storage. The storage properties of ADAM are described in detail in Austin, where it was shown that the exact relationship between the number of patterns that can be stored before error and the size of the memory is given by

T = ln(1 - (1/H)^(1/I)) / ln(1 - (N·I)/(H·R))

where
T = number of patterns that can be associated before a hamming error of 1 occurs in recall,
R = input pattern size,
H = class pattern size,
I = number of bits set to one in each class pattern,
N = number of bits set to one in each input pattern.

The way in which the N point class and the N point thresholding effectively allow storage of many patterns has been exploited in the new transformational sub-system. This sub-system pre-processes the input image, allowing ADAM to recognise objects that are not in the same position, nor of the same size, as those taught.

The Transformational Sub-system

As explained above, the details of this process are given in Austin. The following describes the basics of the method, how it relates to neural networks and how it interacts closely with ADAM. This description covers the method used for rotation and scale independent recognition, but it can also be used to cope with any transformation. The approach of the method is basically very simple. Consider Fig 4, which illustrates conceptually how the method works. Any input pattern to be recognized is rotated and scaled incrementally to produce a set of transformed patterns (TPs). Each TP is then checked, using a matcher, to see how well it matches any of the stored patterns. The best match is indicated at the output of the matcher. As given here, this approach is very slow, and uses a large amount of memory to store all the TPs. As mentioned above, the present approach optimises this method. It should be clear that the present method uses ADAM as the matcher shown in Fig 4, with the templates stored in ADAM's correlation matrix.

Fig 4 The basic approach of the method used

To use this approach to achieve efficient invariant recognition, two problems had to be overcome. First, the input image needed to be rapidly transformed (by a given rotation and scaling) and, secondly, fast matching of the (large number of) TPs with the templates stored in ADAM was required.

First consider how the transformational sub-system tackles the second problem. If the number of TPs is reduced then ADAM needs to do less matching, thus speeding up the process. This is achieved by coding the TPs into a smaller set of images. To do this the N point coding method is used in the same way as in the ADAM memory. After coding, the reduced set of images is passed through ADAM for recognition. It is important to note that no decoding takes place after the reduction; recognition is performed only on the encoded patterns. To understand the process it is simplest to consider it conceptually; a neural implementation will then show how the method can be realised in a massively parallel way.

Considering Fig 4 again, each TP is first given an identification label, called a transformation class (tc), which indicates the scaling and rotation that the input pattern has gone through to produce that particular TP. The tc is an N point pattern: it is made up of a set number of bits set to one (N) in a field of r bits (N being constant for all tc patterns). Each tc must be different, but the N bits are randomly distributed over the r elements of the tc. In this discussion the assumption is made that the TPs are all binary patterns. Once the assignment of tcs is complete, each of the TPs is selected in turn and the elements at one in the TP are replaced by the tc pattern, while the elements at 0 are replaced by a field of r zeros. This process is illustrated in Fig 5. The original input image is shown (t1), along with 8 rotations of that image around the central pixel. As an example, the image t1 is shown with the N point codes substituted as just described.

Transformation codes:

Pattern    Transformation code elements
           a  b  c  d  e  f
t1         1  0  0  0  0  1
t2         1  0  0  0  1  0
t3         0  1  0  0  1  0
t4         0  0  1  1  0  0
t5         1  0  0  1  0  0
t6         0  1  0  0  0  1
t7         1  0  1  0  0  0
t8         0  0  0  1  0  1

Pattern t1 when coded as described in the text:

000000 100001 000000
100001 100001 000000
000000 000000 000000

Fig 5 A set of 8 rotated patterns, the set of transformation codes assigned to the patterns (r = 6, N = 2) and one encoded pattern (t1 using code 1).

Once all the patterns are coded in this way, they are combined into one matrix by logically ORing all the coded patterns together; this is shown in Fig 6 for the patterns in Fig 5. As will be apparent, this matrix contains p x r bits (where p is the number of elements in the original pattern and r is the number of bits in a code). This compares to F x p bits for the original set of rotated patterns (where F is the number of rotations).

100111 101001 010101
110011 111111 101100
101110 110110 011101

Fig 6 A coded version of the patterns shown in Fig 5.

a: 111 111 010
b: 010 101 101
c: 001 110 011
d: 101 011 111
e: 111 011 000
f: 110 110 101

Fig 7 The set of coded patterns derived from the coded version in Fig 6.

Thus, while F > r there is a saving of space in representing the coded patterns. This also results in reduced matching time when the ADAM stage is used, as will be shown later. Before ADAM is used, the coded pattern in Fig 6 must be separated into r separate patterns (i.e. 6). How this is done is most easily understood if the coded matrix in Fig 6 is considered as 9 r-bit numbers. To separate them, the first pattern is created from the most significant bits of the 9 numbers, as shown in Fig 7a. The next pattern is created by selecting the next most significant bit of each number, as shown in Fig 7b, and so on. The result is r patterns (here 6), each of p elements, as shown in Fig 7.
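The following is a minimal sketch of the encode, OR and bit-plane separation steps described above for binary patterns. The helper names, the data layout (a list of r-bit codes per pixel) and the toy patterns are illustrative assumptions rather than the original implementation.

```python
def encode_tp(tp, tc, r):
    """Replace each 1 in a transformed pattern with its r-bit transformation code,
    and each 0 with a field of r zeros."""
    return [list(tc) if pixel else [0] * r for pixel in tp]

def or_combine(coded_tps, r):
    """Logically OR all coded patterns into a single p x r matrix."""
    p = len(coded_tps[0])
    combined = [[0] * r for _ in range(p)]
    for coded in coded_tps:
        for cell in range(p):
            for bit in range(r):
                combined[cell][bit] |= coded[cell][bit]
    return combined

def bit_planes(combined, r):
    """Separate the combined matrix into r binary patterns, one per code element."""
    return [[cell[bit] for cell in combined] for bit in range(r)]

# Toy usage with r = 6, N = 2 codes, in the spirit of Fig 5 (flat 3 x 3 patterns)
r = 6
tcs = {                                   # transformation codes: two bits set in six
    "t1": [1, 0, 0, 0, 0, 1],
    "t2": [1, 0, 0, 0, 1, 0],
}
tps = {                                   # toy binary patterns standing in for two TPs
    "t1": [0, 1, 0, 1, 1, 0, 0, 0, 0],
    "t2": [1, 0, 0, 0, 1, 1, 0, 0, 0],
}
coded = [encode_tp(tps[k], tcs[k], r) for k in tps]
combined = or_combine(coded, r)
planes = bit_planes(combined, r)          # r patterns of p elements, ready for ADAM
```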

Now there are only 6 patterns to pass to ADAM, instead of the 8 shown in Fig 5. The next section shows how ADAM is used to recognise the pattern using the patterns in Fig 7.

Teaching and Testing ADAM

It was described above how ADAM, presented with an unknown pattern, produces at its output a pattern class response vector (PCR). To allow recognition of an unknown pattern, ADAM must first be trained on an example pattern under a particular pattern class. For this, ADAM is set up as shown in Fig 3 and presented with an example pattern as shown in Fig 8.

Fig 8 ADAM taught on an example pattern, under a particular class.

It is important to note that Fig 8 does not show the N tuple processing front end, which is omitted for clarity and will be discussed later. Once the pattern has been taught under the selected class (100010 in the above example), testing on an unknown pattern can be performed. To achieve this the coded input pattern, like that described in Fig 7, is presented to ADAM. Each of the six patterns is presented, resulting in six PCR vectors after recall. Fig 9 shows the result of applying the patterns in Fig 7 to the ADAM memory taught as shown in Fig 8.

Coded pattern    a  b  c  d  e  f

PCR result       3  2  2  1  2  3
                 2  0  1  2  2  1
                 0  0  0  0  0  0
                 2  0  1  2  2  1
                 3  2  2  1  2  3
                 0  0  0  0  0  0

(each column is the six-element PCR for the corresponding coded pattern)

Fig 9 Result PCRs when ADAM is tested on the patterns shown in Fig 7.

Once the PCRs have been recovered, the final stage is to recover the transformation code and the pattern code from these results. This can be achieved using the PCR values because of the N point coding used. Fig 10 shows the recovery of both codes. It shows one way of performing the recovery operation; other more efficient methods are still under investigation. The method is performed in two stages. In stage 1, each separate PCR is N point thresholded as described above, with N = 2 in this case. This results in a set of pattern codes (PCs). Next the confidence of these codes is calculated, by taking the scalar product of each PC vector with its PCR vector (shown in Fig 9). The result is a set of confidence values which are then themselves N point thresholded (at 2 in this case) to recover the transformation class of the best matching pattern. Once this is done, the class codes at the positions where the transformation code is set to one are retained. This is shown in stage 2 of Fig 10. The retained pattern class codes should all be the same and identify the pattern that the class belongs to. In this example the results are correct; however, in such small examples this is not usually the case. Analyses in Austin and Austin illustrate how both ADAM and the transformational front end need to be large to produce a reliable solution. The final section illustrates the neural implementation of the above and shows how an orthogonalising front end may be added to provide the same effect on the input pattern as N tuple processing.

Stage 1:

Coded pattern                   a       b       c       d       e       f
Pattern class code              100010  100010  100010  010100  110110  100010
Confidence                      6       4       4       4       *       6
Recovered transformation code   1       0       0       0       0       1

* => impossible code (N > 2) recovered.

Stage 2:

Best class code                 100010  000000  000000  000000  000000  100010
Transformation code             1       0       0       0       0       1

100010 = Class code

Fig 10 Applying the N point threshold process to the PCR results given in Fig 9. This results in the transformation code and the class code of the pattern originally taught.
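A minimal sketch of this two-stage recovery, operating on the set of PCR vectors, is given below. The function and variable names are illustrative assumptions, and the threshold helper simply repeats the N point operation described earlier.

```python
def n_point_threshold(vec, n):
    """Set the n highest elements to 1, the rest to 0."""
    order = sorted(range(len(vec)), key=lambda i: vec[i], reverse=True)
    out = [0] * len(vec)
    for i in order[:n]:
        out[i] = 1
    return out

def recover(pcrs, n):
    """Two-stage recovery of the transformation code and class code from a
    list of PCR vectors (one per coded pattern)."""
    # Stage 1: threshold each PCR to a pattern code, then score it by the
    # scalar product of the pattern code with its PCR.
    pcs = [n_point_threshold(pcr, n) for pcr in pcrs]
    confidences = [sum(p * r for p, r in zip(pc, pcr)) for pc, pcr in zip(pcs, pcrs)]
    # Threshold the confidences to recover the transformation code.
    transformation_code = n_point_threshold(confidences, n)
    # Stage 2: retain the class codes where the transformation code is set to one.
    retained = [pc for pc, t in zip(pcs, transformation_code) if t == 1]
    return transformation_code, retained

# The PCR columns of Fig 9 (one list per coded pattern a..f)
pcrs = [
    [3, 2, 0, 2, 3, 0],   # a
    [2, 0, 0, 0, 2, 0],   # b
    [2, 1, 0, 1, 2, 0],   # c
    [1, 2, 0, 2, 1, 0],   # d
    [2, 2, 0, 2, 2, 0],   # e  (ties, the * case in Fig 10, would need an explicit check)
    [3, 1, 0, 1, 3, 0],   # f
]
tcode, classes = recover(pcrs, 2)
# tcode -> [1, 0, 0, 0, 0, 1]; both retained class codes -> [1, 0, 0, 0, 1, 0] (100010)
```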

Neural Implementation and Pre-processing

The neural implementation of the method consists of setting out the processing stages as a number of interconnecting layers of neural units. The basic construction of ADAM was shown earlier. Fig 11 shows how this concept is extended to implement the ADAM memory, including the invariant front end, as a massively parallel processor. The implementation of these types of system in dedicated hardware has been considered previously and has been shown to be simple and feasible with present IC technology.

Fig 11 Neural implementation of the three stages of processing.

The implementation in neural systems is best viewed in three stages. Stage one consists of a transformation matrix which takes an input pattern and modifies it by a set of transformations stored in encoded form. This network outputs the coded pattern shown in Fig 6. The second stage is the ADAM memory, extended to take the coded pattern and to output the raw pattern code and the raw transformation code. The final stage performs the N point thresholding. This can be implemented entirely in neural hardware using a modified winner-take-all network; however, it is a simple matter to perform this operation with conventional processors at speed. The production of the coded pattern by the first stage of the network is achieved by presenting a pattern at the input to the network. Transformation codes stored at each intersection (binary weight) are passed to the output units. In the examples shown here the vertical wires perform a wired-OR operation (the logical OR of the weights ANDed with the inputs).
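As an illustration of the first-stage operation just described, the sketch below treats the stored transformation codes as a binary weight matrix and computes each output line as the wired-OR of its weights ANDed with the input. The matrix layout and names are illustrative assumptions, not the published hardware design.

```python
def first_stage(input_pattern, weight_matrix):
    """Wired-OR first stage: output_j = OR over i of (weight[j][i] AND input[i]).

    weight_matrix has one row per output (vertical) line; the binary weights at
    each intersection hold the stored, encoded transformation codes."""
    return [
        1 if any(w and x for w, x in zip(row, input_pattern)) else 0
        for row in weight_matrix
    ]

# Toy usage: 4 input lines feeding 6 output lines that carry an r = 6 code field
weights = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
]
coded_output = first_stage([1, 0, 1, 0], weights)   # -> [1, 0, 0, 1, 1, 0]
```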

Setting up this network is at present performed manually. However, trainable mappings are possible which would allow the network to learn arbitrary transforms; this will be discussed in a later report. The size of the first stage network can be quite large for large input mappings. However, the use of a polar co-ordinate sampling window has allowed this to be reduced greatly for transformations involving rotation and scaling; the front end processor is described briefly below. The second stage of the network shown in Fig 11 acts as the ADAM memory. The coded pattern appears on the vertical wires from the first stage. As illustrated in the functional discussion, this input is most easily seen as a number of separate input patterns (each made up of one line from each group of 3 vertical inputs; in this example r, the transformation code length, is 3). This correlation matrix functions in a similar way to the first stage, except during recall, where the horizontal wires and neurons act as summing units, adding the responses of the vertical wire excitations through links set to 1 or 0. The output of this stage is the set of pattern class responses (PCRs) which contain the transformation and class codes. The final stage performs the N point thresholding of the PCRs. N point thresholding is applied to the output of the transformation units, and the result is then used to constrain the network to achieve recall of the pattern class. The exact details of a neuron implementation are not given, as this operation is more efficiently performed with conventional sequential computation; the structure shown indicates the major functions that would be carried out by neural processors. The final section briefly describes the polar co-ordinate front end, implemented to reduce the size of the transformation sub-system network and to provide a partially orthogonalised input image to the system.

Front End Processor

The N tuple front end processor shown in the description of the ADAM memory above acts to expand the input image, essentially to allow linear separation of non-linearly separable input patterns. Making the pattern linearly separable greatly simplifies the neural network needed to classify the pattern. The description of the transformational sub-system given above did not consider any type of input processing. However, some type of pre-processor (N tuple or otherwise) is essential for the system to work. In the present system a highly sophisticated input pre-processor is being investigated. This is needed to fulfil the need for orthogonalisation and to reduce processing overheads. The pre-processor is essentially a foveal processing system, which incorporates a polar sampling scheme with linearly increasing sample size from the origin of the sampling window. An example of the sampling structure is shown in Fig 12.
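A minimal sketch of one way to generate such a polar sampling layout is given below. The exact geometry (number of rings and spokes, and the linear growth rate of the sample size) is an illustrative assumption, not the published design.

```python
import math

def polar_samples(cx, cy, rings=6, spokes=16, spacing=4.0, growth=0.5):
    """Generate (x, y, radius) sampling circles on a polar grid, with the
    sample radius increasing linearly with distance from the window origin."""
    samples = []
    for ring in range(1, rings + 1):
        dist = ring * spacing                 # linear ring spacing from the origin
        radius = 1.0 + growth * dist          # linearly increasing sample size
        for s in range(spokes):
            angle = 2.0 * math.pi * s / spokes
            samples.append((cx + dist * math.cos(angle),
                            cy + dist * math.sin(angle),
                            radius))
    return samples

# Toy usage: a sampling window centred on a 128 x 128 image
window = polar_samples(64.0, 64.0)
```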

Fig 12 Polar sampling used in the input pre-processor

Each circle in Fig 12 is the sampling area of a small grey scale N tuple recognition processor. The use of a trainable operator at this level of the system allows complete flexibility in the selection of features to be passed on to the next stage of processing. At present it is trained as a simple edge operator, set up to recognise one of 16 orientations of edge. The result of this processing is passed to the transformational pre-processor.

Example

Currently the system described above is implemented in C on a SUN 3/160 with an associated frame store. Dedicated implementation of some of the system's functions is currently under investigation. As an example of the method's operation, Fig 13 shows a simple shape which is to be recognised after rotation. Fig 14 shows the results of various levels of processing, resulting in a correct transformation class. The first table illustrates the successful recovery of the pattern rotation for the pattern tested at the given rotation. The average class confidence is the average value of the transformation class elements that have been set to one after N point thresholding. The values of rho and theta are obtained by comparing the recovered transformation code with the ones used in teaching. An example output of the PCR is given, together with the result after thresholding.

Fig 13 Simple example shape to be recognised after rotation

Rotation (degrees)          0    22.5   45    67.5   90
Average class confidence    98   59     65    59     46
Recovered transformation:
  theta                     0    -14    -12   -10    -8
  rho                       0    0      0     0      0

Table 2 Result of recognising the object in Fig 13 at various rotations.


1 26 2 0 1 0 6 1 5 11 18 8 12 4 9 4 4 7 6 9 0 0 3 13 0 0 2 20 2 8 14 1 3 4 2 25 4 11 26 11 11 60
12 8 0 8 0 27 24 12 4 5 8 25 58 6 11 9 4 0 11 1 0 2 1 0 13 0 2 4 12 35 3 15 12 8 4 12 7 0 3 0
23 9 23 7 6 17 21 0 9 11 9 16 25 0 17 21 0 10 0

PCR result after testing on the pattern in Fig 13 rotated by 22.5 degrees

0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0
0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0
0 1 0 0 0  0 0 0 0  0 0 0 0  0 0 0 0  0 0 0 1

Transformation code after thresholding the PCR at N = 2.

Fig 14 Recognition of the simple shape in Fig 13.
