ASYNCHRONOUS EVENT REDIRECTING IN BIO-INSPIRED COMMUNICATION

Ph. Häfliger
Institute of Informatics, University of Oslo, Norway
e-mail: hafliger@ifi.uio.no
ABSTRACT

The paper presents the FPGA implementation of a programmable asynchronous digital circuit (henceforth called AE-map) that remaps ‘address events’. Address event representation (AER) is an event driven communication protocol originally used in VLSI implementations of neural networks to transfer action potentials (neural voltage pulses) between neurons. More generally, it is suited to transmitting a number of analog values, coded in the frequency of events, over an asynchronous digital bus. The AE-map redirects such events between an AE sender and an AE receiver, which makes it possible, for instance, to program the connection scheme of a neural network. Earlier approaches for redirecting AEs have used synchronous digital devices such as DSPs or microcontrollers. The simpler and more dedicated asynchronous solution presented here is more energy efficient, does not impose a discretization on the time axis and achieves a much higher throughput. In the present implementation AEs (9 bit input, 7 bit output) can be processed at intervals of less than 84 ns per output AE.

1. INTRODUCTION

1.1. Address Event Representation

Address event representation (AER) is an event driven communication protocol that was originally put forward within the field of ‘neuromorphic aVLSI’ [6, 5, 9]. Neuromorphic engineering tries to incorporate operating principles of the nervous system into technical devices [8]. AER was first used to approach the massive connectivity of biological neural networks, but in general it is suited to conveying a large number of analog values (e.g. sensory data) through a low capacity channel (an asynchronous digital bus). It works as follows: AER is used to transmit ‘events’. Events are characterized by a location (address) and a time. For example, in a network of neurons the address identifies one particular neuron and the time would be the time at which the neuron fires an action potential (AP = nerve pulse).
For the transmission of a number of analog values (e.g. pixels in a camera) one would code the intensity in the frequency of such events (rate coding). (This transformation of an intensity (e.g. a photodiode current) into an event rate can be achieved quite easily by placing a simple integrate-and-fire neuron circuit (6 transistors, 2 small capacitors) into the pixel.) An asynchronous digital bus is used for the actual transmission. The event’s location is encoded digitally as an ‘address’ which is placed on the bus at the time of the event. On the receiver end of the bus this address is again decoded into a receiving location. For neural networks that location would be a particular synapse (input site) of a particular neuron on the receiver chip, and for rate coded analog values it could be some integrator that reconstructs the analog value (e.g. a pixel on a screen). Alternatively, the addresses can be used directly by a digital device without the effort of an AD conversion.

This event driven strategy is more energy efficient than scanning (as for example in video connections) if the data is sparse, i.e. if only a few sender locations tend to be very active at a time. An example of such data would be the output of a silicon retina [6]. This is an ‘intelligent camera’ inspired by the biological retina (the photosensitive tissue at the back of the eye). It performs some processing on an image already in the recording pixel. One variant of a silicon retina, for example, is only sensitive to changes. Since natural scenes tend to be rather static, with fast changes happening only around the edges of moving objects, a scanning strategy wastes a lot of energy on reading pixels where nothing is happening. In the worst case, detection of changes might be delayed for the time it takes to scan through the whole image, or changes might even be missed if they are synchronous with the frame rate. An AER strategy, by contrast, immediately reports a change in a pixel.
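The rate coding described in this section can be illustrated with a small software model. It is only a sketch under stated assumptions: the function names, the integer charge units and the discrete simulation tick are all hypothetical conveniences introduced here (the real bus is unclocked and the real pixel is an analog integrate-and-fire circuit). It shows intensities being turned into address events and the receiver recovering them from event counts.

```python
from collections import Counter

def encode_rate(intensities, threshold, n_ticks):
    """Turn per-address intensities into a stream of (tick, address) events.

    Each address integrates its intensity (charge units per tick) and emits
    one event, then subtracts the threshold, whenever the integral reaches
    the threshold: a software stand-in for an integrate-and-fire pixel.
    The threshold must be a positive integer.
    """
    charge = [0] * len(intensities)
    events = []
    for tick in range(1, n_ticks + 1):  # the tick is a simulation artefact
        for address, intensity in enumerate(intensities):
            charge[address] += intensity
            while charge[address] >= threshold:
                charge[address] -= threshold
                events.append((tick, address))  # only the address goes on the bus
    return events

def decode_rate(events, n_addresses, threshold, n_ticks):
    """Receiver side: reconstruct each intensity from its event count."""
    counts = Counter(address for _, address in events)
    return [counts[a] * threshold / n_ticks for a in range(n_addresses)]

# Three hypothetical pixels: one dark, one dim, one bright.
events = encode_rate([0, 1, 4], threshold=10, n_ticks=100)
print(len(events))
print(decode_rate(events, 3, threshold=10, n_ticks=100))
```

In this model the dark pixel generates no bus traffic at all, which is the sparse-data advantage over scanning, and the time of the first event per address marks the onset of activity.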
The drawback is a risk of over-running the bus and the need for collision handling. Other publications deal with these issues [5, 9, 1, 2, 4] (and the AER map implementation presented in this paper assumes that collisions are resolved before AEs are placed on the bus). In general it can be said that for transmitting analog data there is a trade-off between temporal resolution, intensity resolution, size of address space, and expected occupation of that address space. To come back to the example of the change sensitive retina: given the timing of the AEs, not only can the rate of change be reconstructed (rate coding), but the onset of the change (the first event in a burst of activity) is also quite evident and undisturbed by a frame rate. The order of those onsets in neighbouring pixels indicates a direction of motion, and by measuring the intervals between the onsets (temporal coding) the speed of that motion becomes evident. The asynchronous unclocked implementation of the AE bus avoids introducing a discretization error on this temporal code.

1.2. Asynchronous Devices

In asynchronous designs, as opposed to synchronous ones, each component works at its own pace. In sequential processes each component has to know when the data it is supposed to process is ready. It has to obtain that information from the component that provides the data. Pipelining, which considerably sped up synchronous processors two decades or so ago, is a natural result of this approach. Still, an asynchronous design can get an edge on even optimally pipelined synchronous solutions for two reasons. Firstly, the slowest component (that dictates the clock rate in the synchronous approach) might not always be part of every operation, and secondly, in synchronous designs the next clock cycle starts after all local operations are completed, whereas in asynchronous designs ideally the next step in a sequential operation starts immediately when the previous operation is completed. The ‘ideally’ refers to the fact that it is not always easily possible to compute locally whether a component has finished its operation, and so as a workaround the unit can simply indicate that it is finished after a fixed delay. However, in a pipelined operation the slowest component limits the overall speed of a sequence of operations, in which case there is no real advantage gained by the second argument.

Concerning energy efficiency, asynchronous circuits have the advantage that they do not actively consume current when they are idle and that they do not need a clock, which consumes a considerable percentage of the total power in fast, highly integrated circuits. And as previously mentioned, if the AER map is to be used in a system that relies on temporal codes, implementing it asynchronously avoids introducing a discretization error in the time domain.

These arguments have convinced researchers to even start developing asynchronous microprocessors [7], and an increasing number of commercial asynchronous devices are nowadays available. In the following, a much simpler asynchronous device is presented that is more dedicated to that particular task.

2. ARCHITECTURE OF AN ASYNCHRONOUS ADDRESS EVENT MAP

In a neural net structure normally one neuron is connected to many other neurons. In AER that means that the sender address has to be mapped to several receiver addresses. This mapping could be hardwired on the sending and receiving IC-chip, such that the address on the AE-bus would correspond to one sending and several receiving sites (or vice versa), or it can be handled by a separate component, mapping addresses on a sender bus to addresses on a receiver bus. Such an AE map can be designed to be programmable such that arbitrary network structures (mappings) can be investigated. A synchronous programmable AE map based on a DSP has been presented in [3], and others have used microcontrollers (unpublished).

Figure 1 shows the block diagram of the asynchronous implementation. The sizes of the input and output address spaces were chosen to connect a particular retina chip to an array of artificial neurons. The whole design is implemented on an ALTERA Flex FPGA (EPF10K20RC2083). Communication with a Sun Ultra 5 workstation is achieved by a ‘PCI 16D’ card from EDT, which provides fast 16-bit parallel handshake controlled communication.

Figure 2: The ‘HS propagate’ circuit (used in figure 1) synchronizes a pipelining stage with its neighbours.

Figure 3: A recording from the FPGA by a logic analyzer that illustrates the minimal output interval and the latency of processed AEs. (Time base 50 ns/div; marked intervals 88 ns, 72 ns and 52 ns.)

Figure 4: A recording from the AER map depicting the minimal output interval in case of a ‘one to one’ mapping. (Time base 50 ns/div; marked intervals 84 ns.)

The FPGA is placed on a simple PCB that contains additional bus drivers to connect the FPGA to the 16D bus. AEs can be sent to the map by this 16D bus or through two other connectors on the PCB. Additional circuitry on the FPGA (not shown) performs an asynchronous arbitration between those three sources of input. AEs from the map are put out on a fourth connector. Some additional blocks on the FPGA (not shown) make it possible to configure the AE map via the 16D bus.

The circuit of the AE map (figure 1) is subdivided into three pipelining stages (separated by dashed lines). The signal ‘access_mode[1..0]’ distinguishes between an AE input (access mode = 0) and write accesses to the three RAMs (access mode = 1, 2 or 3). The first stage reads in the incoming AE that addresses the content in the two left hand RAMs (RAM1 and RAM2). RAM1 contains pointers to memory blocks in RAM3 to the right of the figure. RAM2 contains the sizes of these blocks. These blocks in RAM3 contain all the outgoing AEs to which the incoming AEs are to be mapped. (Note that the blocks for different incoming AEs can overlap. This can be exploited to save memory, as for instance two incoming AEs that are supposed to produce the same outputs can simply point to the same block.) When the pointer and the block size are stable, the first ‘HS propagate’ circuit issues a request to the next pipelining stage, which latches the pointer and loads the block size into a counter. The logic to the right of the second ‘HS propagate’ block now generates the number of handshakes determined by the block size. The pointer to the block and the counter value are added, and the resulting pointer into the memory block is handed to the last pipelining stage, which merely controls the memory access to RAM3 and handles the handshake with the external receiver of the outgoing AEs.

Figure 1: The schematics of the AER map.

The ‘HS propagate’ circuits (described in figure 2) and the surrounding logic at the bottom of the figure control the timing and the sequence of events in the asynchronous computation. Each ‘HS propagate’ circuit is in control of one pipelining stage. The ‘delay’ elements contain an appropriate number of RS-flipflops in series to achieve the indicated delays.

The ‘HS propagate’ circuit depicted in figure 2 synchronizes a pipelining stage with the previous and the next stage using a 4 phase handshake. If the circuit is not already busy (i.e. the ‘process’ signal is not active), an incoming request is acknowledged as soon as the incoming data is ‘consumed’, and the ‘process’ signal is set. When the processing is completed (‘processed’), an outgoing handshake is initiated, at the completion of which ‘process’ is reset. Thereafter new incoming requests are accepted again. This circuit has the important advantage that it does not hang even if the causality rules of a handshake are not followed by the providing and the receiving partner, i.e. when an acknowledge or a request is withdrawn too soon.

3. PERFORMANCE

If the request and acknowledge signals of the output port are short circuited on the PCB, then the AE map circuit puts out AEs in intervals between 52 ns and 84 ns (dependent on the nature of the mapping, see figures 3 and 4) if the AER map is overrun.
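How the output depends on the programmed mapping follows from the lookup path of the AE map (RAM1 pointer, RAM2 block size, RAM3 block contents), which can be summarized in a short behavioural model. This is a sketch only, not the FPGA circuit: the class and method names are hypothetical, handshakes and pipelining are not modelled, and two details are assumptions made here rather than statements from the text, namely that the down counter runs from block size minus one to zero and that an unprogrammed input produces no output.

```python
class AEMapModel:
    """Behavioural software model of the three-RAM lookup (hypothetical names)."""

    IN_MASK = 0x1FF   # 9 bit input AEs, as stated in the abstract
    OUT_MASK = 0x7F   # 7 bit output AEs, as stated in the abstract

    def __init__(self):
        self.ram1 = {}  # incoming AE -> pointer to the start of a block in RAM3
        self.ram2 = {}  # incoming AE -> size of that block
        self.ram3 = {}  # RAM3 address -> outgoing AE

    def write_block(self, pointer, outgoing_aes):
        """Store a block of outgoing AEs in RAM3, starting at 'pointer'."""
        for offset, ae in enumerate(outgoing_aes):
            self.ram3[pointer + offset] = ae & self.OUT_MASK

    def connect(self, ae_in, pointer, size):
        """Program RAM1 and RAM2 so that 'ae_in' maps to an existing block."""
        self.ram1[ae_in & self.IN_MASK] = pointer
        self.ram2[ae_in & self.IN_MASK] = size

    def remap(self, ae_in):
        """Return the outgoing AEs for one incoming AE, in emission order."""
        key = ae_in & self.IN_MASK
        pointer = self.ram1.get(key, 0)   # stage 1: read pointer and block size
        counter = self.ram2.get(key, 0)   # stage 2: load block size into counter
        outputs = []
        while counter > 0:                # assumed count direction: size-1 .. 0
            counter -= 1
            outputs.append(self.ram3[pointer + counter])  # stage 3: read RAM3
        return outputs


# One memory layout that reproduces the mapping used for figure 3
# under the assumed count direction.
ae_map = AEMapModel()
ae_map.write_block(0, [0, 1, 2, 3, 4, 5])
ae_map.write_block(6, [0x7A, 0x7B, 0x7C, 0x7D, 0x7E, 0x7F])
ae_map.connect(5, pointer=0, size=6)
ae_map.connect(4, pointer=6, size=6)
ae_map.connect(9, pointer=0, size=6)  # a second input sharing the block of AE 5
print([hex(ae) for ae in ae_map.remap(4)])
```

Because RAM1 and RAM2 only hold a pointer and a size, two inputs can share one block at no extra cost in RAM3, as with the hypothetical input 9 here. The number of output handshakes per input equals the block size, which is why the mapping determines how quickly the map can be overrun.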

In order to overrun the map with varying input from the 16D bus (figure 3), it had to be programmed to map every incoming AE to at least 6 outgoing AEs, since in our setup we could only provide changing input AEs with a minimal interval of 300 ns. This minimal interval for transmission from the 16D bus to the AER map was given by the delay on the bus, in the drivers that connected the bus to the PCB, and in the arbitration circuits on the FPGA (not shown) that allow three different sources of input. Figure 3 shows a recording by a logic analyzer of such a scenario. The circuit is programmed to map an incoming 4 (on bus P16OUT all) to (7F, 7E, 7D, 7C, 7B, 7A) (bus AEOUT all) and a 5 to (5, 4, 3, 2, 1, 0). An incoming AE sequence of (5, 4) is processed. The interval between subsequent outputs caused by two different inputs is 72 ns, and the interval between two subsequent outputs caused by the same input is 52 ns. The recording also illustrates the latency of processed AEs. The latency of 88 ns of the AER map is measured between the onset of the incoming request signal (MAPREQ) and the onset of the first outgoing request (REQOUT). The signals MAPREQ and MAPACK are measured directly at the input to the AER map (nodes ‘reqin’ and ‘ackout’ in figure 1). The latency from off-board (not shown) was 156 ns.

In order to overrun the map when it was programmed to map one incoming AE to only one outgoing AE (figure 4), a faster sender was ‘simulated’ by inverting the outgoing request and feeding it back as acknowledge. The input address was held constant. One of the input connectors that did not go through bus drivers was used. For the recording in figure 4 the map was programmed to put out an F for an incoming 4. The delays caused by the signals going to and coming from off-chip, plus the additional circuitry that arbitrates between three possible sources of input, use up slightly more time than the map uses to process two subsequent inputs. Therefore the output interval is increased to 84 ns in this scenario.

84 ns is about two orders of magnitude faster than the published 10 µs from a DSP based solution [3], although the DSP solution offers a bigger address space and the authors hope to be able to optimize their software further to achieve a shorter transmission interval of the order of 1 µs (private communication).

Unfortunately the energy consumption of the FPGA could not be measured directly on the PCB, since there was no separate power line going to it and most of the power of the board goes into the bus drivers. In any case we maintain the claim that the asynchronous solution fares better than a comparable synchronous implementation on the same FPGA. The asynchronous implementation saves the power that would go into driving the clock in a synchronous design. A synchronous solution with the same output rate would need to be clocked with at least 12 MHz (84 ns cycles) and would always consume the current that is necessary to drive that clock line. The asynchronous implementation will fare better especially while there are no AEs to process.

4. CONCLUSION

A simple and dedicated architecture that performs address event mapping, and its implementation on an FPGA, are presented. It is an asynchronous design that is simpler, faster and cheaper than systems based on DSPs or microcontrollers. When tested on an FPGA, the implementation presented here can process address events in less than 84 ns per output event. Since the architecture is asynchronous, no discretization is imposed on the time, and therefore discretization errors in continuous time computations on address events are avoided, and its current consumption is minimal when no events are processed.

5. REFERENCES

[1] A. Abusland, T. S. Lande, and M. Høvin. A aVLSI communication architecture for stochastically pulse-encoded analog signals. ISCAS, III:401–404, 1996.

[2] K. Boahen. A throughput-on-demand address-event transmitter for neuromorphic chips. In Adv. Res. in VLSI, pages 72–86. IEEE Comp. Soc. Press, 1999.

[3] S. Deiss, R. Douglas, and A. Whatley. A pulse-coded communications infrastructure for neuromorphic systems. In Pulsed Neural Networks, pages 157–178. The MIT Press, 1999.

[4] P. Häfliger. A spike based learning rule and its implementation in analog hardware. PhD thesis, ETH Zürich, Switzerland, 2000. http://www.ifi.uio.no/~hafliger.

[5] J. Lazzaro, J. Wawrzynek, M. Mahowald, M. Silviotti, and D. Gillespie. Silicon auditory processors as computer peripherals. IEEE Trans. on Neural Networks, 4(3):523–528, 1993.

[6] M. Mahowald. VLSI analogs of neuronal visual processing: A synthesis of form and function. PhD thesis, Cal. Inst. of Tech., Pasadena, California, 1992.

[7] A. J. Martin, A. Lines, R. Manohar, M. Nyström, P. Penez, R. Southworth, U. Cummings, and T. Kwan Lee. The design of an asynchronous MIPS R3000 microprocessor. Adv. Res. in VLSI, (17), September 1997.

[8] C. Mead. Neuromorphic electronic systems. Proc. IEEE, 78:1629–1636, 1990.

[9] A. Mortara and E. Vittoz. A communication architecture tailored for analog VLSI artificial neural networks: intrinsic performance and limitations. IEEE Trans. on Neural Networks, 5:459–466, 1994.