ASYNCHRONOUS EVENT REDIRECTING IN BIO-INSPIRED COMMUNICATION

Ph. Häfliger
Institute of Informatics, University of Oslo, Norway
e-mail: hafliger@ifi.uio.no
ABSTRACT

The paper presents the FPGA implementation of a programmable asynchronous digital circuit (henceforth called AE-map) that remaps ‘address events’. Address event representation (AER) is an event driven communication protocol originally used in VLSI implementations of neural networks to transfer action potentials (neural voltage pulses) between neurons. More generally speaking, it is suited to transmitting a number of analog values that are coded in the frequency of events over an asynchronous digital bus. The AE-map allows such events to be redirected between an AE sender and an AE receiver, for instance to program the connection scheme of a neural network. Earlier approaches for redirecting AEs have used digital synchronous devices such as DSPs or microcontrollers. The simpler and more dedicated asynchronous solution presented here is more energy efficient, does not impose a discretization on the time axis and achieves a much higher throughput. In the present implementation AEs (9 bit input, 7 bit output) can be processed at intervals of less than 84 ns per output AE.

1. INTRODUCTION

1.1. Address Event Representation

Address event representation (AER) is an event driven communication protocol that was originally put forward within the field of ‘neuromorphic aVLSI’ [6, 5, 9]. Neuromorphic engineering tries to incorporate operating principles of the nervous system into technical devices [8]. AER was first used to approach the massive connectivity of biological neural networks, but in general it is suited to conveying a large number of analog values (e.g. sensory data) through a low capacity channel (an asynchronous digital bus).

It works as follows: AER is used to transmit ‘events’. Events are characterized by a location (address) and a time. For example, in a network of neurons the address identifies one particular neuron and the time would be the time at which the neuron fires an action potential (AP = nerve pulse).
For the transmission of a number of analog values (e.g. pixels in a camera) one would code the intensity in the frequency of such events (rate coding). (This transformation of an intensity (e.g. a photodiode current) into an event rate can be achieved quite easily by placing a simple integrate-and-fire neuron circuit (6 transistors, 2 small capacitors) into the pixel.) An asynchronous digital bus is used for the actual transmission. The event’s location is encoded digitally as an ‘address’ which is placed on the bus at the time of the event. On the receiver end of the bus this address is decoded again into a receiving location. For neural networks that location would be a particular synapse (input site) of a particular neuron on the receiver chip, and for rate coded analog values it could be some integrator that reconstructs the analog value (e.g. a pixel on a screen). Alternatively, the addresses can be used directly by a digital device without the effort of an AD conversion.

This event driven strategy is more energy efficient than scanning (as for example in video connections) if the data is sparse, i.e. if only a few sender locations tend to be very active at a time. An example of such data would be the output of a silicon retina [6]. This is an ‘intelligent camera’ inspired by the biological retina (the photosensitive tissue in the back of the eye). It performs some processing on an image already in the recording pixel. One variant of a silicon retina, for example, is only sensitive to changes. Since natural scenes tend to be rather static, with fast changes happening only around the edges of moving objects, a scanning strategy wastes a lot of energy on reading pixels where nothing is happening. In the worst case the detection of changes might be delayed for the time it takes to scan through the whole image, or changes might even be missed if they are synchronous with the frame rate. An AER strategy, in contrast, reports a change in a pixel immediately.
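The rate coding scheme described above can be illustrated with a minimal behavioural sketch. It is not the pixel circuit of the paper: it assumes an idealized integrate-and-fire element without leak that integrates a constant intensity and fires whenever the integral reaches a threshold. The names `AddressEvent`, `encode` and `decode`, the `threshold` parameter and the example intensities are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressEvent:
    time: float    # seconds; on a real AE bus the time is implicit (the event is sent when it occurs)
    address: int   # sender location, e.g. a pixel index


def encode(intensities, duration, threshold=1.0):
    """Rate-code constant intensities with idealized integrate-and-fire senders.

    Assumption: sender i integrates intensities[i] without leak and fires
    each time the integral reaches `threshold`, i.e. every threshold/intensity seconds.
    """
    events = []
    for address, intensity in enumerate(intensities):
        if intensity <= 0:
            continue  # a silent sender produces no bus traffic at all
        n_spikes = int(intensity * duration / threshold)
        for k in range(1, n_spikes + 1):
            events.append(AddressEvent(k * threshold / intensity, address))
    # The shared bus carries the events of all senders in order of occurrence.
    events.sort(key=lambda e: (e.time, e.address))
    return events


def decode(events, n_addresses, duration, threshold=1.0):
    """Receiver side: reconstruct each intensity from its event count."""
    counts = [0] * n_addresses
    for event in events:
        counts[event.address] += 1
    return [count * threshold / duration for count in counts]


intensities = [0.0, 4.0, 16.0]
events = encode(intensities, duration=2.0)
print(len(events), decode(events, n_addresses=3, duration=2.0))
# prints: 40 [0.0, 4.0, 16.0]
```

In this sketch the inactive sender (address 0) never occupies the bus, which is the property that makes the event driven strategy attractive for sparse data.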
The drawback is a risk of over-running the bus and the need for collision handling. Other publications deal with these issues [5, 9, 1, 2, 4] (and the AER map implementation presented in this paper assumes that collisions are resolved before AEs are placed on the bus). In general it can be said that for transmitting analog data there is a trade-off between temporal resolution, intensity resolution, size of the address space, and expected occupation of that address space.

To come back to the example of the change sensitive retina: given the timing of the AEs, not only can the rate of change (rate coding) be reconstructed, but the onset of the change (the first event in a burst of activity) is also quite evident and undisturbed by a frame rate. The order of those onsets in neighbouring pixels indicates a direction of motion, and by measuring the intervals between the onsets (temporal coding) the speed of that motion becomes evident. The asynchronous unclocked implementation of the AE bus avoids introducing a discretization error on this temporal code.

1.2. Asynchronous Devices

In asynchronous designs, as opposed to synchronous ones, each component works at its own pace. In sequential processes each component has to know when the data it is supposed to process is ready. It has to obtain that information from the component that provides the data. Pipelining, which considerably sped up synchronous processors two decades or so ago, is a natural result of this approach. However, in a pipelined operation the slowest component limits the overall speed of a sequence of operations. Still, an asynchronous design can get an edge on even optimally pipelined synchronous solutions for two reasons. Firstly, the slowest component (which dictates the clock rate in the synchronous approach) might not always be part of every operation. Secondly, in synchronous designs the next clock cycle starts after all local operations are completed, whereas in asynchronous designs ideally the next step in a sequential operation starts immediately when the previous operation is completed. The ‘ideally’ refers to the fact that it is not always easy to compute locally whether a component has finished its operation, and so as a workaround the unit can simply indicate that it is finished after a fixed delay, in which case there is no real advantage gained by the second argument.

Concerning energy efficiency, asynchronous circuits have the advantage that they do not actively consume current when they are idle and that they do not need a clock, which consumes a considerable percentage of the total power in fast, highly integrated circuits.

These arguments have convinced researchers to even start developing asynchronous microprocessors [7], and an increasing number of commercial asynchronous devices are nowadays available. And as previously mentioned, if the AER map is to be used in a system that relies on temporal codes, implementing it asynchronously avoids introducing a discretization error in the time domain.

2. ARCHITECTURE OF AN ASYNCHRONOUS ADDRESS EVENT MAP

In a neural net structure one neuron is normally connected to many other neurons. In AER that means that the sender address has to be mapped to several receiver addresses. This mapping could be hardwired on the sending and receiving IC-chip, such that the address on the AE-bus would correspond to one sending and several receiving sites (or vice versa), or it can be handled by a separate component that is more dedicated to that particular task, mapping addresses on a sender bus to addresses on a receiver bus. Such an AE map can be designed to be programmable, such that arbitrary network structures (mappings) can be investigated. A synchronous programmable AE map based on a DSP has been presented in [3], and others have used microcontrollers (unpublished). In the following, a much simpler asynchronous device is presented.

Figure 1 shows the block diagram of the asynchronous implementation. The whole design is implemented on an ALTERA Flex FPGA (EPF10K20RC208-3). The size of the input and output address spaces was chosen to connect a particular retina chip to an array of artificial neurons. The FPGA is placed on a simple PCB that contains additional bus drivers to connect the FPGA to the 16D bus. Communication with a Sun Ultra 5 workstation is achieved by a ‘PCI 16D’ card from EDT, which provides fast 16-bit parallel handshake controlled communication. Some additional blocks on the FPGA (not shown) make it possible to configure the AE map via the 16D bus. AEs can be sent to the map by this 16D bus or through two other connectors on the PCB. Additional circuitry on the FPGA (not shown) performs an asynchronous arbitration between those three sources of input. AEs from the map are put out on a fourth connector.

Figure 1: The schematics of the AER map. The signal ‘access mode[1..0]’ distinguishes between an AE input (access mode=0) and write accesses to the three RAMs (access mode=1, 2 or 3). The ‘delay’ elements contain an appropriate number of RS-flipflops in series to achieve the indicated delays.

The circuit of the AE map (figure 1) is subdivided into three pipelining stages (separated by dashed lines). The ‘HS propagate’ circuits (described in figure 2) and the surrounding logic at the bottom of the figure control the timing and the sequence of events in the asynchronous computation. Each ‘HS propagate’ circuit is in control of one pipelining stage.

The first stage reads in the incoming AE that addresses the content in the two left hand RAMs (RAM1 and RAM2). RAM1 contains pointers to memory blocks in RAM3 to the right of the figure. These blocks in RAM3 contain all the outgoing AEs to which the incoming AEs are to be mapped. RAM2 contains the sizes of these blocks. (Note that the blocks for different incoming AEs can overlap. This can be exploited to save memory, as for instance two incoming AEs that are supposed to produce the same outputs can simply point to the same block.) When the pointer and the block size are stable, the first ‘HS propagate’ circuit issues a request to the next pipelining stage, which latches the pointer and loads the block size into a counter. The logic to the right of the second ‘HS propagate’ block now generates the number of handshakes determined by the block size. The pointer to the block and the counter value are added, and the resulting pointer into the memory block is handed to the last pipelining stage, which merely controls the memory access to RAM3 and handles the handshake with the external receiver of the outgoing AEs.

Figure 2: The ‘HS propagate’ circuit (used in figure 1) synchronizes a pipelining stage with its neighbours.

The ‘HS propagate’ circuit depicted in figure 2 synchronizes a pipelining stage with the previous and the next stage using a 4 phase handshake. If the circuit is not already busy (i.e. the ‘process’ signal is not active), an incoming request is acknowledged as soon as the incoming data is ‘consumed’, and the ‘process’ signal is set. When the processing is completed (‘processed’), an outgoing handshake is initiated, at the completion of which ‘process’ is reset. Thereafter new incoming requests are accepted again. This circuit has the important advantage that it does not hang even if the causality rules of a handshake are not followed by the providing and the receiving partner, i.e. when an acknowledge or a request is withdrawn too soon.

3. PERFORMANCE

If the request and acknowledge signals of the output port are short circuited on the PCB, then the AE map circuit puts out AEs in intervals between 52 ns and 84 ns (dependent on the nature of the mapping, see figures 3 and 4) if the AER map is overrun.

In order to overrun the map with varying input from the 16D bus (figure 3) it had to be programmed to map every incoming AE to at least 6 outgoing AEs, since in our setup we could only provide changing input AEs with a minimal interval of 300 ns. This minimal interval for transmission from the 16D bus to the AER map was given by the delay on the bus, in the drivers that connected the bus to the PCB, and in the arbitration circuits on the FPGA (not shown) that allow three different sources of input. Figure 3 shows a recording by a logic analyzer of such a scenario. It also illustrates the latency of processed AEs. The circuit is programmed to map an incoming 4 (on bus P16OUT all) to (7F, 7E, 7D, 7C, 7B, 7A) (bus AEOUT all) and a 5 to (5, 4, 3, 2, 1, 0). An incoming AE sequence of (5, 4) is processed. The signals MAPREQ and MAPACK are measured directly at the input to the AER map (nodes ‘reqin’ and ‘ackout’ in figure 1). The latency of 88 ns of the AER map is measured between the onset of the incoming request signal (MAPREQ) and the onset of the first outgoing request (REQOUT). The latency from off-board (not shown) was 156 ns. The interval between subsequent outputs caused by two different inputs is 72 ns and the interval between two subsequent outputs caused by the same input is 52 ns.

Figure 3: A recording from the FPGA by a logic analyzer that illustrates the minimal output interval and the latency of processed AEs. (Time base 50 ns/div; marked intervals 88 ns, 72 ns and 52 ns.)

In order to overrun the map when it was programmed to map one incoming AE to only one outgoing AE (figure 4), a faster sender was ‘simulated’ by inverting the outgoing request and feeding it back as acknowledge. One of the input connectors that did not go through bus drivers was used. The input address was held constant. For the recording in figure 4 the map was programmed to put out an F for an incoming 4. The delays caused by the signals going to and coming from off-chip, plus the additional circuitry that arbitrates between three possible sources of input, use up slightly more time than the map uses to process two subsequent inputs. Therefore the output interval is increased to 84 ns in this scenario.

Figure 4: A recording from the AER map depicting the minimal output interval in case of a ‘one to one’ mapping. (Time base 50 ns/div; marked intervals 84 ns and 84 ns.)

84 ns is about two orders of magnitude faster than the published 10 µs from a DSP based solution [3], although the DSP solution offers a bigger address space and the authors hope to be able to optimize their software further to achieve a shorter transmission interval of the order of 1 µs (private communication).

Unfortunately the energy consumption of the FPGA could not be measured directly on the PCB, since there was no separate power line going to it and most of the power of the board goes into the bus drivers. In any case we maintain the claim that the asynchronous solution fares better than a comparable synchronous implementation on the same FPGA. The asynchronous implementation saves the power that would go into driving the clock in a synchronous design. A synchronous solution with the same output rate would need to be clocked with at least 12 MHz (84 ns cycles) and would always consume the current that is necessary to drive that clock line. The asynchronous implementation will fare better especially while there are no AEs to process.

4. CONCLUSION

A simple and dedicated architecture that performs address event mapping and its implementation on an FPGA are presented. It is an asynchronous design that is simpler, faster and cheaper than systems based on DSPs or microcontrollers. When the implementation presented here was tested on an FPGA, it could process address events in less than 84 ns per output event. Since the architecture is asynchronous, no discretization is imposed on the time, and therefore discretization errors in continuous time computations on address events are avoided. Its current consumption is minimal when no events are processed.

5. REFERENCES

[1] A. Abusland, T. S. Lande, and M. Høvin. A VLSI communication architecture for stochastically pulse-encoded analog signals. Proc. ISCAS, III:401–404, 1996.
[2] K. A. Boahen. A throughput-on-demand address-event transmitter for neuromorphic chips. In Adv. Res. in VLSI, pages 72–86, 1999.
[3] S. R. Deiss, R. J. Douglas, and A. M. Whatley. A pulse-coded communications infrastructure for neuromorphic systems. In Pulsed Neural Networks, pages 157–178. The MIT Press, 1999.
[4] P. Häfliger. A spike based learning rule and its implementation in analog hardware. PhD thesis, ETH Zürich, Switzerland, 2000. http://www.ifi.uio.no/~hafliger.
[5] J. Lazzaro, J. Wawrzynek, M. Mahowald, M. Sivilotti, and D. Gillespie. Silicon auditory processors as computer peripherals. IEEE Trans. on Neural Networks, 4(3):523–528, 1993.
[6] M. Mahowald. VLSI analogs of neuronal visual processing: A synthesis of form and function. PhD thesis, Cal. Inst. of Tech., Pasadena, California, 1992.
[7] A. J. Martin, A. Lines, R. Manohar, M. Nyström, P. Penzes, R. Southworth, U. Cummings, and T. Kwan Lee. The design of an asynchronous MIPS R3000 microprocessor. In Adv. Res. in VLSI, (17), IEEE Comp. Soc. Press, September 1997.
[8] C. Mead. Neuromorphic electronic systems. Proc. IEEE, 78:1629–1636, 1990.
[9] A. Mortara and E. A. Vittoz. A communication architecture tailored for analog VLSI artificial neural networks: intrinsic performance and limitations. IEEE Trans. on Neural Networks, 5:459–466, 1994.
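As a supplement to the architecture description in Section 2, the three-RAM lookup can be summarised in a short behavioural sketch. It models only the data path (pointer from RAM1, block size from RAM2, outgoing AEs from RAM3), not the handshake timing. Several details are assumptions made for illustration and are not stated in the paper: the class and method names, the memory layout of the example blocks, and the convention that the down counter runs from size-1 to 0, which was chosen because it reproduces the descending output order of the Figure 3 example when the blocks are stored in ascending order.

```python
class AEMap:
    """Behavioural sketch of the AE map data path (no handshake timing)."""

    def __init__(self, in_bits=9, ptr_bits=8, out_bits=7):
        self.out_mask = (1 << out_bits) - 1
        self.ram1 = [0] * (1 << in_bits)    # per incoming AE: pointer into RAM3
        self.ram2 = [0] * (1 << in_bits)    # per incoming AE: block size
        self.ram3 = [0] * (1 << ptr_bits)   # blocks of outgoing AEs

    def write_block(self, pointer, outputs):
        """Store a block of outgoing AEs in RAM3 starting at `pointer`."""
        for offset, out_addr in enumerate(outputs):
            self.ram3[pointer + offset] = out_addr & self.out_mask

    def connect(self, in_addr, pointer, size):
        """Let incoming AE `in_addr` refer to `size` entries at `pointer`."""
        self.ram1[in_addr] = pointer
        self.ram2[in_addr] = size

    def map_event(self, in_addr):
        """Return the outgoing AEs produced by one incoming AE."""
        pointer = self.ram1[in_addr]   # stage 1: read pointer and block size
        count = self.ram2[in_addr]
        outputs = []
        while count > 0:               # stage 2: down counter (assumed size-1 .. 0)
            count -= 1
            outputs.append(self.ram3[pointer + count])  # stage 3: RAM3 access
        return outputs


ae_map = AEMap()
ae_map.write_block(0, [0x7A, 0x7B, 0x7C, 0x7D, 0x7E, 0x7F])
ae_map.write_block(6, [0, 1, 2, 3, 4, 5])
ae_map.connect(4, pointer=0, size=6)
ae_map.connect(5, pointer=6, size=6)
ae_map.connect(6, pointer=6, size=3)   # overlaps the block used by input 5

print([hex(a) for a in ae_map.map_event(4)])
# prints: ['0x7f', '0x7e', '0x7d', '0x7c', '0x7b', '0x7a']
print(ae_map.map_event(5), ae_map.map_event(6))
# prints: [5, 4, 3, 2, 1, 0] [2, 1, 0]
```

The third `connect` call illustrates the memory saving mentioned in Section 2: input 6 reuses part of the block of input 5 instead of occupying additional RAM3 entries.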
