Seminar

on

Clockless Chips

Date: October 25, 2005.

Presented by: K. Subrahmanya Sreshti. (05IT6004).

ABSTRACT

The clockless approach, which uses a technique known as asynchronous logic, differs from conventional computer circuit design in that the switching on and off of digital circuits is controlled individually by specific pieces of data rather than by a tyrannical clock that forces all of the millions of circuits on a chip to march in unison. It overcomes the main disadvantages of a clocked circuit, such as slow speed, high power consumption and high electromagnetic noise. For these reasons, clockless technology is considered the technology that will drive the majority of electronic chips in the coming years.

Introduction

Over the years, the designers of microprocessors have resorted to all sorts of tricks to make their products run faster. Modern chips, for example, queue up several instructions in a "pipeline" and analyze them to see if switching the order in which they are executed can produce the correct result, only more quickly. After a point, however, cranking up the clock speed becomes an exercise in diminishing returns. That's why a one-gigahertz chip doesn't run twice as fast as a 500-megahertz chip. The clock, through the work it must do to coordinate millions of transistors on a chip, generates its own overhead, and the faster the clock, the greater the overhead becomes. The clock in a state-of-the-art microprocessor can consume up to 30 percent of the chip's computing capability, with that percentage increasing at an ever faster rate as clock speeds increase.

Faced with diminishing returns, chip designers are dusting down two technologies, called multi-threading and asynchronous logic, that were both invented decades ago. At the time, neither was competitive with conventional designs, but important uses have since emerged for each of them. Multi-threading can increase the performance of database and web servers, while asynchronous logic is ideal for wireless devices and smart cards.

Problems with Synchronous Approach

The synchronous approach predominated, largely because it is easier to design chips in which things happen only when the clock ticks. In today's chips, the clock remains the key part of the action. The chip's clock is an oscillating crystal that vibrates at a regular frequency, depending on the voltage applied. This frequency is measured in gigahertz or megahertz. All the chip's work is synchronized via the clock, which sends its signals out along all circuits and controls the registers, the data flow, and the order in which the processor performs the necessary tasks.

The role of the clock is to guarantee that the answer will be ready at a given time. Let's say you want to multiply 4 by 6. As a microprocessor performs this operation, electronic signals travel along microscopic strips of metal, forking, intersecting again and encountering logic gates, until they finally deposit the results of the computation in a temporary memory bank called a register. If you could slow down the chip and peek into the register as this calculation was being completed, you might see the value changing many times, from 4 to 12 to 8, before finally settling down into the correct answer. That's because the signals transmitted to perform the operation travel along many different paths before arriving at the register; only after all signals have completed their journey is the correct value assured. The chip is designed so that even the slowest path through the circuit, the path with the longest wires and the most gates, is guaranteed to reach the register within a single clock tick.

An advantage of synchronous chips is that the order in which signals arrive doesn't matter. Signals can arrive at different times, but the register waits until the next clock tick before capturing them. As long as they all arrive before the next tick, the system can process them in the proper order. Designers thus don't have to worry about related issues, such as wire lengths, when working on chips. It is also easier to determine the maximum performance of a clocked system: calculating performance simply involves counting the number of clock cycles needed to complete an operation. Calculating performance is less well defined with asynchronous designs.

However, as chips get bigger, faster and more complicated, distributing the clock signal around the chip becomes harder, and it becomes more difficult for ticks to reach all elements, particularly as clocks get faster. Each tick must be long enough for signals to traverse even a chip's longest wires in one cycle, so the tasks performed on parts of a chip that are close together finish well before the cycle ends but can't move on until the next tick.

Another drawback with clocked designs is that they waste a lot of energy, since even inactive parts of the chip have to respond to every clock tick. In addition, in synchronous designs registers use energy to switch so that they are ready to receive new data whenever the clock ticks, whether they have inputs to process or not. In asynchronous designs, gates switch only when they have inputs. The clocks themselves consume power and produce heat: the job of coordinating tens of millions of transistors at a billion ticks per second requires the consumption of a lot of energy, most of which ends up as heat. Patrick Gelsinger, chief technology officer at Intel, referred to the problem in his keynote speech at the International Solid-State Circuits Conference in February 2001. Gelsinger was only half-joking when he said that if microprocessors continue to be run by ever-faster clocks, then by 2005 a chip will run as hot as a nuclear reactor. Clocked chips also produce electromagnetic emissions at their clock frequency, which can cause radio interference.

Asynchronous logic circuits (Stop the clocks)

As its name suggests, asynchronous logic does away with the cardinal rule of chip design: that everything marches to the beat of an oscillating crystal "clock". Within every one-gigahertz microprocessor, for instance, there lies an oscillating crystal ticking one billion times a second, and all of the chip's processing units co-ordinate their actions with these ticks to ensure that they remain in step. Clockless processors, also called asynchronous or self-timed, in contrast, don't use the oscillating crystal that serves as the regularly "ticking" clock that paces the work done by traditional synchronous processors. Asynchronous, or "clockless", designs allow different bits of a chip to work at different speeds, sending data to and from each other as and when appropriate. Rather than waiting for a clock tick, clockless-chip elements hand off the results of their work as soon as they are finished.

For most chip designers, throwing out the clock is difficult to imagine, because it changes the fundamental way that chips have organized and executed their work. Engineers are trained to design chips where their first consideration is getting work done before the next clock tick comes around. The clock establishes a timing constraint within which all chip elements must work, and constraints can make design easier by reducing the number of potential decisions.

Figure 1.

How clockless chips work

There are no purely asynchronous chips yet. Instead, today's clockless processors are actually clocked processors with asynchronous elements. Clockless elements use perfect clock gating, in which circuits operate only when they have work to do, not whenever a clock ticks. Instead of clock-based synchronization, local handshaking controls the passing of data between logic modules. To read from memory, for example, the asynchronous processor places the location of the stored data it wants to read onto the address bus and issues a request for the information. The memory reads the address off the bus, finds the information, and places it on the data bus. The memory then acknowledges that it has read the data. Finally, the processor grabs the information from the data bus.

According to Jorgenson, "Data arrives at any rate and leaves at any rate." When the arrival rate exceeds the departure rate, the circuit stalls the input until the output catches up. The many handshakes themselves require more power than a clock's operations. However, clockless systems more than offset this because, unlike synchronous chips, each circuit uses power only when it performs work.
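The memory read described above can be modelled in a few lines of Python. This is a minimal behavioural sketch, not a hardware description: the names `Bus`, `Memory`, `react` and `read` are hypothetical, and it assumes a four-phase (return-to-zero) handshake in which request and acknowledge are each raised and then lowered again. The text does not say which handshake protocol is used.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Bus:
    """Hypothetical wires shared by the processor and the memory."""
    address: Optional[int] = None
    data: Optional[int] = None
    req: bool = False  # request, driven by the processor
    ack: bool = False  # acknowledge, driven by the memory


@dataclass
class Memory:
    contents: Dict[int, int] = field(default_factory=dict)

    def react(self, bus: Bus, log: List[str]) -> None:
        """Respond to the current state of the handshake wires (no clock)."""
        if bus.req and not bus.ack:
            bus.data = self.contents[bus.address]  # look up word, drive data bus
            bus.ack = True                         # data on the bus is now valid
            log.append(f"memory: data={bus.data}, ack=1")
        elif not bus.req and bus.ack:
            bus.ack = False                        # return-to-zero phase
            log.append("memory: ack=0")


def read(memory: Memory, address: int) -> Tuple[int, List[str]]:
    """One four-phase read transaction; returns the value and an event log."""
    bus, log = Bus(), []
    bus.address, bus.req = address, True   # 1. address on bus, raise request
    log.append(f"cpu: address={address}, req=1")
    memory.react(bus, log)                 # 2. memory supplies data, raises ack
    value = bus.data                       # 3. processor takes the data...
    bus.req = False                        #    ...and lowers its request
    log.append(f"cpu: latched {value}, req=0")
    memory.react(bus, log)                 # 4. memory lowers ack: bus is idle
    return value, log


if __name__ == "__main__":
    value, log = read(Memory({0x10: 42}), 0x10)
    print(value)            # 42
    print("\n".join(log))
```

Nothing in this model waits for a clock edge: each phase is triggered only by the signal raised or lowered in the previous phase.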

Clockless advantages

In synchronous designs, the data moves on every clock edge, causing voltage spikes. In clockless chips, data doesn't all move at the same time, which spreads out current flow, thereby minimizing the strength and frequency of spikes and emitting less electromagnetic interference (EMI). Less EMI reduces both noise-related errors within circuits and interference with nearby devices.

Power efficiency, responsiveness, and robustness

Because asynchronous chips have no clock and each circuit powers up only when used, asynchronous processors use less energy than synchronous chips by providing only the voltage necessary for a particular operation. According to Jorgenson, clockless chips are particularly energy-efficient for running video, audio, and other streaming applications: data-intensive programs that frequently cause synchronous processors to use considerable power. Streaming data applications have frequent periods of dead time, such as when there is no sound or when video frames change very little from their immediate predecessors, and little need for running error-correction logic. During this inactive time, asynchronous processors don't use much power.

Clockless processors activate only the circuits needed to handle data, so they leave unused circuits ready to respond quickly to other demands. Asynchronous chips run cooler and have fewer and lower voltage spikes. Therefore, they are less likely to experience temperature-related problems and are more robust. Because they use handshaking, clockless chips give data time to arrive and stabilize before circuits pass it on. This contributes to reliability because it avoids the rushed data handling that central clocks sometimes necessitate, according to University of Manchester Professor Steve Furber, who runs the Amulet project.

Simple, efficient design

Logic modules could be developed without regard to compatibility with a central clock frequency. This enables simpler, faster design and assembly. Also, because asynchronous processors don't need specially designed modules that all work at the same clock frequency, they can use standard components, which makes the design process easier. "Registers communicate at their fastest common speed," said Jorgenson. "If one block is slow, the blocks that it communicates with slow down."

The delay-insensitive mode allows an arbitrary time delay for logic blocks. This gives a system time to handle and validate data before passing it along, thereby reducing errors. The recent use of both domino logic and the delay-insensitive mode in asynchronous processors has created a fast approach known as integrated pipelines mode. Domino logic improves performance because a system can evaluate several lines of data at a time in one cycle, as opposed to the typical approach of handling one line in each cycle. Domino logic is also efficient because it acts only on data that has changed during processing, rather than acting on all data throughout the process.

Advantages of the Clockless chips

A clocked chip can run no faster than its most slothful piece of logic; the answer isn't guaranteed until every part completes its work. By contrast, the transistors on an asynchronous chip can swap information independently, without needing to wait for everything else. The result? Instead of the entire chip running at the speed of its slowest components, it can run at the average speed of all components. At both Intel and Sun, this approach has led to prototype chips that run two to three times faster than comparable products using conventional circuitry.

Clockless chips draw power only when there is useful work to do, enabling a huge savings in battery-driven devices. Asynchronous chips use 10 percent to 50 percent less energy than synchronous chips, in which the clocks are constantly drawing power. An asynchronous-chip-based pager marketed by Philips Electronics, for example, runs almost twice as long as competitors' products, which use conventional clocked chips.

Another advantage of clockless chips is that they give off very low levels of electromagnetic noise. The faster the clock, the more difficult it is to prevent a device from interfering with other devices; dispensing with the clock all but eliminates this problem. The combination of low noise and low power consumption makes asynchronous chips a natural choice for mobile devices, and in particular for mobile communications applications, which usually need low power sources. "The low-hanging fruit for clockless chips will be in communications devices," says Fant, starting with cell phones.

The chips' quiet nature also makes them more secure. Asynchronous logic would offer better security than conventional chips: "The clock is like a big signal that says, 'Okay, look now.' It's like looking for someone in a marching band. Asynchronous is more like a milling crowd. There's no clear signal to watch. Potential hackers don't know where to begin." Typical hacking techniques involve listening to clock ticks: analyzing the power consumption for each clock tick can crack the encryption on existing smart cards, because it allows details of the chip's inner workings to be deduced. Such an attack would be far more difficult on a smart card based on asynchronous logic, since asynchronous chips can perform encryption in a way that is harder to identify and to crack. Improved encryption makes asynchronous circuits an obvious choice for smart cards, the chip-endowed plastic cards beginning to be used for such security-sensitive applications as storage of medical records, electronic funds exchange and personal identification.

Ivan Sutherland of Sun Microsystems, who is regarded as the guru of the field, believes that such chips will have twice the power of conventional designs, which will make them ideal for use in high-performance computers. But Dr Furber suggests that the most promising application for asynchronous chips may be in mobile wireless devices and smart cards.

Different styles

There are several styles of asynchronous design. Conventional chips represent the zeroes and ones of binary digits ("bits") using low and high voltages on a particular wire. One clockless approach, called "dual rail", uses two wires for each bit: sudden voltage changes on one of the wires represent a zero, and on the other wire a one. The two wires give dual-rail circuits communications pathways not only to send bits, but also to send "handshake" signals that indicate when work has been completed. Another approach is called "bundled data": low and high voltages on 32 wires are used to represent 32 bits, and a change in voltage on a 33rd wire indicates when the values on the other 32 wires are to be used. A further approach replaces the conventional system of digital logic with what Fant calls "null convention logic", a scheme that identifies not only "yes" and "no", but also "no answer yet", a convenient way for clockless chips to recognize when an operation has not yet been completed.

Applications of Clockless Chips (in more technical detail)

1. High performance.
2. Low power dissipation.
3. Low noise and low electro-magnetic emission.
4. A good match with heterogeneous system timing.

However. The worst-case delay. Furthermore. Data-dependent delays The delay of the combinational logic circuit show in Figure-1 depends on the current state and the value of the primary inputs. the actual delay is always less (and sometimes much less) than the clock period.1. Thus. 11 . part of this advantage is canceled by the overhead required to detect the completion of a step. This leads. it may be difficult to translate local timing variability into a global system performance advantage. potentially. Asynchronous for High Performance In an asynchronous circuit the next computation step can start immediately after the previous step has completed: there is no need to wait for a transition of the clock signal. plus some margin for flip-flop delays and clock skew. is then a lower bound for the clock period of a synchronous circuit. to a fundamental performance advantage for asynchronous circuits. an advantage that increases with the variability in delays associated with these computation steps.

12 . which we consider next. The worst-case delay occurs when 1 is added to 2N . when adding 1 to 0. Assuming random inputs. for example.A simple example is an N-bit ripple-carry adder (Figure 2). the completion can be observed from the outputs of the adder. as. Dual-rail encoding of the carry signal has also been applied to a carry bypass adder.1. 0) (carry = true). 1) (carry = false) or to (1. That is. In an asynchronous circuit this variation in delays can be exploited by detecting the actual completion of the addition. 0) to (0. In the best case there is no carry ripple at all. the average length determines the average case delay of an asynchronous ripple-carry adder. cti) has made a monotonous transition from (0. When inputs and outputs are dual-rail encoded as well. but the clock period must be 6 times longer! On the other hand. the average length of the longest carry-propagation chain is bounded by log 2 N. For a 32-bit wide ripple-carry adder the average length is therefore 5. when each pair (cfi. Then the carry ripples from FA1 to FAN. the addition has completed when all internal carry-signals have been computed. Most practical solutions use dual-rail encoding of the carry signal (Figure 2(b)).

Elastic pipelines

In general it is not easy to translate a local asynchronous advantage in average-case performance into a system-level performance advantage. Today's synchronous circuits are heavily pipelined and retimed: critical paths are nicely balanced and little room is left to obtain an asynchronous benefit. Moreover, an asynchronous benefit of this kind must be balanced against a possible overhead in completion signaling and asynchronous control. In an asynchronous (elastic) pipeline, the controller of each stage communicates exclusively with the controllers of the immediately preceding and succeeding stages by means of handshake signaling, and controls the state of the data latches (transparent or opaque). Between the request and the next acknowledge phase the corresponding data wires must be kept stable.

2. Asynchronous for Low Power

Dissipating when and where active

The classic example of a low-power asynchronous circuit is a frequency divider. A D-flip-flop with its inverted output fed back to its input divides an incoming (clock) frequency by two (Figure 4(a)). A cascade of N such divide-by-two elements (Figure 4(b)) divides the incoming frequency by 2^N.
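The cascade can be simulated by counting how often each divide-by-two element changes state. This is a minimal sketch that assumes the number of state changes is a rough proxy for dynamic power; the function name `divider_cascade` is hypothetical. Each element is modelled as a toggle flip-flop clocked by the rising output edge of the element before it.

```python
from typing import List


def divider_cascade(n_stages: int, input_edges: int) -> List[int]:
    """Return how many times each divide-by-two stage changes state.

    Each stage toggles on a rising edge at its input and passes an edge on
    to the next stage only when its own output rises, so later stages are
    idle most of the time.
    """
    state = [0] * n_stages
    toggles = [0] * n_stages
    for _ in range(input_edges):
        stage = 0
        while stage < n_stages:
            state[stage] ^= 1          # toggle on a rising input edge
            toggles[stage] += 1
            if state[stage] == 0:      # output fell: no rising edge to pass on
                break
            stage += 1                 # output rose: clock the next stage
    return toggles


if __name__ == "__main__":
    toggles = divider_cascade(15, 32768)        # one second of a 32,768 Hz input
    print(toggles[0], toggles[1], toggles[-1])  # 32768 16384 2
    print(sum(toggles))                         # 65534, just under 2 x 32768
    print(15 * 32768)                           # 491520 clock events if all 15 flip-flops shared the input clock
```

The last stage toggles twice per second, giving one rising edge per second (1 Hz). The total activity of the whole cascade stays just below twice that of its first element, whereas a divider in which every flip-flop receives every input edge would see activity grow in proportion to the number of stages.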

The second element runs at only half the rate of the first one and hence dissipates only half the power; the third one dissipates only a quarter, and so on. Hence, the entire asynchronous cascade consumes, over a given period of time, slightly less than twice the power of its head element, independent of N. That is, fixed power dissipation is obtained. In contrast, a similar synchronous divider would dissipate in proportion to N. A cascade of 15 such divide-by-two elements is used in watches to convert a 32 kHz (32,768 Hz) crystal clock down to a 1 Hz clock.

One benefit of asynchronous circuits is thus vastly improved electrical efficiency, which leads directly to prolonged battery life. The potential of asynchronous design for low power, however, depends on the application. For example, in a digital filter where the clock rate equals the data rate, all flip-flops and all combinational circuits are active during each clock cycle; then little or nothing can be gained by implementing the filter as an asynchronous circuit. In many digital-signal processing functions, however, the clock rate exceeds the data (signal) rate by a large factor, sometimes by several orders of magnitude. The clock frequency is chosen that high to accommodate sequential algorithms that share resources over subsequent computation steps. In such circuits, only a small fraction of registers change state during a clock cycle. Furthermore, this fraction may be highly data dependent. Also, most modules operate well below the maximum frequency of operation.

One application for which asynchronous circuits can save power is Reed-Solomon error correctors operating at audio rates, as demonstrated at Philips Research Laboratories. Two different asynchronous realizations of this decoder (single-rail and dual-rail) were compared with a synchronous (product) version. The single-rail version was clearly superior and consumed five times less power than the synchronous version. A second example is the infrared communications receiver IC designed at Hewlett-Packard/Stanford. The receiver IC draws only leakage current while waiting for incoming data, but can start up as soon as a signal arrives so that it loses no data.

The filter bank for a digital hearing aid was the subject of another successful demonstration, this time by the Technical University of Denmark in cooperation with Oticon Inc. They re-implemented an existing filter bank as a fully asynchronous circuit, and the result was a factor of five reduction in power consumption. A fourth application is a pager in which several power-hungry sub circuits were redesigned as asynchronous circuits.

3. Asynchronous for Low Noise and Low Emission

Sub circuits of a system may interact in unintended and often subtle ways. For example, a digital sub circuit generates voltage noise on the power-supply lines or induces currents in the silicon substrate. This noise may affect the performance of an analog-to-digital converter connected so as to draw power from the same source or that is integrated on the same substrate. Another example is that of a digital sub circuit that emits electromagnetic radiation at its clock frequency (and the higher harmonic frequencies), and a radio receiver sub circuit that mistakes this radiation for a radio signal.

Due to the absence of a clock, asynchronous circuits may have better noise and EMC (Electro-Magnetic Compatibility) properties than synchronous circuits. This advantage can be appreciated by analyzing the supply current of a clocked circuit in both the time and frequency domains. Circuit activity of a clocked circuit is usually maximal shortly after the productive clock edge. It gradually fades away and the circuit must become totally quiescent before the next productive clock edge. Viewed differently, the clock signal modulates the supply current as depicted schematically in Figure 5(a). Due to parasitic resistance and inductance in the on-chip and off-chip supply wiring, this causes noise on the on-chip power and ground lines.
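The frequency-domain view of the supply current can be illustrated with a toy model. It rests on two labelled assumptions: the supply current is idealized as unit pulses (real current waveforms are not single-sample spikes), and asynchronous activity is modelled as pulses at random instants (real asynchronous activity is data dependent, not random). The helper names are hypothetical; the spectrum is computed with a plain discrete Fourier transform using Python's standard `cmath` module.

```python
import cmath
import random
from typing import List


def spectrum_peak(current: List[float]) -> float:
    """Largest DFT magnitude over all non-zero frequency bins (plain O(n^2) DFT)."""
    n = len(current)
    peak = 0.0
    for k in range(1, n):
        x = sum(c * cmath.exp(-2j * cmath.pi * k * t / n) for t, c in enumerate(current))
        peak = max(peak, abs(x))
    return peak


def clocked_current(n: int, period: int) -> List[float]:
    """Idealized clocked circuit: one current pulse right after every clock edge."""
    return [1.0 if t % period == 0 else 0.0 for t in range(n)]


def spread_current(n: int, pulses: int, seed: int = 0) -> List[float]:
    """Same number of pulses at irregular instants (random here, as an assumption)."""
    rng = random.Random(seed)
    current = [0.0] * n
    for t in rng.sample(range(n), pulses):
        current[t] = 1.0
    return current


if __name__ == "__main__":
    clocked = clocked_current(256, 8)          # 32 pulses, one every 8 time steps
    spread = spread_current(256, 32, seed=1)   # 32 pulses at irregular times
    print(round(spectrum_peak(clocked), 3))    # 32.0: all pulses add in phase
    print(round(spectrum_peak(spread), 3))     # much lower: energy is spread out
```

Both waveforms carry the same total charge, but the clocked one concentrates its energy at the clock frequency and its harmonics, which is exactly where a nearby radio receiver or converter would pick it up.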

4. Heterogeneous Timing

There are two on-going trends that affect the timing of a system-on-a-chip: the relative increase of interconnect delays versus gate delays, and the rapid growth of design reuse. Their combined effect results in an increasingly heterogeneous organization of system-on-a-chip timing.

According to Figure 7, gate delays rapidly decrease with each technology generation. By contrast, the delay of a piece of interconnect of fixed modest length increases, soon leading to a dominance of interconnect delay over gate delay. The introduction of additional interconnect layers and new materials (copper and low dielectric constant insulators) may slow down this trend somewhat. Nevertheless, new circuits and architectures are required to circumvent these parasitic limitations. For example, across-chip communication may no longer fit within a single clock period of a processor core.

Heterogeneous system timing will offer considerable design challenges for system-level interconnect, including buses, FIFOs, switch matrices, routers, and multi-port memories. Asynchrony makes it easier to deal with interconnecting a variety of different clock frequencies, without worrying about synchronization problems, differences in clock phases and frequencies, and clock skew. Hence, new opportunities will arise for asynchronous interconnect structures and protocols. Once asynchronous on-chip interconnect structures are accepted, the threshold to introduce asynchronous clients to these interconnects is lowered as well. Mixed synchronous-asynchronous circuits also hold promise.

Clockless challenges

Asynchronous chips face a couple of important challenges.

Integrating clockless and clocked solutions

In today's clockless chips, asynchronous and synchronous circuitry must interface. Unlike synchronous processors, asynchronous chips don't complete instructions at times set by a clock. This variability can cause problems interfacing with synchronous systems, particularly with their memory and bus systems. Clocked components require that data bits be valid and arrive by each clock tick, whereas asynchronous components allow validation and arrival to occur at their own pace. This requires special circuits to align the asynchronous information with the synchronous system's clock.

Lack of tools and expertise

Because most chips use synchronous technology, there is a shortage of asynchronous design expertise, as well as of coding and design tools, for clockless processors. Not only is there little opportunity for developers to gain experience with clockless chips, but colleges also offer fewer asynchronous design courses.

