Arvind Hoe 0904 i Eee

Published by: jalatf on Jul 04, 2012
Copyright:Attribution Non-commercial







Operation-Centric Hardware Description and Synthesis
James C. Hoe, Member, IEEE, and Arvind, Fellow, IEEE
Abstract: The operation-centric hardware abstraction is useful for describing systems whose behavior exhibits a high degree of concurrency. In the operation-centric style, the behavior of a system is described as a collection of operations on a set of state elements. Each operation is specified as a predicate and a set of simultaneous state-element updates, which may only take effect in case the predicate is true on the current state values. The effect of an operation's state updates is atomic, that is, the legal behaviors of the system constitute some sequential interleaving of the operations. This atomic and sequential execution semantics permits each operation to be formulated as if the rest of the system were frozen and thus simplifies the description of concurrent systems. This paper presents an approach to synthesize an efficient synchronous digital implementation from an operation-centric hardware-design description. The resulting implementation carries out multiple operations per clock cycle and yet maintains the semantics that is consistent with the atomic and sequential execution of operations. The paper defines, and then gives algorithms to identify, conflict-free and sequentially composable operations that can be performed in the same clock cycle. The paper further gives an algorithm to generate the hardwired arbitration logic to coordinate the concurrent execution of conflict-free and sequentially composable operations. Lastly, the paper evaluates synthesis results based on the TRAC compiler for the TRSPEC operation-centric hardware-description language. The results from a pipelined processor example show that an operation-centric framework offers a significant reduction in design time, while achieving implementation quality comparable to traditional register-transfer-level design flows.
Index Terms: Conflict-free, high-level synthesis, operation-centric, sequentially composable, term-rewriting systems (TRS).
I. INTRODUCTION
Most hardware-description frameworks, whether schematic or textual, make use of concurrent state machines (or processes) as their underlying computation model. For example, in a register-transfer-level (RTL) language, the design entities are the individual registers (or other state primitives) and their corresponding next-state equations. We refer to this style of descriptions as state-centric because the description of behavior is organized by the individual state element's next-state logic. Alternatively, the behavior of a hardware system can be described as a collection of operations, where each operation can atomically update all or some of the state elements. In this paper, we refer to this style of descriptions as operation-centric because the description of behavior is organized into individual operations. Our notion of operation-centric models for hardware description is inspired by the term rewriting systems (TRS) formalism [1]. The operation-centric hardware model of computation is also analogous to Dijkstra's guarded commands [7]. Similar models of computation can be found in some parallel programming languages (e.g., UNITY [5]), hardware-description languages for synchronous and asynchronous design synthesis (e.g., synchronized transitions (ST) [13], [16], [17]), and languages for hardware-design verification (e.g., st2fl [12], SINC [14]). In Section VII, we compare our description and synthesis approach to synchronized transitions.

Manuscript received March 11, 2003; revised December 19, 2003. This work was supported in part by the Defense Advanced Research Projects Agency, Department of Defense, under Ft. Huachuca Contract DABT63-95-C-0150 and in part by the Intel Corporation. J. C. Hoe was supported in part by an Intel Foundation Graduate Fellowship during this research. This paper was recommended by Associate Editor R. Gupta.
J. C. Hoe is with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213 USA (e-mail: jhoe@ece.cmu.edu).
Arvind is with Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139-4307 USA (e-mail: arvind@csail.mit.edu).
Digital Object Identifier 10.1109/TCAD.2004.833614

The operation-centric model of computation consists of a set of state variables and a set of state transition rules. Each atomic transition rule represents one operation. An execution is interpreted as a sequence of atomic applications of the state transition rules. A rule can be optionally guarded by a predicate condition such that a rule is applicable only if the system's state satisfies the predicate condition. If several rules are enabled by the same state, any one of the enabled rules can be
nondeterministically selected to update the state in one step, and afterwards, a new step begins on the updated new state. With predicated/guarded transition rules, an execution is interpreted as a sequence of atomic rule applications such that each rule application produces a state that satisfies the predicate condition of the next rule to be applied. The atomicity of operation-centric state transition rules simplifies the task of hardware description by permitting the designer to formulate each rule/operation under the assumption that the system is not simultaneously affected by other potentially conflicting operations; the designer does not have to worry about race conditions between different operations. This permits an apparently sequential description of potentially concurrent hardware behaviors.

It is important to note that the simplifying atomic and sequential semantics of operations does not prevent a legal implementation from executing several operations concurrently. Implicit parallelism between operations can be exploited by an optimizing compiler that produces a synchronous implementation that executes several operations concurrently in a clock cycle. However, the resulting concurrent implementation must not introduce new behaviors, that is, behaviors which are not producible under the atomic and sequential execution semantics. This paper presents an approach to synthesize such a synchronous hardware implementation from an operation-centric description. The paper provides algorithms for generating a hardwired arbitration logic to coordinate the concurrent execution in an implementation.

0278-0070/04$20.00 © 2004 IEEE

This paper is organized as follows. Section II first presents the abstract transition systems (ATS), a simple operation-centric hardware formalism used to develop and explain the synthesis procedures. Section III next describes the basic synthesis procedure to create a reference implementation that is functionally correct but inefficient. The inefficiencies of the reference implementation are addressed in Sections IV and V using two optimizations based on the concurrent execution of conflict-free and sequentially composable operations, respectively. The resulting optimized implementation incorporates a hardwired arbitration logic that enables the concurrent execution of conflict-free and sequentially composable operations. Section VI presents a comparison of designs synthesized from an operation-centric description versus a state-centric RTL description. Section VII discusses related work, and Section VIII concludes with a summary.

II. O
We have developed a source-level language called TRSPEC to support operation-centric hardware design specification [9]. TRSPEC's syntax and semantics are adaptations of a well-known formalism, TRS [1]. To avoid the complications of a source language, we instead present a low-level operation-centric hardware formalism called ATS. ATS is developed as the intermediate representation in our TRSPEC compiler. In this paper, we use ATS to explain how to synthesize an efficient implementation from an operation-centric hardware specification.
 A. ATS Overview
The structure of ATS is summarized in Fig. 1. At the top level, an ATS is defined as a triple consisting of a list of explicitly declared state elements, a list of initial values for those elements, and a set of operation-centric transitions. The state elements include registers, arrays, and first-in first-out (FIFO) queues. Each transition is a pair of a boolean predicate expression and a list of simultaneous actions, with exactly one action for each state element. (An acceptable action for all state-element types is the null action.)

In an ATS, if the predicate of a transition is enabled in a state, then the simultaneous actions prescribed by that transition are applied atomically to update the state to a new state. The resulting state can be viewed as the result of applying a function, the functional equivalent of the transition's action list, to the current state.
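As an illustration only, the transition semantics described above can be sketched in Python. The sketch assumes a state made solely of named integer registers; the names `Transition`, `step`, `run`, and the two GCD rules are hypothetical and are not taken from the paper. Each transition pairs a predicate with a function that returns the simultaneous updates, and any element left out of the returned updates receives the null action.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Assumption: every state element is a named integer register.
State = Dict[str, int]


@dataclass(frozen=True)
class Transition:
    name: str
    predicate: Callable[[State], bool]  # boolean predicate on the current state
    actions: Callable[[State], State]   # simultaneous updates, computed from the old state


def step(state: State, transitions: List[Transition],
         rng: random.Random) -> Optional[State]:
    """Apply one enabled transition atomically; return None if none is enabled."""
    enabled = [t for t in transitions if t.predicate(state)]
    if not enabled:
        return None
    chosen = rng.choice(enabled)        # any enabled transition may be selected
    updates = chosen.actions(state)     # all right-hand sides read the old state
    return {**state, **updates}         # untouched elements keep their values


def run(initial: State, transitions: List[Transition],
        seed: int = 0, max_steps: int = 1000) -> State:
    """Interleave atomic steps until no transition is enabled."""
    rng = random.Random(seed)
    state = dict(initial)
    for _ in range(max_steps):
        nxt = step(state, transitions, rng)
        if nxt is None:
            return state
        state = nxt
    raise RuntimeError("no quiescent state reached")


# Hypothetical example: Euclid's GCD written as two guarded transitions.
GCD_RULES = [
    Transition("subtract",
               lambda s: s["a"] >= s["b"] and s["b"] != 0,
               lambda s: {"a": s["a"] - s["b"]}),
    Transition("swap",
               lambda s: s["a"] < s["b"],
               lambda s: {"a": s["b"], "b": s["a"]}),  # simultaneous update
]

if __name__ == "__main__":
    print(run({"a": 15, "b": 6}, GCD_RULES))
```

The `swap` rule shows why the updates must be simultaneous: both new values are computed from the old state before either register changes.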
 B. ATS State Elements and Actions
In this paper, we concentrate on the synthesis of ATS with three types of state elements: registers, arrays, and FIFOs. The three types of state elements are depicted in Fig. 2 with signals of their supported interfaces. Below, we describe the three types of state elements and their usage.
Fig. 1. ATS summary.
Fig. 2. Synchronous state primitives.
A register can store an integer value up to a specified maximum word size. The value stored in a register can be referenced using a side-effect-free query and set to a new value using a set action. An entry of an array can be referenced by index using a side-effect-free query and set to a new value using an array-set action. For brevity, our examples abbreviate the array query.

The oldest value in a FIFO can be referenced using a side-effect-free query, and can be removed by a dequeue action. A new value can be added to a FIFO using an enqueue action. A compound enqueue-dequeue action conceptually dequeues the oldest entry before enqueuing a new entry. In addition, the contents of a FIFO can be cleared using a clear action. The status of a FIFO can be queried using the side-effect-free not-full and not-empty queries. Conceptually, an ATS FIFO has a bounded but unspecified size. The exact size is only set at implementation time, and hence a specification must be valid for all possible sizes. Applying an enqueue to a full FIFO or a dequeue to an empty FIFO is illegal. (Applying the compound enqueue-dequeue action to a full FIFO is legal, however.) Any transition that queries or acts on a FIFO must include the appropriate not-full or not-empty queries in its predicate condition.
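The FIFO rules above can be made concrete with a small Python sketch. This is a software model under stated assumptions, not the synchronous primitive of Fig. 2: the class name `Fifo` and the method names (`not_full`, `not_empty`, `first`, `enq`, `deq`, `enq_deq`, `clear`) are hypothetical, the capacity is fixed at construction to stand in for the size chosen at implementation time, and illegal uses raise an exception. Treating the compound action on an empty FIFO as illegal is also an assumption of this sketch.

```python
from collections import deque


class Fifo:
    """Bounded FIFO model; capacity stands in for the implementation-time size."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._items = deque()

    # Side-effect-free queries.
    def not_full(self) -> bool:
        return len(self._items) < self._capacity

    def not_empty(self) -> bool:
        return len(self._items) > 0

    def first(self):
        if not self.not_empty():
            raise RuntimeError("query of the oldest entry on an empty FIFO")
        return self._items[0]

    # Actions.
    def enq(self, value) -> None:
        if not self.not_full():
            raise RuntimeError("enqueue on a full FIFO is illegal")
        self._items.append(value)

    def deq(self) -> None:
        if not self.not_empty():
            raise RuntimeError("dequeue on an empty FIFO is illegal")
        self._items.popleft()

    def enq_deq(self, value) -> None:
        # Dequeue first, then enqueue, so the action is legal on a full FIFO.
        self.deq()
        self.enq(value)

    def clear(self) -> None:
        self._items.clear()


if __name__ == "__main__":
    f = Fifo(capacity=1)
    f.enq(7)
    f.enq_deq(8)      # legal even though the FIFO is full
    print(f.first())
```

Because a specification must hold for every possible size, a transition written against this model should guard each `enq` with `not_full()` and each `deq` or `first` with `not_empty()` rather than rely on a particular capacity.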
Fig. 3. Simple two-stage pipelined processor example with
Besides registers, arrays, and FIFOs, the complete ATS includes register-like state elements for input and output. An input state element is like a register but without the set action. A query on an input element returns the value presented on a corresponding external input port. An output state element supports both the query and the set action, and its content is visible outside the ATS on a corresponding output port. For brevity, we omit the discussion of input and output elements in this paper; for the most part, they are treated as registers during synthesis.
C. Pipelined Processor Example
In this example, we describe a two-stage pipelined processor, where a pipeline buffer is inserted between the fetch and execute stages. In an ATS, we use a FIFO, instead of a register, as the pipeline buffer. The buffering property of a FIFO provides the isolation needed to permit the operations in the two stages to be described independently. Although the resulting description reflects an elastic pipeline, our synthesis can infer a legal implementation that operates synchronously and has stages separated by registers (explained further in Section VI-A).

The key state elements and datapath of the two-stage pipelined processor example are shown in Fig. 3. The processor consists of five state elements: the program counter, the register file (an array of integer values), the instruction memory (an array of instructions), the data memory (an array of integer values), and the pipeline buffer (a FIFO of instructions). In an actual synthesis, the data width of the state elements would be inferred from the type declaration given in the source-level description.

Next, we examine ATS transitions that describe the behavior of this two-stage pipelined processor. Instruction fetching in the fetch stage can be described by a single fetch transition. This transition should fetch an instruction from the current program location in the instruction memory and enqueue the instruction into the pipeline buffer. This transition should be enabled whenever the pipeline buffer is not full. In ATS notation, its predicate is a not-full query on the pipeline buffer, and its actions increment the program counter and enqueue the fetched instruction, leaving the other state elements untouched. Notice that the specification of the fetch transition is unconcerned with what happens elsewhere in the pipeline (e.g., if an earlier branch in the pipeline is about to be taken or if the pipeline is about to encounter an exception).

The execution of the different instruction types in the second pipeline stage can be similarly described as separate atomic transitions. First, consider a transition for executing one particular instruction type, whose predicate is enabled only when the next pending instruction in the pipeline buffer is an instruction of that type. When applied, its actions carry out the semantics of the instruction. Next, consider two separate transitions, branch-taken and branch-not-taken, that specify the two possible executions of a branch-if-zero instruction. The branch-not-taken predicate tests for conditions when the next pending instruction is a branch-if-zero instruction and the branch condition is not met. The resulting action of the branch-not-taken transition is to leave the pipeline state unmodified other than to remove the executed instruction from the pipeline buffer. On the other hand, when the branch-taken predicate is satisfied, besides setting the program counter to the new branch target, the branch-taken actions must also simultaneously clear the content of the pipeline buffer, because the fetch transition given earlier actually follows a simple branch speculation by always incrementing the program counter. Thus, if a branch is taken later, the pipeline buffer could hold zero or more speculatively fetched wrong-path instructions.

In this two-stage pipeline description, the fetch transition in the first stage and a transition in the second stage could become applicable at the same time. Even though conceptually only one transition is to take place in each step, an implementation of this processor description must carry out both fetch and execute transitions in the same clock cycle; otherwise, the implementation does not behave like a pipeline. Nevertheless, the implementation must also ensure that a concurrent execution of multiple transitions produces the same result as a sequentialized execution of the same transitions in some order. In particular, consider the concurrent execution of the fetch and branch-taken transitions. Both transitions update the program counter and the pipeline buffer. In this case, the implementation has to guarantee that these transitions are applied in some sequential order. However, it is interesting to note that the choice of ordering determines how many bubbles are inserted after a taken branch, but it does not affect the processor's ability to correctly execute a program.
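The pipeline description can be exercised with a short Python simulation of its atomic, sequential semantics. This is a sketch under stated assumptions rather than the paper's ATS text: the element names `PC`, `RF`, and `BF`, the instruction encodings `("add", rd, r1, r2)`, `("bz", rc, target)`, and `("halt",)`, the immediate branch target, the buffer capacity, and the tiny program are all hypothetical, and the data memory is omitted because no load or store is modeled. Transitions are chosen at random among the enabled ones, one per step.

```python
import random

CAPACITY = 2  # assumed pipeline-buffer size; the description must work for any size

# Hypothetical program: the branch at 0 is taken, so 1 and 2 are wrong-path.
IMEM = [
    ("bz", 0, 3),       # if RF[0] == 0 then jump to address 3
    ("add", 3, 1, 1),   # wrong path
    ("add", 3, 3, 1),   # wrong path
    ("add", 3, 1, 2),   # RF[3] := RF[1] + RF[2]
    ("halt",),          # no execute transition matches, so execution stops here
]


def initial_state():
    return {"PC": 0, "RF": [0, 5, 7, 0], "BF": []}


def head_is(s, op):
    return bool(s["BF"]) and s["BF"][0][0] == op


def fetch_pred(s):
    return len(s["BF"]) < CAPACITY and s["PC"] < len(IMEM)


def fetch_act(s):
    # Speculate by always incrementing PC; enqueue the fetched instruction.
    return {**s, "PC": s["PC"] + 1, "BF": s["BF"] + [IMEM[s["PC"]]]}


def add_pred(s):
    return head_is(s, "add")


def add_act(s):
    _, rd, r1, r2 = s["BF"][0]
    rf = list(s["RF"])
    rf[rd] = s["RF"][r1] + s["RF"][r2]
    return {**s, "RF": rf, "BF": s["BF"][1:]}


def bz_taken_pred(s):
    return head_is(s, "bz") and s["RF"][s["BF"][0][1]] == 0


def bz_taken_act(s):
    # Redirect PC and clear the speculatively fetched instructions together.
    return {**s, "PC": s["BF"][0][2], "BF": []}


def bz_not_taken_pred(s):
    return head_is(s, "bz") and s["RF"][s["BF"][0][1]] != 0


def bz_not_taken_act(s):
    return {**s, "BF": s["BF"][1:]}


TRANSITIONS = [
    ("Fetch", fetch_pred, fetch_act),
    ("Add", add_pred, add_act),
    ("BzTaken", bz_taken_pred, bz_taken_act),
    ("BzNotTaken", bz_not_taken_pred, bz_not_taken_act),
]


def run(seed):
    """One sequential interleaving: apply a randomly chosen enabled transition per step."""
    rng = random.Random(seed)
    s, trace = initial_state(), []
    while True:
        enabled = [t for t in TRANSITIONS if t[1](s)]
        if not enabled:
            return s, trace
        name, _, act = rng.choice(enabled)
        s = act(s)
        trace.append(name)


if __name__ == "__main__":
    for seed in range(3):
        final, trace = run(seed)
        print(final["RF"], trace)
```

Different seeds give different interleavings, and therefore different numbers of wrong-path fetches before the branch resolves, yet the final register file is the same in every run. That is the property a synthesized implementation has to preserve when it performs a fetch and an execute transition in the same clock cycle.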
