
The Design of AMBA AHB/VCI Wrapper

ZHANG Qing-li, W Ming-yan, WANG Jin-xiang, YE Yi-zheng, LAI Feng-chang


Microelectronics Center, Harbin Institute of Technology, P.O.B. 313, No. 92 XiDaZhi Jie, Harbin, 150001, P.R. China. Email: qinglee@hit.edu.cn

Abstract: Utilizing a core-centric approach, we can develop plug-compatible components and greatly maximize design reuse. IP blocks and buses with a standard core interface can be plugged directly together with little or no custom interfacing, saving a considerable amount of design time. This paper presents the design of an AMBA AHB/VCI wrapper, which is intended for connecting VCI-based cores to the AMBA AHB bus. The paper details the comparison and contrast between the AMBA AHB bus protocol and the BVCI interface protocol. Two key techniques, quasi-pipelined operation and pre-fetching, are proposed to reduce performance overhead. We also take configurability into consideration to enhance the flexibility of application. Finally, the paper discusses the practicability of VCI usage with an on-chip bus.

Key words: System-on-a-Chip, IP, on-chip bus, AMBA AHB, VCI, wrapper.

1. Introduction

The growing gap between silicon gate capacity and designer productivity has led to the significant challenge of SoC (System-on-a-Chip) design and the need for new forms of design methodologies [1]. In order to bridge this gap, design reuse, that is, the integration of different reusable IP (Intellectual Property) blocks to design complex SoC devices, is widely accepted as the key to achieving higher productivity and meeting shorter time-to-market demands. The communication architecture can significantly enhance the reusable design methodology by promoting the use of a consistent communication interface and consequently enabling efficient integration of heterogeneous system components (e.g., CPUs, DSPs, application-specific cores, memories, custom logic, etc.). Some popular on-chip bus architectures (e.g., AMBA, CoreConnect, CoreFrame, etc.) used in commercial SoC designs are typical examples of communication architectures. However, due to the existence of various standardized on-chip architectures, the common practice of selecting a bus-centric protocol as an IP core's native interface will ultimately limit the market in which an IP core can subsequently be used or sold. Meanwhile, standardizing on a single bus architecture does not appear to be possible because of the diversity of constraints present in embedded systems, as recognized by the Virtual Socket Interface Alliance (VSIA) [2].

The solution to maximizing an IP core's potential market size is to adopt a core-centric approach, that is, selecting a well-specified core-centric protocol as an IP core's native interface. IP cores providing such an interface are considered suitable for connection to any type of on-chip bus architecture by using a bus wrapper. The VCI (Virtual Component Interface) standard [3] sponsored by VSIA and the OCP (Open Core Protocol) [4] promoted by OCP-IP (OCP International Partnership) are examples of standard core interfaces that are based on the core-centric approach. In comparison with the OCP, the VCI protocol standard focuses only on the data flow portion of a core's communication interface, and does not consider non-dataflow signals, such as interrupts, control and test signals.

In our previous work, we adopted the AMBA [5] bus architecture as the main interconnection backbone for our SoC platform. Based on the considerations discussed above, we decided to offer an AMBA AHB/VCI wrapper, which dramatically reduces the design time of integrating any standard VCI IP core into our AMBA-based SoC platform. In this paper, through studying and comparatively analyzing the AMBA AHB bus protocol and the BVCI interface protocol, we first establish the mapping between these two protocols. On this basis, the paper presents the overall framework and the internal functional block partition of the AMBA AHB/VCI wrapper.

The remainder of the paper is organized as follows. The rest of Section 1 introduces the Virtual Component Interface Standard and the AMBA bus specification. Section 2 describes the connection mechanism of the AHB/VCI wrapper.
Section 3 compares the main similarities and differences between the AHB and BVCI protocols and presents the design guidelines. The internal organization of the AHB/VCI wrapper is described in Section 4. Finally, we draw some conclusions from this study.
0-7803-7889-X/03/$17.00 ©2003 IEEE

A. The VCI Standard

The Virtual Component Interface Standard specifies a family of point-to-point communication protocols between virtual components. Three protocols currently belong to the family: the Peripheral VCI (PVCI), the Basic VCI (BVCI) and the Advanced VCI (AVCI). The AVCI is a superset of the BVCI, which is a superset of the PVCI. The PVCI is not a split-transaction protocol; request and response data transfers occur during a single control handshake. The BVCI, on the other hand, is a split-transaction protocol. The only constraint placed on responses by the standard is that they arrive at the initiator in the same order in which the initiator generated the matching requests. The AVCI is also a split-transaction protocol. AVCI requests may be tagged to allow request threads to be interleaved and transactions reordered. In addition, the BVCI and AVCI allow for multiple addressing modes, enabling integrators to take advantage of memory access optimizations and bus optimizations.
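The in-order constraint on BVCI responses can be illustrated with a small behavioral model. The following Python sketch is only an illustration under stated assumptions: the class `InOrderInitiator` and its method names are hypothetical and do not come from the VCI standard, and signal-level handshaking is ignored.

```python
from collections import deque


class InOrderInitiator:
    """Toy model of a BVCI-style initiator (hypothetical names).

    Requests may be issued back to back; each response is matched to the
    oldest request that is still outstanding.
    """

    def __init__(self):
        self.pending = deque()  # addresses of requests awaiting a response

    def issue(self, address):
        # Split transaction: no need to wait for earlier responses.
        self.pending.append(address)

    def accept_response(self, data):
        # Responses arrive in request order, so the oldest request matches.
        address = self.pending.popleft()
        return address, data


initiator = InOrderInitiator()
for addr in (0x100, 0x104, 0x108):
    initiator.issue(addr)
matched = [initiator.accept_response(d) for d in ("a", "b", "c")]
```

Because ordering is guaranteed, the model needs no tags; an AVCI-style model would have to carry a tag with each request to allow reordering.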

B. The AMBA Specification

The AMBA (Advanced Microcontroller Bus Architecture), which represents an open industry standard on-chip bus architecture, defines two types of bus hierarchy: the AHB (Advanced High-performance Bus) system bus and the APB (Advanced Peripheral Bus) peripheral bus. The AHB supports multiple bus masters, provides high-bandwidth operation and is consequently intended for high-performance, high clock frequency system modules. The APB, which appears as a local secondary bus that is encapsulated as an AHB slave device, is optimized for minimal power consumption and reduced interface complexity and is consequently suitable for low-bandwidth peripherals.

2. AMBA AHB/VCI Wrapper Connection Mechanism

Figure 1 illustrates the connection mechanism of the AHB/VCI wrapper, which enables IP cores with a VCI interface to be plugged directly into the AHB. The AHB/VCI wrapper consists of an AHB-initiator-wrapper and an AHB-target-wrapper.

[Figure 1: connection mechanism of the AHB/VCI wrapper]

The AHB-initiator-wrapper translates VCI transactions into the equivalent transactions on the AHB bus. The wrapper acts like an AHB master on the AHB side and a VCI target on the VCI side. The AHB-target-wrapper translates AHB transactions into the equivalent transactions on the VCI interface. The wrapper acts like an AHB slave on the AHB side and a VCI initiator on the VCI side.

3. Design Guidelines

This section presents the guidelines of our design on the basis of the intended function of the AHB/VCI wrapper and the comparison between the AMBA AHB bus protocol and the BVCI interface protocol. At the same time, the configurable features of the AHB/VCI wrapper are also adequately taken into account.

A. Protocol Comparison

Obvious differences between the AHB bus protocol and the BVCI interface protocol emerge from a study of the AMBA bus specification and the VCI standard document. Here we only discuss the main differences in nature, as follows:

(1) The BVCI makes use of a split-transaction protocol, that is, the transfers of request and response cells over the interface are completely separate, concurrent events. The initiator can issue as many requests as needed, without waiting for the arrival of the corresponding responses. The AHB transfer protocol is different from the BVCI protocol above. Every AHB transfer consists of an address/control phase and a data/response phase. The address/control phase of any AHB transfer must be immediately followed by the corresponding data/response phase before the current transfer is completed, and concurrently the next transfer starts to be processed. In addition, it is necessary to note that the AHB supports pipelined operation based on the overlapping of the address/control phase of the current transfer and the data/response phase of the previous transfer.

(2) The packet concept in the BVCI protocol corresponds to the burst concept in the AHB protocol. However, the packet formats that the BVCI supports are more varied than the burst types that are defined in the AHB protocol. This is due to the BVCI allowing for multiple addressing modes. Hence, besides all the burst operations supported in the AHB protocol, the BVCI supports packet operations with random address mode and constant address mode.

(3) In the aspect of data formatting and alignment, the main difference lies in the fact that the BVCI supports very flexible operation related to byte enables. This ability makes the BVCI more flexible in selecting byte lanes than the AHB, especially when using packet (or burst) operation to transfer a batch of data. Thereby, the BVCI can support various data structures (i.e. the active byte addresses may be discontiguous) by using the relationship among the address and byte enables of each cell in the packet operation, whereas the AHB only supports burst operation requiring the active byte addresses to be contiguous.
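The byte-enable difference can be made concrete with a small check of whether one BVCI cell's byte enables could be carried by a single AHB transfer. The Python sketch below is an illustration only. The function name `ahb_transfer_for` is hypothetical, byte lane `i` is assumed to correspond to address offset `i` (little-endian lane numbering), and the power-of-two size and natural alignment conditions are assumptions drawn from the way AHB encodes transfer size, not statements made in this paper.

```python
def ahb_transfer_for(byte_enables, bus_bytes=4):
    """Return (offset, size_in_bytes) if the active byte lanes fit one
    AHB transfer, or None if they do not (hypothetical helper).

    byte_enables: integer mask, bit i set means byte lane i is active.
    """
    lanes = [i for i in range(bus_bytes) if (byte_enables >> i) & 1]
    if not lanes:
        return None
    first, count = lanes[0], len(lanes)
    contiguous = lanes == list(range(first, first + count))
    power_of_two = (count & (count - 1)) == 0
    aligned = first % count == 0
    if contiguous and power_of_two and aligned:
        return first, count
    return None
```

For example, a mask of `0b1100` maps to a halfword at offset 2, while `0b0101` has discontiguous active bytes and yields `None`; a cell of that kind is legal on the BVCI side but has no single-transfer equivalent on the AHB side.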
B. AHB-Initiator-Wrapper

Through the comparison above, we arrive at the conclusion that, apart from the essential differences between the two protocols, the AHB protocol is only a subset of the BVCI protocol in the characteristics of data transactions. Therefore, it is impossible to accomplish all of the data transaction characteristics supported by the BVCI through the AHB bus. For the AHB-initiator-wrapper, which translates the BVCI protocol into the AHB protocol, it is impossible to convert all of the atomic packet operations into equivalent bursts on the AHB without decomposing these packets; consequently, some atomic packets which cannot be directly mapped onto the bursts supported by the AHB must be decomposed into two or more shorter packets before the conversion. As a result, the performance originally intended for the BVCI standard interface is reduced by this handling. The AHB-initiator-wrapper given in this paper only supports the Default mode related to byte enables, which is mandatory in the BVCI standard. On this premise, the AHB-initiator-wrapper is designed to support almost all of the data operation modes needed in the BVCI. Therefore, any BVCI packet operation is converted into one or more undefined-length incrementing bursts on the AHB bus.
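The decomposition of a packet into incrementing bursts can be sketched in a few lines. The Python example below is a behavioral illustration under stated assumptions, not the wrapper's RTL: the function name `split_into_incr_bursts` is hypothetical, each cell is assumed to carry one address, and the 1 KB limit reflects the AMBA AHB rule that a burst must not cross a 1 KB address boundary, which this paper does not discuss explicitly.

```python
def split_into_incr_bursts(addresses, cell_bytes=4, boundary=1024):
    """Group the cell addresses of one packet into incrementing bursts.

    A new burst starts whenever the next address is not the previous
    address plus cell_bytes, or when it lies in a different
    boundary-sized block (hypothetical helper).
    """
    bursts = []
    for addr in addresses:
        if bursts:
            last = bursts[-1][-1]
            sequential = addr == last + cell_bytes
            same_block = addr // boundary == last // boundary
            if sequential and same_block:
                bursts[-1].append(addr)
                continue
        bursts.append([addr])
    return bursts
```

Under these assumptions a contiguous packet becomes a single burst, a constant-address packet degenerates into one single-beat burst per cell, and a random-address packet is cut wherever the address sequence breaks.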

C. AHB-Target-Wrapper

As mentioned above, the BVCI protocol is a superset of the AHB protocol in the characteristics of data transactions. So, in contrast to the AHB-initiator-wrapper, it is easy for the AHB-target-wrapper to map all of the bursts supported by the AHB bus into equivalent packets over the BVCI interface. On the other hand, there exists the essential difference between the two protocols discussed in Section 3-A (1). The transfers of request and response contents over the BVCI interface are completely separate and concurrent events, except that the order of responses corresponds to the order of requests; by contrast, the AHB protocol may be interpreted as the case in which the transfers of request and response contents blend into each other and constrain each other. This difference therefore creates a performance bottleneck for the AHB-target-wrapper, which mainly shows up as a latency of several cycles before a data transfer cell completes successfully.

The general design guideline for the AHB-target-wrapper is given in the VCI standard, version 2.0. According to this guideline, we may directly use AHB target handshaking and BVCI initiator handshaking to hold the request and response information stable during transfer conversion. Although this approach has the advantage of a small hardware resource overhead (owing to not requiring data and address storage), a distinct performance overhead arises. So, it is necessary to try another approach to meet higher performance requirements. Two key techniques are used to improve the transfer efficiency of the AHB-target-wrapper: one is quasi-pipelined operation when writing, i.e. utilizing FIFOs to buffer the address, control and data information related to the write operation; the other is pre-fetching operation when reading, i.e. utilizing the RETRY or SPLIT response of the AHB protocol to pre-fetch the needed read data into a FIFO. These two techniques come at the cost of FIFO hardware resource overhead.

D. Configurability

The AHB/VCI wrapper should be configurable enough to adapt to a range of possible VCI-based core capabilities. In particular, the configurability of the AHB/VCI wrapper presented in this paper includes: parameterized address and data bus widths; support for multiple transfer widths, allowing for rate matching between the AHB and VCI; support for both big and little endian modes to match endianness differences between the AHB and VCI; support for diverse processing modes for the VCI error packet transfer; setting of the number of wait states before the AHB RETRY response; and configuration of the depth of the FIFOs. We have adopted two configuration mechanisms in the design: one is static parameter configuration, which must be done before the VC is instantiated; the other is dynamic configuration through registers [6].
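The configurable features listed under Configurability can be pictured as a parameter record fixed at instantiation time. The Python sketch below is only an illustration: every field name and default value is a hypothetical placeholder (the paper names the features but not their parameters), and the full byte swap in `convert_endianness` is one possible reading of "matching endianness differences", not the wrapper's documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WrapperConfig:
    # Hypothetical static parameters; names and defaults are placeholders.
    addr_width: int = 32
    data_width: int = 32
    ahb_big_endian: bool = False
    vci_big_endian: bool = False
    retry_wait_states: int = 4
    fifo_depth: int = 8


def convert_endianness(word, cfg):
    """Swap the byte lanes of a data word when the AHB and VCI sides
    are configured with different endianness; otherwise pass it through."""
    if cfg.ahb_big_endian == cfg.vci_big_endian:
        return word
    nbytes = cfg.data_width // 8
    return int.from_bytes(word.to_bytes(nbytes, "little"), "big")


same = WrapperConfig()
mixed = WrapperConfig(vci_big_endian=True)
swapped = convert_endianness(0x11223344, mixed)
```

A frozen dataclass mirrors the static mechanism, since its values cannot change after construction; the dynamic mechanism through registers would correspond to state that can be rewritten at run time.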

4. The Architecture of AHB/VCI Wrapper

A. The Internal Organization of AHB-Initiator-Wrapper

Figure 2 illustrates the functional organization of the AHB-initiator-wrapper, which mainly consists of five blocks: the BVCI request machine, the BVCI response machine, the request content FIFO, the response content FIFO and the AHB master engine.

[Figure 2: The internal structure of AHB-initiator-wrapper]

The BVCI request machine receives request content information from the VCI initiator, and either inserts it into the request content FIFO (rate mismatching case) or moves it directly into the AHB master engine (rate matching case). The BVCI response machine fetches BVCI response content information from the response content FIFO, and drives it to the VCI initiator.

The request content FIFO is an asynchronous FIFO, which operates only when the rates of the VCI initiator and the AHB bus are mismatched and is disabled by the SyncEnable signal. Its function is to synchronize the BVCI request content (command, address mode flags, packet length, address, byte enables, data and end of packet) into the AHB clock domain. The response content FIFO is a configurable FIFO with synchronous and asynchronous modes, which operates in synchronous mode when the rates of the VCI initiator and the AHB bus match, and otherwise in asynchronous mode. Its work mode is set by the SyncEnable signal.

The AHB master engine handles all timing related to transacting on the AHB bus. It converts the BVCI-compliant control signals (all BVCI request contents except wdata) into the AHB-compliant master control signals, while converting the AHB-compliant slave response signals into the BVCI-compliant response contents. In addition, it is responsible for the abnormal transfer cases (ERROR, RETRY or SPLIT transfer). Note that all information is stored in the FIFOs in its VCI-compliant format. This design decision localizes the complexity in the AHB master engine.
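The conversion performed by the AHB master engine can be illustrated by listing the address/control values it would drive for one undefined-length incrementing burst. The Python sketch below is an assumption-based illustration, not the engine's implementation: the function `ahb_beats` and the dictionary layout are hypothetical, while the HTRANS, HBURST and HSIZE encodings follow the AMBA Specification, Revision 2.0 [5].

```python
# Signal encodings from the AMBA 2.0 AHB specification.
HTRANS_NONSEQ, HTRANS_SEQ = 0b10, 0b11
HBURST_INCR = 0b001
HSIZE_BY_BYTES = {1: 0b000, 2: 0b001, 4: 0b010}


def ahb_beats(start_address, num_cells, is_write, cell_bytes=4):
    """Return the address/control phase values for each beat of one
    undefined-length incrementing burst (hypothetical helper)."""
    beats = []
    for i in range(num_cells):
        beats.append({
            "HADDR": start_address + i * cell_bytes,
            # The first beat is NONSEQ, the following beats are SEQ.
            "HTRANS": HTRANS_NONSEQ if i == 0 else HTRANS_SEQ,
            "HBURST": HBURST_INCR,
            "HSIZE": HSIZE_BY_BYTES[cell_bytes],
            "HWRITE": int(is_write),
        })
    return beats


beats = ahb_beats(0x1000, 3, is_write=True)
```

The sketch leaves out wait states, bus arbitration and the ERROR, RETRY and SPLIT cases, all of which the real engine has to handle.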
B. The Internal Organization of AHB-Target-Wrapper

[Figure 3: The internal structure of AHB-target-wrapper]

Figure 3 illustrates the functional organization of the AHB-target-wrapper, which is composed of various control logic, data paths, data buffers and registers. It consists of 13 blocks: Register block, AHB slave read/write control unit, Ctrl/Addr FIFO, Wdata FIFO, Rdata FIFO, VCI initiator engine, VCI response processor, Comparator, Packet counter, Synchronizer, Wdata path mux, Rdata path mux and Interrupt logic.

The register block is composed of a configuration register, a status register and a control register. It is accessed through a simple APB interface.

The AHB slave read/write control unit has the control function of an AHB slave. When performing a write transaction, it inserts the address/control information and wdata of each transfer cell into the appropriate FIFOs; when performing a read transaction, it inserts only the address/control information of the first transfer cell into the ctrl/addr FIFO, while fetching the needed read data from the rdata FIFO and in turn driving it onto the AHB bus.

The ctrl/addr FIFO, wdata FIFO and rdata FIFO are configurable FIFOs with synchronous and asynchronous modes. The ctrl/addr FIFO is a tri-port FIFO (with one write port and two read ports), of which one read port is used to read the address/control information directly while the other is used to read the backup copy of the address/control information. The ctrl/addr FIFO and wdata FIFO are together also called the write-FIFO, which serves for information buffering. The rdata FIFO serves for pre-fetching read data.

The VCI initiator engine converts the AHB-compliant address, control and data information into the BVCI-compliant request contents and drives them onto the VCI target. For a write transaction, it needs to fetch all information of each transfer cell in the burst from the write-FIFO and convert it; for a read pre-fetching transaction, it only needs to fetch the first transfer cell of the burst from the ctrl/addr FIFO and then generate the subsequent transfer cells according to a predefined address algorithm.

The VCI response processor receives the response content information returned from the VCI target and handles it appropriately. For a write transaction, whenever it receives a BVCI response cell, it pops the backup of the address and control of the corresponding AHB transfer cell from the second read port of the ctrl/addr FIFO, while monitoring the VCI response error status. For a read transaction, whenever it receives a BVCI response cell, it inserts the read data and response status of this cell into the rdata FIFO, and then pops the backup information related to the read transaction from the second read port of the ctrl/addr FIFO at the end of the pre-fetch.

The comparator is used for the read pre-fetching operation. By comparing the address and control information of two bursts, it judges whether a new read burst is the RETRY version of the previous read burst. The packet counter is used to count the number of request packets that have not yet been responded to by the VCI target. The synchronizer is used to synchronize the internal control signals crossing clock boundaries. It is enabled when the rates of the AHB and BVCI are mismatched. The wdata path mux is used to select the appropriate byte lanes on the AHB write data bus. The rdata path mux is used to select the appropriate byte lanes on the BVCI read data bus. The interrupt logic is used to generate the interrupt events related to write transactions.
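The interplay of the comparator, the rdata FIFO and the RETRY response in read pre-fetching can be sketched behaviorally. The Python model below is a simplified illustration under stated assumptions: the class `ReadPrefetcher`, its `fetch` callback and the string responses are hypothetical, and the pre-fetch is modeled as completing instantly, whereas in hardware it would take several cycles and the master could be retried more than once.

```python
from collections import deque


class ReadPrefetcher:
    """Toy model of read pre-fetching (hypothetical names).

    The first read of a burst is answered with RETRY while its data is
    fetched into a FIFO; a later read with identical address and control
    information is recognized as the retried burst and served from it.
    """

    def __init__(self, fetch):
        self.fetch = fetch        # callable(address, length) -> list of words
        self.key = None           # address/control of the pre-fetched burst
        self.rdata_fifo = deque()

    def read(self, address, length):
        key = (address, length)
        if key == self.key and self.rdata_fifo:
            # Comparator match: RETRY version of the previous read burst.
            data = list(self.rdata_fifo)
            self.rdata_fifo.clear()
            self.key = None
            return "OKAY", data
        # New burst: remember it, pre-fetch, and ask the master to retry.
        self.key = key
        self.rdata_fifo = deque(self.fetch(address, length))
        return "RETRY", []


memory = {0x40 + 4 * i: 10 * i for i in range(4)}
prefetcher = ReadPrefetcher(lambda a, n: [memory[a + 4 * i] for i in range(n)])
first = prefetcher.read(0x40, 4)
second = prefetcher.read(0x40, 4)
```

In this model a read whose address or length differs from the remembered burst is treated as a new burst and discards the earlier pre-fetched data; how the actual wrapper handles that case is not described in the paper.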

5. Conclusion

In this paper the implementation scheme for the AMBA AHB/VCI wrapper has been presented, and its configurability has also been adequately taken into account. In our experiment, the RTL model of the AHB/VCI wrapper has been established in Verilog HDL. The functional verification of the AHB/VCI wrapper has been completed utilizing the BFM (bus function model) method. The simulation waveform is shown in Figure 4. Simulation results demonstrate that its functionality is compliant with the AMBA specification and the VCI standard. The RTL model has been synthesized using a TSMC 0.25 µm standard cell library. The hardware scales of the AHB-initiator-wrapper and the AHB-target-wrapper are about 4846 and 8590 gates (NAND2 equivalent) respectively, excluding the FIFOs, and they operate at a 100 MHz AHB clock and a 50 MHz VCI clock.

As mentioned in the paper, VSIA's VCI has a good original intention to meet the needs of all SoCs, because it is independent of the interconnect architecture and provides SoC integrators with a way to use the same IP with any on-chip bus or other interconnect architecture. However, from this experiment, we come to the conclusion that a standard bus wrapper (VCI) incurs performance and area overhead, which makes it less efficient for SoC integration, especially the integration of processors and critical peripherals [7]. This may be the main reason why the VCI standard has not been widely commercialized.

References

[1] M. Keating and P. Bricaud, "Reuse Methodology Manual", Kluwer Academic Publishers, 2nd Edition, 1999
[2] VSI Alliance™ On-Chip Bus Attributes Specification 1 Version 2.0 (OCB 1 2.0), On-Chip Bus Development Working Group, September 2001, © Virtual Socket Interface Alliance, http://www.vsi.org
[3] VSI Alliance™ Virtual Component Interface Standard Version 2.0 (OCB 2 2.0), On-Chip Bus Development Working Group, April 2001, © Virtual Socket Interface Alliance, http://www.vsi.org
[4] Open Core Protocol™ Specification 1.0, Document Version 002, 2001, © OCP-IP Association, http://www.ocpip.org
[5] AMBA™ Specification, Revision 2.0, May 1999, © ARM Ltd., http://www.arm.com
[6] Zhao Junchao, Chen Weiliang, and Wei Shaojun. Parameterized IP Core Design. Proc. of 4th Inter. Conf. on ASIC, (Shanghai, China, 2001), p.744
[7] C. Y e w, G. Matthews, J. Morris, A. Haverinen, and J. Zaidi. Standard Bus vs. Bus Wrapper: What is the Best Solution for Future SoC Integration? Proc. of Conf. on DATE, (Munich, Germany, 2001), p.776
