Performance Comparison of AMBA Bus-Based System-On

2011 International Conference on Communication Systems and Network Technologies
Performance Comparison of AMBA Bus-Based System-On-Chip Communication

Protocol
Anurag Shrivastava G.S. Tomar

Dept. of Electronics & Comm., Machine Intelligence Research Ashutosh Kumar Singh
Digitech Solution ,
Priyatam institute of Technology & (MIR) Labs, Gwalior 474011 Operation Manager,
Management, Indore, India Inida Indore,India,
shrivastavaanurag@rediffmail.com gstomar@ieee.org akzhse@gmail.com
Abstract—Modern computer system rely more and more on – • minimize silicon infrastructure while supporting high
chip communication protocol to exchange data. System-on-chip performance and low power on-chip communication.
(SoC) designs use bus protocols for high performance data
transfer among the Intellectual Propert (IP) cores. These
The Advanced Microcontroller Bus Architecture
protocols incorporate advanced features such as pipelining,burst
and split transfers. In this paper, we describe a case study for (AMBA) was introduced by ARM in 1996 and is widely
different AMBA SOC bus protocol and their performance used as the on-chip bus in SoC designs. AMBA is a
comparison: the Advanced Micro-controller Bus Architecture registered trademark of ARM. The first AMBA buses were
(AMBA) protocol from ARM. It starts ith a brief introduction Advanced System Bus (ASB) and Advanced Peripheral Bus
AMBA 2.0 protocol, AMBA 3.0 protocol and AMBA 4.0 (APB). In its 2nd version, AMBA 2, ARM[2] added AMBA
protocol, and concludes with a discussion related to a High-performance Bus (AHB) that is a single clock-edge
comparison performance analysis of all three AMBA bus protocol. In 2003, ARM introduced the 3rd generation,
protocol. AMBA3[3], including AXI to reach even higher
performance interconnect and the Advanced Trace Bus
Keywords- SOC, On-chip Communication ,AMBA, Bus
Protocol (ATB) as part of the Core Sight on-chip debug and trace
solution. In 2010, ARM introduced the 4th generation,
AMBA 4,[1] including AMBA 4 AXI4, AXI4-Lite, and
I. INTRODUCTION
AXI4-Stream Protocol,The AMBA 4.0 protocol defines
On-chip communication architectures can have a great five buses/interfaces:
influence on the speed and area of System-on-Chip • Advanced eXtensible Interface (AXI)
Designs. Modern computer system[14] rely more and more • Advanced High-performance Bus (AHB)
on higly complex on–chip communication protocol to
• Advanced System Bus (ASB)
exchange data. The enormous complexity of these protocol
results from tackling high-performance • Advanced Peripheral Bus (APB)
requirements.protocol control can be distributed,and there • Advanced Trace Bus (ATB)
may be non-atomicity or speculation. The electronics The third generation of AMBA defined in the AMBA 3
industry has entered the era of multi-million-gate chips, and protocol specification[7], is targeted at high performance,
thereXs no turning back. This technology promises new high clock frequency system designs and includes features
levels of integration on a single chip, called the System-on-a- which make it very suitable for high speed sub-micrometer
Chip (SoC) design, but also presents significant challenges to interconnect:
the chip designer. Processing cores on a single chip, may • separate address/control and data phases
number well into the high tens within the next decade, given • support for unaligned data transfers using byte
the current rate of advancements [1]. The important aspect of strobes
a SoC is not only which components or blocks it houses, but • burst based transactions with only start address
also how they are interconnected. AMBA is a solution for
issued
the blocks to interface with each other. The oblective of the
AMBA specification [15] is to: • issuing of multiple outstanding addresses
• facilitate right-first-time development of embedded A simple transaction on the AHB consists of an address
microcontroller products with one or more CPUs, GPUs phase and a subsequent data phase (without wait states: only
or signal processors, two bus-cycles). Access to the target device is controlled
• be technology independent, to allow reuse of IP cores, through a Mux(non-tristate), thereby admitting bus-access to
peripheral and system macrocells across diverse IC one bus-master at a time. AMBA 4.0 protocol[12] includes
processes, encourage modular system design to definition of an expanded family interconnect protocols
improve processor independence, and the development including AXI4, AXI4-Lite and AXI4-Stream. The AXI4
of reusable peripheral and system IP libraries protocol adds support for longer bursts and Quality of
978-0-7695-4437-3/11 $26.00 © 2011 IEEE 449

DOI 10.1109/CSNT.2011.98
Service (QoS) signaling, and is a natural extension of the III. AMBA 3.0 PROTOCOL
AMBA 3 specification. Long burst support aids integration
of devices with large block transfers, while QoS signaling Wherever The key goals of the AMBA 3 AXI protocol is
provides the ability to manage latency and bandwidth in interoperability[15] with the existing AMBA technology.
complex multi-master systems. The new AMBA 4 The AMBA 3 AXI protocol is targeted at high-performance,
specification and the AXI4 protocols have also been high-frequency system designs and includes a number of
extended to meet the needs of FPGA implementations. features that make it suitable for a high-speed, submicron
AXI4-Lite is a subset of the full AXI4 specification for interconnect.
simple control register interfaces, reducing SoC wiring AMBA AXI is a fully PP connection.[4] It decouples
congestion and simplifying implementation, while the masters and slaves from the underlying interconnect, by
AXI4-Stream protocol provides an efficient streaming defining only master interfaces and symmetric slave
interface for non address-based, point-to-point interfaces. This approach, besides allowing backward
communication, such as video and audio data. compatibility and interconnect topology independence, has
the advantage of simplifying the handshake logic of
II. AMBA 2.0 PROTOCOL attached devices, which only need to manage a point-to-
point link. To provide higher parallelism, four different
logical Mono directional channels are provided in AXI
The Advanced Microcontroller Bus Architecture (AMBA) interfaces;[13] an address channel, a read channel, a write
specification, developed by ARM[2], defines an on-chip channel and a write response channel. Activity on different
communications[2] standard for designing high performance channels is mostly asynchronous (e.g. data for a write can
embedded microcontrollers.[15] Three distinct buses are be pushed to the write channel before or after the write
defined within the AMBA specification: address is issued to the address channel), and can be
parallelized, allowing multiple outstanding read and write
Advanced High-performance Bus (AHB)- The AMBA requests, with out-of-order completion. Figure 2(a) shows
AHB is for high-performance, high-clock-frequency system how a read transaction uses the read address and read data
modules. channels[4]. The write operation over the write address and
Advanced System Bus (ASB)- The AMBA ASB is for write data channels is presented in Fig 2b. Data is
high-performance system modules. where the high- transferred from the master to the slave using a write data
performance features of AHB are not required. channel, and it is transferred from the slave to the master
Advanced Peripheral Bus (APB)-AMBA APB is using a read data channel. In write transactions, where all
the data flows from the master to the slave,The AXI
optimised for minimal power consumption and reduced
protocol [5] has an additional write response channel to
interface complexity to support peripheral functions. APB
allow the slave to signal to the master the completion of the
can be used in conjunction with either version of the system
write transaction. The rationale of this split-channel
bus.
implementation is based upon the observation that usually
the required bandwidth for addresses is much lower than
that for data (e.g. a burst requires a single address but maybe
four or eight data transfers). Thus, it might be possible to
allocate more interconnect bandwidth to the data bus than
the address bus [9].
Features of the AMBA 3 AXI protocol :
• Suitability for high-bandwidth and low-latency
designs
• To enable high-frequency operation without using
complex bridges
• Meet the interface requirements of a wide range of
components
• Suitability for memory controllers with high initial
access latency Provide flexibility in the
implementation of interconnect architectures.
• System level cache support
Figure 1. AMBA 2.0 Protocol • Separate address/control and data phases
• Support for unaligned data transfers using byte
strobes
• Burst-based transactions with only start address
issued
450
• Separate read and write data channels to enable completion of the write transaction in fig3. The AXI4
low-cost direct memory access (DMA) protocol[12] enables:
• Ability to issue multiple outstanding addresses • Address information to be issued ahead of the actual data
• Out-of-order transaction completion transfer
• Easy addition of register stages to provide timing • Support for multiple outstanding transactions
closure • Support for out-of-order completion of transactions.
• Protocol includes optional extensions that cover
signaling for low-power operation
Figure 3. AXI 4
The AXI protocol provides a single interface definition for

describing interfaces:
• between a master and the interconnect
• between a slave and the interconnect
• between a master and a slave.
The interface definition enables a variety of different
interconnect implementations. The interconnect between
figure 2. (a) Read ( b)Write devices is equivalent to another device with symmetrical
master and slave ports to which real master and slave
IV. AMBA 4.0 PROTOCOL
devices can be connected. Most systems use one of three
interconnect approaches:
ARM’s series of AMBA specifications have become a • shared address and data buses
de facto standard for SoC (system-on-chip) interconnects. • shared address buses and multiple data buses
The latest version of the specification, called AMBA 4. The • multilayer, with multiple address and data buses.
AMBA 4 specification[15] is actually a collection of
specifications (including AXI4, AXI4-Lite, and AXI4- A.Read and write address channels
Streaming) that may well revolutionize the future of high- Read and write transactions each have their own address
performance SoC interconnects. The new AMBA 4 channel. The AXI protocol supports the following
protocols extend the AMBA 3 protocol to improve mechanisms:
interconnect performance and quality of service. The AXI4- • variable-length bursts, from 1 to 16 data transfers per burst
Lite specification is a simplified implementation of the • bursts with a transfer size of 8-1024 bits
specification that’s designed to meet the needs of FPGA- • wrapping, incrementing, and non-incrementing bursts
based designs through simple control-register interfaces and • atomic operations, using exclusive or locked accesses
reduced wiring congestion. AMBA 4 also supports non- • system-level caching and buffering control
address-based, point-to-point communication through the • secure and privileged access
new AXI4-Stream protocol. The new specification directly B.Register slices
addresses the needs of high-performance, low-latency, and Each AXI channel[12] transfers information in only one
low-power chip designs.[8] The AMBA AXI Protocol direction, and there is no requirement for a fixed
Specification Version 1.0 describes the AXI Interface (AXI3 relationship between the various channels. This is important
protocol) And, Version 2.0 detailing AXI4 and AXI4-Lite because it enables the insertion of a register slice in any
4.1 AXI4 Protocol channel, at the cost of an additional cycle of latency. This
The AXI protocol has an additional write response makes possible a trade-off between cycles of latency and
channel to allow the slave to signal to the master the maximum frequency of operation. It is also possible to use
register slices at almost any point within a given
interconnect. It can be advantageous to use a direct
451
C. Transaction ordering V. PERFORMANCE COMPARISON OF AMBA PROTOCOL
The AXI protocol enables out-of-order transaction 5.1 Features Comparison:
completion. It gives an ID tag to every transaction across the
interface. The protocol requires that transactions with the AMBA 2 AHB Protocol AMBA 3 AXI Protocol
same ID tag are completed in order, but transactions with Fixed pipeline for address Five independent channels for addr/data and
different ID tags can be completed out of order. Out-of- and data transfers response
order transactions can improve system performance in two Bidirectional link with
Each channel is unidirectional except for
ways: complex timing
single handshake for return path
relationships
• The interconnect can enable transactions with fast
responding slaves to complete in advance of earlier Hard to isolate timing Register slices isolate timing
transactions with slower slaves. Limits frequency of
Frequency scales with pipelining
• Complex slaves can return read data out of order. For operation
example, a data item for a later access might be available Inefficient asynchronous
High performance asynch bridging
from an internal buffer before the data for an earlier access bridges
is Separate address for every
Burst based--one address per burst
Available. data item
Only one transaction at a
4.2 AXI4- Lite Protocol time
Multiple outstanding transactions
AXI4-Lite addresses[12] by defining certain restrictions Fixed pipeline for address

Out of order data
and data
that would allow a slave to be connected directly to an AXI
fabric. In AXI4-Lite, you might say that AXI gets "dumbed" Only one transaction at a
Simultaneous reads and writes
time
down to a few basic transaction types. The burst length is
fixed to one data transfer, transfers are non-cacheable and
non-bufferable, exclusive access is not allowed and access Table 2. Comparison of AMBA Protocol Type
width must always be the same as data bus width.
APB AHB AXI3/AXI4
. The key features of the AXI4-Lite interface are:
• All transactions are burst length of 1 Processors all ARM7,9,10 ARM11
• All data accesses are the same size as the width of the data
bus Control Signals 4 27 77
• support for data bus width of 32-bit or 64-bit
• exclusive accesses are not supported. No. of Masters 1 1-15 1-16
4.3 AXI4-Stream Protocol
No. of Slaves 1-15 1-15 1-16
The new AXI4-Stream protocol [11]was designed for
streaming data to destinations that are not memory mapped
internally. Display controllers, transmitters, but also routing Interconnect Central Central Crossbar
fabrics are among the target applications for this new Type MUX MUX w/ 5 channels
protocol. Building upon the proven simple AXI channel
Phases Setup, Bus request, Address,
handshake AXI4-Stream is essentially an AXI write data Enable Address, Data,Responce
channel with additional control signals and a slightly Data
modified protocol. The burst (packet) length is not restricted
and the number of bytes of tge data sugnal TDATA can be Xact. Depth 1 2 16
an arbitrary integer including zero.
Burst Lengths 1 1-32 1-16
Features
The AXI4 update to AXI3 includes the following:
Simultaneous no no Yes
• Support for burst lengths up to 256 beats Read & Write
• Quality of Service (QoS) signaling
• Support for multiple region interfaces
• updated write response requirements Table 3. Comparison of AMBA Protocol Type
• updated AWCACHE and ARCACHE signaling details
• Additional information on Ordering requirements
• Details of optional User signaling
• Removal of locked transactions
• Removal of write interleaving
452
5.2 Protocol Efficiency
Figure 6: AXI 3
Figure 4. Cycles for the AMBA 2 AHB Protocol
Figure 5 shows the increased efficiency[7] of the AMBA 3

Only one address is needed for a burst, data can be
transmitted decoupled from the address and even out of Figure 7: AXI 4
order, and data associated with different addresses can be
interleaved. 5.4 Scalability properties
Scalability properties of the system interconnects can be
observed in Fig. 8, reporting the execution time variation[4]
resulting execution times, as Fig. 8 shows, get 77% worse
for AMBA AHB when moving from two to eight cores,
while AXI manages to stay within 12% and 15%. The
reason behind the behaviour pointed out in Fig. 8 is that
under heavy traffic load and with many processors,
interconnect saturation takes place. In such a congested
environment, as Fig. 9 shows, AMBA AXI can achieve
transfer efficiencies of up to 81%, while AMBA AHB
reaches 47% only; near to its maximum theoretical
efficiency of 50% (one wait state per data word). These
Figure 5. Cycles for the AMBA 3 AXI Protocol plots stress the impact that comparatively low-areaoverhead
optimisations can sometimes have in complex system
Figures 4 and 5 show the bus cycles for the AMBA 2 AHB
and AMBA 3 AXI protocols, respectively.[7] In Figure 1, a
burst with multiple data requires multiple addresses, all data
must be transmitted in the same order as the addresses, and
multiple data must be contiguous. These restrictions reflect
the requirements of the AMBA 2 AHB specification.
5.3 Dependencies between Channel Handshake Signals

Dependencies To prevent a deadlock situation[12], the
dependencies that exist between the handshake signals. In
any transaction: The VALID signal of one AXI component
must not be dependent on the READY signal of the other
component in the transaction.The READY signal can wait
for assertion of the VALID signal The AXI3 protocol
requires that the write response for all transactions must not
be given until the clock cycle after the acceptance of the last
data transfer In addition, the AXI4 protocol requires that Figure 8: AMBA AHB
the write response for all transactions must not be given
until the clock cycle after address acceptance
453
requires that the write response for all transactions must not
be given until the clock cycle after the acceptance of the last
data transfer. In addition, the AXI4 protocol requires that
the write response for all transactions must not be given
until the clock cycle after address acceptance
8. Ttransactionwrite interleaving support in AXI4.
9.If the AXI is a 64 bit bus running at 200 MHz, then the
AHB will be a 128 bit bus running at 400 MHz. The burst
sizes will be: small (16 Bytes), medium (32 Bytes), and
large (64 Bytes).
10.AXI3 protocol:• Early termination of bursts it not
supported.A burst must not cross a 4-kbyte boundary.AXI4
protocol longer burst support
REFERENCES
.
Figure 9. AMBA AXI [1] ARM, “AMBA Specification Overview”, available at
http://www.arm.com/.
[2] ARM, “AMBA Specification (Rev 2.0)”, available at
http://www.arm.com.
VI. USING THE TEMPLATE [3] ARM, “AMBA AXI Protocol Specification”, available at
http://www.arm.com.
1. AXI4 address width -32 bits with 16 master and 16 slave [4]. Benini and D. Bertozzi, “Network-on-chip architectures and design
device methods,” IEE Proc.-Comput. Digit. Tech., Vol. 152, No. 2, March 2005
2.Write interleaving. This feature was retracted by AXI4 [5]Chien-Hung Chen, Jiun-Cheng Ju, and Ing-Jer Huang, “A
Synthesizable AXI Protocol Checker for SoC Integration,”IEEE,,ISOCC
protocol. AXI3 master devices must be configured as if 2010.
connected to a slave with Write interleaving depth of one. [6]Ing. Dries Driessens and. Tom Tierens, “Overview Embedded Buses,”
3.AXI4 QoS signals do not influence arbitration priority. 2003 by Patrick Pelgrims.
QoS signals are propagated from masters to slaves. Interface [7]Mick Posner, Darrin Mossor ,”Designing Using the AMBA (TM) 3
AXI (TM) Protocol -- Easing the Design Challenges and Putting the
data widths: AXI4: 32, 64, 128, 256, 512, or 1024 bits an Verification Task on a Fast Track to Success” from Synopsys.
AXI4-Lite: 32 bits [8]Denali Memory Blog Steve Leibson ,”New White Paper discusses the
4. AXI4 address width -32 bits with 16 master and 16 slave challenges of chip design based on AMBA 4”From 2010, Cadence.
device. [9]Marcus Harnisch ,”Migrating from AHB to AXI based SoC
Designs”,from Doulos, 2010
5.The AMBA 4 protocol new features such as 256-beat [10] Krishna Sekar, Kanishka Lahiri,, Anand Raghunathan ,”Dynamically
burst support for improved performance, Quality of Service Configurable Bus Topologies for High-Performance On-Chip
(QoS) signaling for multi-master support, the AXI4-Stream Communication”, IEEE TRANSACTIONS ON VERY LARGE SCALE
protocol for non-address-based, point-to-point INTEGRATION (VLSI) SYSTEMS, VOL. 16, NO. 10, OCTOBER 2008.
[11] Sangkwon Na*, Sung Yang**, and Chong-Min Kyung,”Low-Power
communication among masters and slaves (especially useful Bus Architecture Composition for AMBA AXI ,JOURNAL OF
for data streams commonly used for moving video and SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.9,2009.
audio on an SoC), multiple region interfaces. [12]AMBA® 4 AXI4™, AXI4-Lite™, andAXI4-Stream™ Protocol
6. The AXI4 protocol supports an ordering model based on Assertions user Guide 2010 ARM.
[13]NY Liao,”Analysis of shared-link AXI” IET Comput. Digit. Tech.,
the use of the AXI ID transaction 7.The AXI3 protocol Vol. 3, Iss. 4, pp. 373–383, 2009.
454

Performance Comparison of AMBA Bus-Based System-On

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Performance Comparison of AMBA Bus-Based System-On

Uploaded by

Copyright:

Available Formats

2011 International Conference on Communication Systems and Network Technologies

Performance Comparison of AMBA Bus-Based System-On-Chip Communication

Anurag Shrivastava G.S. Tomar

978-0-7695-4437-3/11 $26.00 © 2011 IEEE 449

The AXI protocol provides a single interface definition for

AXI4-Lite addresses[12] by defining certain restrictions Fixed pipeline for address

Figure 4. Cycles for the AMBA 2 AHB Protocol

Figure 5 shows the increased efficiency[7] of the AMBA 3

5.3 Dependencies between Channel Handshake Signals

You might also like