
Dorbala 1

Rohini Chandra Dorbala

August 12, 2010
Lightning Data Transport Technology

Table of Contents
Lightning Data Transport Technology.........................................................................................................1
Front Side Bus............................................................................................................................................1
HyperTransportTM (Lightning Data Transport)........................................................................................3


This paper explores point-to-point link technologies for interconnecting memory, I/O, and fast microprocessors. We begin with the earlier interconnect bus, the Front Side Bus (FSB), and then examine why something more is needed to cater to the speeds of modern processors. We then introduce AMD's HyperTransportTM technology. These technologies offer competing performance benefits to desktop, server, and supercomputing applications. While they differ in how they deliver these benefits, each holds notable advantages over the other in specific applications.

Front Side Bus

Until recently, personal computer boards had two important components: the North Bridge and the South Bridge. The North Bridge, also known as the memory controller hub, connects the memory (RAM) and graphics (AGP/PCIe) to the processor, whereas the South Bridge, also known as the I/O controller hub, connects the slower I/O devices to the North Bridge (Futcher i). The South Bridge, which communicates with slower devices such as USB, IDE, SATA, Ethernet, audio controllers, and CMOS memory, is connected to the North Bridge, which is faster and serves as the communication controller for AGP, PCI Express, and the memory bus. The North Bridge is connected to the microprocessor through the Front Side Bus (FSB), which was first designed by Intel. Illustration 1 shows this layout. The FSB carries data between the North Bridge and the CPU.

Intel developed the FSB not just to interconnect the rest of the devices on the board to the CPU through the North Bridge, but to provide effective multiprocessing support more cheaply than Sun, IBM, and other UNIX server providers. There is no theoretical limit on the number of processors that can be put on the FSB, but performance does not scale linearly with the number of processors added, due to the limited bandwidth of the architecture. With today's fast processors, a slower FSB can become a bottleneck, leaving the processor waiting for data from the FSB for a clock cycle or more. Though the FSB can be made faster, the slower devices behind the North Bridge and South Bridge make the FSB wait for some clock cycles before they are ready with data.

Illustration 1: Typical Chipset Layout using Front Side Bus (Futcher i)

The FSB's fastest transfer speed is currently 1.6 giga-transfers per second (GT/s) (Futcher i). As memory (RAM) is also accessed through the FSB, memory accesses are likewise limited to the speed of the FSB. To reduce the load on the FSB, Intel used large L1 and L2 caches and introduced L3 caches (up to 24 MB for the Itanium 2 processor) (Kanter ii) on some processors, and still could not eliminate the bottleneck caused by the slow FSB. The answer is the modern point-to-point interconnect technologies developed by AMD and Intel, ironically with the help of Alpha engineers. Point-to-point interconnect technology has its roots in DEC's Alpha processors and in UNIX servers from IBM and Sun. We shall compare these two technologies in this paper.
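The FSB numbers above translate directly into a bandwidth ceiling. The short sketch below computes the shared peak bandwidth; the 64-bit bus width is an assumption (typical for Intel FSB designs), not a figure from this paper.

```python
def fsb_peak_bandwidth(transfers_per_sec: float, bus_width_bits: int = 64) -> float:
    """Peak FSB bandwidth in bytes/second: transfer rate times bus width in bytes."""
    return transfers_per_sec * bus_width_bits / 8

# 1.6 GT/s on an assumed 64-bit bus gives 12.8 GB/s,
# shared by every processor (and all memory traffic) on the bus.
peak = fsb_peak_bandwidth(1.6e9)
per_cpu = [peak / n for n in (1, 2, 4)]  # why adding CPUs does not scale linearly
print(peak / 1e9)                        # 12.8
print([b / 1e9 for b in per_cpu])        # [12.8, 6.4, 3.2]
```

The division by the processor count is the crux: a single shared bus splits a fixed bandwidth among all attached processors, which is why FSB multiprocessing stops scaling.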

HyperTransportTM (Lightning Data Transport)

HyperTransportTM is the combined effort of AMD, Alpha processor engineers, and API Networks to simplify and integrate high-speed data traffic between high-speed processors, memory, and I/O. HyperTransportTM has evolved from specification 1.03, offering 12.8 gigabytes per second (GB/s) of aggregate bandwidth at a maximum clock speed of 800 MHz; to specification 2.0, offering 22.4 GB/s at a maximum clock of 1.4 GHz; to specification 3.0, offering 41.6 GB/s at a maximum clock of 2.6 GHz; and to specification 3.1, offering 51.2 GB/s at a maximum clock of 3.2 GHz. These specifications define a practical, high-performance link ideally suited to applications ranging from consumer and embedded systems to personal computers, portable computers, servers, network equipment, and even supercomputers.
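Those aggregate figures follow from a single formula: data moves on both clock edges (double data rate), across the link width in bytes, in both directions at once. A minimal sketch, assuming the maximum 32-bit link width for each specification:

```python
def ht_aggregate_gbps(clock_ghz: float, width_bits: int = 32) -> float:
    """Aggregate HyperTransport bandwidth in GB/s:
    clock x 2 (double data rate) x width in bytes x 2 directions."""
    per_direction_gbps = clock_ghz * 2 * (width_bits / 8)
    return per_direction_gbps * 2

# Reproduces the figures quoted above: ~12.8, 22.4, 41.6, 51.2 GB/s.
for rev, clk_ghz in [("1.03", 0.8), ("2.0", 1.4), ("3.0", 2.6), ("3.1", 3.2)]:
    print(rev, ht_aggregate_gbps(clk_ghz))
```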

HyperTransportTM is designed with Peripheral Component Interconnect (PCI) compatibility in mind. This compatibility lets all HyperTransportTM devices appear to be PCI devices and conform to the properties of the PCI standard, easing widespread adoption of HyperTransportTM technology throughout the industry. HyperTransportTM technology uses enhanced 1.2-volt low voltage differential signaling (LVDS) for the physical electrical link. LVDS reduces system power consumption, reduces noise interference, simplifies printed circuit board manufacture, and thus lowers system cost. The technology uses a low-cost point-to-point link backbone structure iii interconnecting the system's core components (processor, memory, and I/O elements). As an optimized architecture, HyperTransportTM provides the lowest possible latency, harmonizes interfaces, reduces software overhead, enables the intermix of load/store traffic with packet-bus traffic, and supports scalable performance.

HyperTransportTM functions as a fully integrated front-side bus and eliminates the North Bridge - South Bridge structure in AMD's Opteron and Athlon 64 64-bit x86 processors, Transmeta's Efficeon x86 processor, Broadcom's BCM1250 64-bit MIPS processor, and PMC-Sierra's RM9000 64-bit MIPS processor family. In Apple's PowerMac G5, HyperTransportTM is used as an integrated, high-performance I/O bus that pipes PCI, PCI-X, USB, FireWire, and audio/video links through the system.

HyperTransportTM technology combines a channel link topology, electrical signal interface characteristics, and data organization and transfer via command/address/data packet protocols. It uses dual point-to-point unidirectional LVDS data links, one for input and one for output, which carry load/store data and communication packet data in HyperTransportTM packets and stream channels. A HyperTransportTM host (a HyperTransportTM-enabled CPU) and one tunnel make up a HyperTransportTM link. The tunnel enables the link to be passed from one HyperTransportTM-enabled device to another. HyperTransportTM-enabled devices are configured as a Single-Link Endpoint (Cave), a Dual-Link Daisy Chain (Tunnel), Multiple Daisy Chains with a bridge to other I/O protocols such as PCI, PCI-X, PCI Express, or AGP (Bridge), or a Daisy Chain without a tunnel (Bridge). The host is always considered the top of the link; traffic from the host is downstream, while traffic to the host is upstream. Each point-to-point unidirectional link includes a data path that is 2, 4, 8, 16, or 32 bits wide, a clock line per 8-bit data path, and a control line. Commands, addresses, and data are carried in packets over the data path, eliminating sideband control signals. System-level control lines for RESET and PWROK, plus optional LDTSTOP and LDTREQ control lines with power management functions, complete the signal lines. LDTSTOP can be used to put the HyperTransportTM link into a virtually zero-power state.
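The line counts above can be tallied per link. The sketch below is an estimate under stated assumptions (one clock line per 8-bit byte lane with a minimum of one, one control line, every line a differential pair, two unidirectional halves per link); it is not a pin count taken from the specification.

```python
def ht_signal_pins(width_bits: int) -> int:
    """Estimated differential signal-pin count for one HyperTransport link.

    Per direction: width_bits data (CAD) lines, one clock line per 8-bit
    byte lane (at least one), and one control (CTL) line.  Each line is a
    differential pair (x2 pins), and a link has two unidirectional halves (x2).
    """
    clock_lines = max(1, width_bits // 8)
    lines_per_direction = width_bits + clock_lines + 1
    return lines_per_direction * 2 * 2

for width in (2, 8, 16, 32):
    print(width, ht_signal_pins(width))   # 16, 40, 76, 148
```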

The 1.2 V LVDS signal lines are implemented with twin-wire lines, called balanced or differential lines, carrying electrical signals that are equal in amplitude and timing but of opposite polarity. The balanced line prevents electrical noise within the system from affecting signal detection at the receiver: noise affects both signals in equal measure and cancels out, ensuring a high degree of architectural noise immunity as well as a maximized transmission range. The disadvantage of two wires per line is that it requires a second printed circuit board (PCB) trace per data pair. Since the HyperTransportTM protocol uses packet-based traffic, the total number of signal lines required for a given bandwidth is greatly reduced. As speeds increase, HyperTransportTM employs a simple signal de-emphasis scheme that uses one bit of history to de-emphasize the differential amplitude generated by the transmitter when transmitting a continuous run of 1's or 0's: sequential bits of the same value are sent at reduced amplitude. This requires a receiver sensitive enough to reliably detect the reduced-amplitude bits.
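The de-emphasis rule above can be sketched in a few lines: with one bit of history, a bit that differs from its predecessor is transmitted at full differential swing, while a repeat of the same value is sent at reduced amplitude. The amplitude values here are illustrative placeholders, not figures from the specification.

```python
def de_emphasize(bits, full=1.0, reduced=0.6):
    """Map a bit stream to per-bit transmit amplitudes using 1-bit history:
    full swing on a transition, reduced (de-emphasized) swing on a repeat."""
    amplitudes = []
    prev = None                      # no history before the first bit
    for b in bits:
        amplitudes.append(full if b != prev else reduced)
        prev = b
    return amplitudes

print(de_emphasize([1, 1, 1, 0, 1, 0, 0]))
# [1.0, 0.6, 0.6, 1.0, 1.0, 1.0, 0.6] -- only runs of equal bits are attenuated
```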


The HyperTransportTM data transport mechanism is efficient, with the lowest overhead of any modern I/O interconnect architecture. Command information is carried in a control packet of 4 or 8 bytes, whereas data traffic is carried in a data packet consisting of an 8-byte write control packet header, or a 4-byte or 8-byte read control packet, followed by a data payload of 4 to 64 bytes. All HyperTransportTM information is carried in multiples of four bytes (32 bits). A packet is distinguished as control or data by the single control line (asserted = control, de-asserted = data). This method of differentiating the packet type is a significant feature of the link, as it can be used to insert control packets in the middle of a long data packet. This Priority Request InterleavingTM (PRI) feature is unique to HyperTransportTM technology and contributes to its very low latency by allowing a new request to be initiated in the middle of a data packet. Commands and data are also categorized into one of three virtual channels: non-posted requests, posted requests, and responses. Non-posted requests, such as read requests and some specific write requests, require a response from the receiver. Posted requests, such as ordinary write requests, do not require a response. Responses, such as read responses or target-done responses to non-posted writes, are the replies to non-posted requests. HyperTransportTM uses a minimal set of data and control lines and a straightforward packet format, and it provides high bandwidth for both standard computing-oriented applications and communications-oriented packet stream applications.
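The three virtual channels can be pictured as independent queues selected by command type. The command names and mapping below are a simplified illustration of the classification just described, not the actual packet encoding.

```python
from collections import deque

# Illustrative command-to-virtual-channel mapping (names are hypothetical).
CHANNEL_OF = {
    "read_request":    "non_posted",  # requires a read response
    "nonposted_write": "non_posted",  # requires a target-done response
    "posted_write":    "posted",      # no response expected
    "read_response":   "response",
    "target_done":     "response",
}

channels = {"non_posted": deque(), "posted": deque(), "response": deque()}

def enqueue(command, payload=None):
    """Place a packet in its virtual channel's queue; return the channel name."""
    vc = CHANNEL_OF[command]
    channels[vc].append((command, payload))
    return vc

enqueue("read_request", 0x1000)
enqueue("posted_write", (0x2000, b"\x01"))
enqueue("read_response", b"\xff")
print({vc: len(q) for vc, q in channels.items()})
# {'non_posted': 1, 'posted': 1, 'response': 1}
```

Separate queues matter because a stalled non-posted request must not block the posted writes or responses behind it; that independence between channels is what keeps the protocol free of deadlock.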

HyperTransportTM differs from other chip-to-chip and slot-based communications technologies in its focus on creating a unified chip-to-chip communications channel with the lowest possible latency and overhead while supporting packet-based data streams. HyperTransportTM achieves low latency through the parallel nature of its link structure. A single forwarded clock per set of 8 data path bits enables very low-latency point-to-point data transfer, instead of the extensive clock encoding/decoding at both ends of the link required by RapidIO and PCI Express. The low packet overhead compared to PCI Express (an 8-byte header for HyperTransportTM, compared to 20-24 bytes for even a small data payload on PCI Express) also favors HyperTransportTM. Finally, PRI enables a high-priority 8-byte command request to be inserted within a potentially long, lower-priority data transfer, which greatly reduces the latency of HyperTransport-based systems.
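The overhead comparison above is easy to quantify. The sketch below evaluates link efficiency (payload bytes divided by total bytes on the wire) for the header sizes cited in this section, taking the low end (20 bytes) of the PCI Express figure.

```python
def link_efficiency(payload_bytes, overhead_bytes):
    """Fraction of transmitted bytes that are useful payload."""
    return payload_bytes / (payload_bytes + overhead_bytes)

# HyperTransport: 8-byte header; PCI Express: ~20 bytes (figure cited above).
# Smaller payloads magnify the gap between the two.
for payload in (4, 16, 64):
    ht = link_efficiency(payload, 8)
    pcie = link_efficiency(payload, 20)
    print(payload, round(ht, 3), round(pcie, 3))
```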

HyperTransportTM DirectPacketTM provides powerful communications protocols that enable HyperTransportTM links to carry user packet data efficiently. Computer-oriented data transfers use a load/store metaphor that requires the communication link to instruct each attached device precisely where in system memory to store or retrieve data. Communications technologies instead use a channel metaphor: source and destination addresses are specified, and data is passed over the channel in packets containing control and data information that tell the receiver or transmitter where the data streams are to be stored. The link is responsible for providing the source/destination, control information, and data payload; it does not have to specify exact memory locations or be concerned at all with memory storage management. The DirectPacketTM protocol is neutral to the system architecture that handles packet data. User packets are delivered by the protocol using unused bits in the base HyperTransportTM packet format, so there is no overhead for supporting user packets. HyperTransportTM defines just the level of protocol required to move user packets from point A to point B, and leaves the rest of the system architecture to the OEM to implement without over-burdening it with several layers of protocol.
i David Futcher, "Northbridge, Southbridge and Front Side Bus."

ii David Kanter, "The Common System Interface: Intel's Future Interconnect," August 28, 2007.

iii HyperTransportTM Consortium.