You are on page 1of 54

Lecture 15: Busses and Networking (1

Prof. Jan Rabaey Computer Science 252, Spring 2000
Based on slides from Dave Patterson, John Kubiatowicz Bill Dally, and Sonics, Inc
JR.S00 1

A Communication-Centric World
• Computation is getting distributed …
– Internet, WAN, LAN, BodyLAN, Home Networks, Microprocessor Peripherals, Processor-Memory Interface, System-on-a-Chip

• Efficient Networking and Communication is Crucial • The System-on-a-Chip implies the Networkon-a-Chip • In Next Set of Lectures:
– Busses and Networks – But more importantly, the impact of integration
JR.S00 2

What is a bus?
A Bus Is: • shared communication link • single set of wires used to connect multiple subsystems
Processor Input Control Memory Datapath Output

• A Bus is also a fundamental tool for composing large, complex systems
– systematic means of abstraction
JR.S00 3


JR.S00 4

Advantages of Buses


I/O Device

I/O Device

I/O Device


• Versatility:
– New devices can be added easily – Peripherals can be moved between computer systems that use the same bus standard

• Low Cost:
– A single set of wires is shared in multiple ways

JR.S00 5

Disadvantage of Buses Processor I/O Device I/O Device I/O Device Memory • It creates a communication bottleneck – The bandwidth of that bus can limit the maximum I/O throughput • The maximum bus speed is largely limited by: – The length of the bus – The number of devices on the bus – The need to support a range of devices with: » Widely varying latencies » Widely varying data transfer rates JR.S00 6 .

S00 7 .General Organization of a Bus Control Lines Data Lines • Control lines: – Signal requests and acknowledgments – Indicate what type of information is on the data lines • Data lines carry information between the source and the destination: – Data and Addresses – Complex commands JR.

Master versus Slave Master issues command Bus Master Data can go either way Bus Slave • A bus transaction includes two parts: – Issuing the command (and address) – Transferring the data – issuing the command (and address) – request – action • Master is the one who starts the bus transaction by: • Slave is the one who responds to the address by: – Sending data to the master if the master ask for data – Receiving data from the master if the master wants to send data JR.S00 8 .

S00 9 . and I/O devices to coexist – Cost advantage: one bus for all components JR.• Types of Busses Processor-Memory Bus (design specific) – Short and high speed – Only need to match the memory system » Maximize memory-to-processor bandwidth – Connects directly to the processor – Optimized for cache block transfers • I/O Bus (industry standard) – Usually is lengthy and slower – Need to match a wide range of I/O devices – Connects to the processor-memory bus or backplane bus • Backplane Bus (standard or proprietary) – Backplane: an interconnection structure within the chassis – Allow processors. memory.

Example: Pentium System Organization Processor/Memory Bus PCI Bus I/O Busses JR.S00 10 .

AT JR.A Computer System with One Bus: Backplane Bus Bus Backplane Processor Memory I/O Devices • A single bus (the backplane bus) is used for: – Processor to memory communication – Communication between I/O devices and memory • Advantages: Simple and low cost • Disadvantages: slow and the bus can become a major bottleneck • Example: IBM PC .S00 11 .

and a few selected I/O devices – SCCI Bus: the rest of the I/O devices JR.S00 12 .Processor A Two-Bus Processor Memory System Bus Bus Adaptor I/O Bus Bus Adaptor I/O Bus Bus Adaptor I/O Bus Memory • I/O buses tap into the processor-memory bus via bus adaptors: – Processor-memory bus: mainly for processor-memory traffic – I/O buses: provide expansion slots for I/O devices • Apple Macintosh-II – NuBus: Processor. memory.

S00 13 .Processor A Three-Bus Processor System Memory Bus Bus Adaptor Bus Adaptor I/O Bus I/O Bus Memory Backplane Bus Bus Adaptor • A small number of backplane buses tap into the processor-memory bus – Processor-memory bus is only used for processor-memory traffic – I/O buses are connected to the backplane bus • Advantage: loading on the processor bus is greatly reduced JR.

North/South Bridge architectures: separate busses Processor Memory Bus Processor “backside cache” Bus Adaptor Backplane Bus Bus Adaptor I/O Bus I/O Bus Memory • Separate sets of pins for different functions – – – – Memory bus Caches Graphics bus (for fast frame buffer) I/O busses are connected to the backplane bus • Advantage: – Busses can run at different speeds – Much less overall loading! JR.S00 14 .

S00 15 .What defines a bus? Transaction Protocol Timing and Signaling Specification Bunch of Wires Electrical Specification Physical / Mechanical Characteristics – the connectors JR.

S00 16 . they cannot be long if they are fast • Asynchronous Bus: – It is not clocked – It can accommodate a wide range of devices – It can be lengthened without worrying about clock skew – It requires a handshaking protocol JR.Synchronous and Asynchronous Bus • Synchronous Bus: – Includes a clock in the control lines – A fixed protocol for communication that is relative to the clock – Advantage: involves very little logic and can run very fast – Disadvantages: » Every device on the bus must run at the same clock rate » To avoid clock skew.

Busses so far Master Slave °°° Control Lines Address Lines Data Lines Bus Master: has ability to control the bus. JR. Synchronous Bus Transfers: sequence relative to common clock. ack) serve to orchestrate sequencing. initiates transaction Bus Slave: module activated by the transaction Bus Communication Protocol: specification of sequence of events and timing requirements in transferring information. Asynchronous Bus Transfers: control lines (req.S00 17 .

S00 18 .Bus Transaction • Arbitration: Who gets the bus • Request: What do we want to do • Action: What happens in response JR.

Arbitration: Obtaining Access to the Bus Control: Master initiates requests Bus Master Data can go either way Bus Slave • One of the most important issues in bus design: – How is the bus reserved by a device that wishes to use it? • Chaos is avoided by a master-slave arrangement: – Only the bus master can control access to the bus: It initiates and controls all bus requests – A slave responds to read and write requests • The simplest system: – Processor is the only bus master – All bus requests must be controlled by the processor – Major drawback: the processor is involved in every transaction JR.S00 19 .

• Multiple Potential Bus Masters: the Need for Arbitration Bus arbitration scheme: – A bus master wanting to use the bus asserts the bus request – A bus master cannot use the bus until its request is granted – A bus master must signal to the arbiter the end of the bus utilization • Bus arbitration schemes usually try to balance two factors: – Bus priority: the highest priority device should be serviced first – Fairness: Even the lowest priority device should never be completely locked out from the bus • Bus arbitration schemes can be divided into four broad classes: – Daisy chain arbitration – Centralized. – Distributed arbitration by collision detection: Each device just “goes for it”. Problems found after the fact. JR. parallel arbitration – Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus.S00 20 .

S00 21 .The Daisy Chain Bus Arbitrations Device 1 Highest Scheme Device 2 Priority Grant Bus Arbiter Grant Grant Release Request Device N Lowest Priority wired-OR • Advantage: simple • Disadvantages: – Cannot assure fairness: A low-priority device may be locked out indefinitely – The use of the daisy chain grant signal also limits the bus speed JR.

S00 22 .Centralized Parallel Arbitration Device 1 Device 2 Device N Grant Bus Arbiter Req • Used in essentially all processor-memory busses and in high-speed I/O busses JR.

S00 23 .Simplest bus paradigm • All agents operate synchronously • All can source / sink data at same rate • => simple protocol – just manage the source and target JR.

Simple Synchronous Protocol BReq BG R/W Address Data Cmd+Addr Data1 Data2 • Even memory busses are more complex than this – memory (slave) may take time to respond – it may need to control data rate JR.S00 24 .

S00 25 .Typical Synchronous Protocol BReq BG R/W Address Wait Cmd+Addr Data Data1 Data1 Data2 • Slave indicates when it is prepared for data xfer • Actual transfer goes at bus rate JR.

(b) increased complexity • Data bus width: – By increasing the width of the data bus.S00 26 . transfers of multiple words require fewer bus cycles – Example: SPARCstation 20’s memory bus is 128 bit wide – Cost: more bus lines • Block transfers: – – – – Allow the bus to transfer multiple words in back-to-back bus cycles Only one address needs to be sent at the beginning The bus is not released until the last word is transferred Cost: (a) increased complexity (b) decreased response time for request JR.Increasing the Bus Bandwidth • Separate versus multiplexed address and data lines: – Address and data can be transmitted in one bus cycle if separate address and data lines are available – Cost: (a) more bus lines.

Increasing Transaction Rate on Multimaster Bus • Overlapped arbitration – perform arbitration for next transaction during current transaction • Bus parking – master holds onto bus and performs multiple transactions as long as no other master makes request • Overlapped address / data phases – requires one of the above techniques • Split-phase (or packet switched) bus – completely separate address and data phases – arbitrate separately for each – address phase yield a tag which is matched with data phase • ”All of the above” in most modern memory buses JR.S00 27 .

S00 28 .Memory Bus Survey Bus Originator Clock Rate (MHz) Address lines Data lines Data Sizes (bits) Clocks/transfer Peak (MB/s) Master Arbitration Slots Busses/system Length MBus Summit Challenge Sun HP SGI 40 60 48 36 48 40 64 128 256 256 512 1024 4 5 4? 320(80) 960 1200 Multi Multi Multi Central Central Central 16 9 10 1 1 1 13 inches 12? inches XDBus Sun 66 muxed 144 (parity) 512 1056 Multi Central 2 17 inches JR.1993 CPU.

indicating data received t3: Master releases req t4: Slave releases ack JR.S00 29 . data Waits a specified amount of time for slaves to decode target t1: Master asserts request line t2: Slave asserts ack.Write Transaction Address Data Read Req Ack Asynchronous Handshake (4-phase) Master Asserts Address Master Asserts Data Next Address t0 • • • • • • t1 t2 t3 t4 t5 t0 : Master has obtained control and asserts address. direction.

Read Transaction Address Data Read Req Ack t0 • • • • • • Master Asserts Address Slave Data Next Address t1 t2 t3 t4 t5 t0 : Master has obtained control and asserts address.S00 30 . data received t4: Slave releases ack JR. direction. data Waits a specified amount of time for slaves to decode target\ t1: Master asserts request line t2: Slave asserts ack. indicating ready to transmit data t3: Master releases req.

32.24.1993 Backplane/IO Bus Survey Bus Originator Addressing Data Sizes (bits) Master Arbitration Peak (MB/s) Max Power (W) SBus Sun Virtual 8.16.64 Multi Central 20 75 13 Intel 33 Physical 8.16.32 Single Central 25 84 26 IBM async Physical Physical 8.16.32 Multi Central 89 16 TurboChannel MicroChannel PCI DEC 12.24.64 Multi Central 33 111 (222) 25 Clock Rate (MHz) 16-25 32 bit read (MB/s) 33 JR.32.S00 31 .

High Speed I/O Bus • Examples – graphics – fast networks • Limited number of devices • Data transfer bursts at full rate • DMA transfers important – small controller spools stream of bytes to or from memory • Either side may need to squelch transfer – buffers fill up JR.S00 32 .

S00 33 .PCI Read/Write Transactions • All signals sampled on rising edge • Centralized Parallel Arbitration – overlapped with previous transaction • • • • All transfers are (unlimited) bursts Address phase starts by asserting FRAME# Next cycle “initiator” asserts cmd and address Data transfers happen on when – IRDY# asserted by master when ready to transfer data – TRDY# asserted by target when ready to transfer data – transfer when both asserted on rising edge • FRAME# deasserted when master intends to complete only one more data transfer JR.

PCI Read Transaction – Turn-around cycle on any signal driven by more than one agent JR.S00 34 .

S00 35 .PCI Write Transaction JR.

Bridge MPEG The “Board-on-a-Chip” Approach I O O C Custom Interfaces Control Wires Peripheral Bus JR.The System-on-a-Chip Nightmare System Bus DMA CPU DSP Mem Ctrl.S00 36 .

Sonics SOC Integration Architecture DMA CPU DSP Open Core Protocol™ MultiChip Backplane™ C MEM I O SiliconBackplane Agent™ SiliconBackplane™ (patented) { MPEG JR.S00 37 .

S00 38 . and test flows) JR. control.Open Core Protocol Goals • • • • • Bus Independent Scalable Configurable Synthesis/Timing Analysis Friendly Encompass entire core/system interface needs (data.

data. etc. Control. and Test Flows • Data Flow – Signals and protocols associated with moving data – Includes address.Data. high-level flow control. etc. – Similar to services provided by traditional computer buses • Control Flow – Signals and protocols associated with non-data communication – Sideband .not synchronized to data flow (out of band) – Examples include interrupts. • Test Flow – Signals and protocols related to debug and manufacturing test JR.S00 39 . handshaking.

uni-directional. synchronous – easy physical implementation • Master/Slave. simple roles • Extensions – added functionality to support cores with more complex interface requirements • Configurability – pay only for the features needed for a given core JR. request/response – well-defined.S00 40 .OCP Overview • Point-to-point.

Slave IP Core Master Open Core Protocol Slave Initiator Slave Master IP Core Master Slave IP Core Slave Request Target Response Master On-Chip Bus JR.S00 41 .Master vs.

MAddr SCmdAccept SResp.S00 42 MCmd.Basic OCP Clk MCmd [3] MAddr [N] MData [N] SCmdAccept r e sa M t SResp [3] SData [N] eva S l MCmd. Address Command Accept Response. Data Command Accept JR. SData Read: Command. Maddr. MData SCmdAccept . Address. Data Write (posted): Command.

read data) to Master – Only available for read transfers (posted write model) • Datahandshake Phase (Optional) – Allows pipelining request ahead of write data – Only available for write transfers • Phase ordering – Request -> Datahandshake -> Response JR. etc.Protocol Phases • Request Phase (begins Transfer) – Master presents request (command. address.) to Slave • Response Phase (ends Transfer) – Slave presents response (success/fail.S00 43 .

S00 44 .OCP Extensions • Simple Extensions – – – – Byte Enables Bursts Flow Control Data Handshake • Complex Extensions – Threads and Connections • Sideband Signals JR.

The Backplane: Why Not Use a Computer Bus? Transmit FIFO IP Core Arbiter Receive FIFO Address IP Core Data Computer Bus IP Core Time    IP Core    • Expensive to decouple • Not designed for real-time JR.S00 45 .

Communication Buses Decouple and Guarantee Real Time Transmit FIFO IP Core TDMA TDMA Receive FIFO IP Core Data Communications Bus IP Core Time    IP Core    • Connections are expensive • Poor read latency JR.S00 46 .

S00 47 .SiliconBackplane™ Employs Best of Both From Computing • Address-based selection • Write and read transfers • Pipelining From Communications • Efficient BW decoupling • Guaranteed BW & latency • Side-band signaling DMA CPU DSP MPEG C MEM I O JR.

Distributed TDMA .Round robin Current Slot • Provides fine control over system bandwidth Arbitration Command JR.S00 48 .Guaranteed Bandwidth Arbitration • Independent arbitration for every cycle includes two phases: .

Guaranteed Latency • Fixed latency between command/address and data/response phases • Matches pipelined CPU model ensuring high performance access to on-chip resources • Pipelined data routed through SiliconBackplane™ • Latency re-programmable in software • Variable-latency blocks do not tie up the SiliconBackplane JR.S00 49 .

Pipeline Diagram Cycle Arbitration Command Address Data Response WR A1 WR A2 D1 D2 1 2 3 4 5 6 7 8 JR.S00 50 .

Integrated Signaling Mechanism • Dedicated SiliconBackplane™ wires (Flags) support: – Bus-style out-of-band signaling (interrupts) – Point-to-point communications (flow control) – Dynamic point-to-point (retry mechanism) • Same design flow.S00 51 . timing. flexibility as address/data portion of SonicsIA™ JR.

MultiChip Backplane™ Extends SonicsIA™ Between Chips SiliconBackplane CPU-Based ASSP FPGA MultiChip Backplane ASSP Seamless integration of protocols JR.S00 52 .

Validation / Test • SiliconBackplane highly visible for test ™ MultiChip Backplane™ Test Vectors Test Vectors – All subsystems communicate through SiliconBackplane • Test Interfaces: – MultiChip Backplane: 100’s MB/sec. – ServiceAgent: Scan-based • Each subsystem can be tested/validated stand-alone JR.S00 53 .

S00 54 . number of devices.Summary • Busses are an important technique for building large-scale systems – Their speed is critically dependent on factors such as length. – Critically limited by capacitance – Tricks: esoteric drive technology such as GTL – Master: The device that can initiate new transactions – Slaves: Devices that respond to the master – Synchronous: bus includes clock – Asynchronous: no clock. etc. just REQ/ACK strobing – Well-defined and clear communication protocols – Physical layer hidden to designer • Important terminology: • Two types of bus timing: • System-on-a-Chip approach invites new solutions JR.