You are on page 1of 54

Lecture 15: Busses and Networking (1

)
Prof. Jan Rabaey Computer Science 252, Spring 2000
Based on slides from Dave Patterson, John Kubiatowicz Bill Dally, and Sonics, Inc
JR.S00 1

A Communication-Centric World
• Computation is getting distributed …
– Internet, WAN, LAN, BodyLAN, Home Networks, Microprocessor Peripherals, Processor-Memory Interface, System-on-a-Chip

• Efficient Networking and Communication is Crucial • The System-on-a-Chip implies the Networkon-a-Chip • In Next Set of Lectures:
– Busses and Networks – But more importantly, the impact of integration
JR.S00 2

What is a bus?
A Bus Is: • shared communication link • single set of wires used to connect multiple subsystems
Processor Input Control Memory Datapath Output

• A Bus is also a fundamental tool for composing large, complex systems
– systematic means of abstraction
JR.S00 3

Busses

JR.S00 4

Advantages of Buses

Processer

I/O Device

I/O Device

I/O Device

Memory

• Versatility:
– New devices can be added easily – Peripherals can be moved between computer systems that use the same bus standard

• Low Cost:
– A single set of wires is shared in multiple ways

JR.S00 5

Disadvantage of Buses Processor I/O Device I/O Device I/O Device Memory • It creates a communication bottleneck – The bandwidth of that bus can limit the maximum I/O throughput • The maximum bus speed is largely limited by: – The length of the bus – The number of devices on the bus – The need to support a range of devices with: » Widely varying latencies » Widely varying data transfer rates JR.S00 6 .

S00 7 .General Organization of a Bus Control Lines Data Lines • Control lines: – Signal requests and acknowledgments – Indicate what type of information is on the data lines • Data lines carry information between the source and the destination: – Data and Addresses – Complex commands JR.

Master versus Slave Master issues command Bus Master Data can go either way Bus Slave • A bus transaction includes two parts: – Issuing the command (and address) – Transferring the data – issuing the command (and address) – request – action • Master is the one who starts the bus transaction by: • Slave is the one who responds to the address by: – Sending data to the master if the master ask for data – Receiving data from the master if the master wants to send data JR.S00 8 .

S00 9 . and I/O devices to coexist – Cost advantage: one bus for all components JR.• Types of Busses Processor-Memory Bus (design specific) – Short and high speed – Only need to match the memory system » Maximize memory-to-processor bandwidth – Connects directly to the processor – Optimized for cache block transfers • I/O Bus (industry standard) – Usually is lengthy and slower – Need to match a wide range of I/O devices – Connects to the processor-memory bus or backplane bus • Backplane Bus (standard or proprietary) – Backplane: an interconnection structure within the chassis – Allow processors. memory.

Example: Pentium System Organization Processor/Memory Bus PCI Bus I/O Busses JR.S00 10 .

AT JR.A Computer System with One Bus: Backplane Bus Bus Backplane Processor Memory I/O Devices • A single bus (the backplane bus) is used for: – Processor to memory communication – Communication between I/O devices and memory • Advantages: Simple and low cost • Disadvantages: slow and the bus can become a major bottleneck • Example: IBM PC .S00 11 .

and a few selected I/O devices – SCCI Bus: the rest of the I/O devices JR.S00 12 .Processor A Two-Bus Processor Memory System Bus Bus Adaptor I/O Bus Bus Adaptor I/O Bus Bus Adaptor I/O Bus Memory • I/O buses tap into the processor-memory bus via bus adaptors: – Processor-memory bus: mainly for processor-memory traffic – I/O buses: provide expansion slots for I/O devices • Apple Macintosh-II – NuBus: Processor. memory.

S00 13 .Processor A Three-Bus Processor System Memory Bus Bus Adaptor Bus Adaptor I/O Bus I/O Bus Memory Backplane Bus Bus Adaptor • A small number of backplane buses tap into the processor-memory bus – Processor-memory bus is only used for processor-memory traffic – I/O buses are connected to the backplane bus • Advantage: loading on the processor bus is greatly reduced JR.

North/South Bridge architectures: separate busses Processor Memory Bus Processor “backside cache” Bus Adaptor Backplane Bus Bus Adaptor I/O Bus I/O Bus Memory • Separate sets of pins for different functions – – – – Memory bus Caches Graphics bus (for fast frame buffer) I/O busses are connected to the backplane bus • Advantage: – Busses can run at different speeds – Much less overall loading! JR.S00 14 .

S00 15 .What defines a bus? Transaction Protocol Timing and Signaling Specification Bunch of Wires Electrical Specification Physical / Mechanical Characteristics – the connectors JR.

S00 16 . they cannot be long if they are fast • Asynchronous Bus: – It is not clocked – It can accommodate a wide range of devices – It can be lengthened without worrying about clock skew – It requires a handshaking protocol JR.Synchronous and Asynchronous Bus • Synchronous Bus: – Includes a clock in the control lines – A fixed protocol for communication that is relative to the clock – Advantage: involves very little logic and can run very fast – Disadvantages: » Every device on the bus must run at the same clock rate » To avoid clock skew.

Busses so far Master Slave °°° Control Lines Address Lines Data Lines Bus Master: has ability to control the bus. JR. Synchronous Bus Transfers: sequence relative to common clock. ack) serve to orchestrate sequencing. initiates transaction Bus Slave: module activated by the transaction Bus Communication Protocol: specification of sequence of events and timing requirements in transferring information. Asynchronous Bus Transfers: control lines (req.S00 17 .

S00 18 .Bus Transaction • Arbitration: Who gets the bus • Request: What do we want to do • Action: What happens in response JR.

Arbitration: Obtaining Access to the Bus Control: Master initiates requests Bus Master Data can go either way Bus Slave • One of the most important issues in bus design: – How is the bus reserved by a device that wishes to use it? • Chaos is avoided by a master-slave arrangement: – Only the bus master can control access to the bus: It initiates and controls all bus requests – A slave responds to read and write requests • The simplest system: – Processor is the only bus master – All bus requests must be controlled by the processor – Major drawback: the processor is involved in every transaction JR.S00 19 .

• Multiple Potential Bus Masters: the Need for Arbitration Bus arbitration scheme: – A bus master wanting to use the bus asserts the bus request – A bus master cannot use the bus until its request is granted – A bus master must signal to the arbiter the end of the bus utilization • Bus arbitration schemes usually try to balance two factors: – Bus priority: the highest priority device should be serviced first – Fairness: Even the lowest priority device should never be completely locked out from the bus • Bus arbitration schemes can be divided into four broad classes: – Daisy chain arbitration – Centralized. – Distributed arbitration by collision detection: Each device just “goes for it”. Problems found after the fact. JR. parallel arbitration – Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus.S00 20 .

S00 21 .The Daisy Chain Bus Arbitrations Device 1 Highest Scheme Device 2 Priority Grant Bus Arbiter Grant Grant Release Request Device N Lowest Priority wired-OR • Advantage: simple • Disadvantages: – Cannot assure fairness: A low-priority device may be locked out indefinitely – The use of the daisy chain grant signal also limits the bus speed JR.

S00 22 .Centralized Parallel Arbitration Device 1 Device 2 Device N Grant Bus Arbiter Req • Used in essentially all processor-memory busses and in high-speed I/O busses JR.

S00 23 .Simplest bus paradigm • All agents operate synchronously • All can source / sink data at same rate • => simple protocol – just manage the source and target JR.

Simple Synchronous Protocol BReq BG R/W Address Data Cmd+Addr Data1 Data2 • Even memory busses are more complex than this – memory (slave) may take time to respond – it may need to control data rate JR.S00 24 .

S00 25 .Typical Synchronous Protocol BReq BG R/W Address Wait Cmd+Addr Data Data1 Data1 Data2 • Slave indicates when it is prepared for data xfer • Actual transfer goes at bus rate JR.

(b) increased complexity • Data bus width: – By increasing the width of the data bus.S00 26 . transfers of multiple words require fewer bus cycles – Example: SPARCstation 20’s memory bus is 128 bit wide – Cost: more bus lines • Block transfers: – – – – Allow the bus to transfer multiple words in back-to-back bus cycles Only one address needs to be sent at the beginning The bus is not released until the last word is transferred Cost: (a) increased complexity (b) decreased response time for request JR.Increasing the Bus Bandwidth • Separate versus multiplexed address and data lines: – Address and data can be transmitted in one bus cycle if separate address and data lines are available – Cost: (a) more bus lines.

Increasing Transaction Rate on Multimaster Bus • Overlapped arbitration – perform arbitration for next transaction during current transaction • Bus parking – master holds onto bus and performs multiple transactions as long as no other master makes request • Overlapped address / data phases – requires one of the above techniques • Split-phase (or packet switched) bus – completely separate address and data phases – arbitrate separately for each – address phase yield a tag which is matched with data phase • ”All of the above” in most modern memory buses JR.S00 27 .

S00 28 .Memory Bus Survey Bus Originator Clock Rate (MHz) Address lines Data lines Data Sizes (bits) Clocks/transfer Peak (MB/s) Master Arbitration Slots Busses/system Length MBus Summit Challenge Sun HP SGI 40 60 48 36 48 40 64 128 256 256 512 1024 4 5 4? 320(80) 960 1200 Multi Multi Multi Central Central Central 16 9 10 1 1 1 13 inches 12? inches XDBus Sun 66 muxed 144 (parity) 512 1056 Multi Central 2 17 inches JR.1993 CPU.

indicating data received t3: Master releases req t4: Slave releases ack JR.S00 29 . data Waits a specified amount of time for slaves to decode target t1: Master asserts request line t2: Slave asserts ack.Write Transaction Address Data Read Req Ack Asynchronous Handshake (4-phase) Master Asserts Address Master Asserts Data Next Address t0 • • • • • • t1 t2 t3 t4 t5 t0 : Master has obtained control and asserts address. direction.

Read Transaction Address Data Read Req Ack t0 • • • • • • Master Asserts Address Slave Data Next Address t1 t2 t3 t4 t5 t0 : Master has obtained control and asserts address.S00 30 . data received t4: Slave releases ack JR. direction. data Waits a specified amount of time for slaves to decode target\ t1: Master asserts request line t2: Slave asserts ack. indicating ready to transmit data t3: Master releases req.

32.24.1993 Backplane/IO Bus Survey Bus Originator Addressing Data Sizes (bits) Master Arbitration Peak (MB/s) Max Power (W) SBus Sun Virtual 8.16.64 Multi Central 20 75 13 Intel 33 Physical 8.16.32 Single Central 25 84 26 IBM async Physical 8.24.16.5-25 Physical 8.16.32 Multi Central 89 16 TurboChannel MicroChannel PCI DEC 12.24.64 Multi Central 33 111 (222) 25 Clock Rate (MHz) 16-25 32 bit read (MB/s) 33 JR.32.S00 31 .

High Speed I/O Bus • Examples – graphics – fast networks • Limited number of devices • Data transfer bursts at full rate • DMA transfers important – small controller spools stream of bytes to or from memory • Either side may need to squelch transfer – buffers fill up JR.S00 32 .

S00 33 .PCI Read/Write Transactions • All signals sampled on rising edge • Centralized Parallel Arbitration – overlapped with previous transaction • • • • All transfers are (unlimited) bursts Address phase starts by asserting FRAME# Next cycle “initiator” asserts cmd and address Data transfers happen on when – IRDY# asserted by master when ready to transfer data – TRDY# asserted by target when ready to transfer data – transfer when both asserted on rising edge • FRAME# deasserted when master intends to complete only one more data transfer JR.

PCI Read Transaction – Turn-around cycle on any signal driven by more than one agent JR.S00 34 .

S00 35 .PCI Write Transaction JR.

Bridge MPEG The “Board-on-a-Chip” Approach I O O C Custom Interfaces Control Wires Peripheral Bus JR.The System-on-a-Chip Nightmare System Bus DMA CPU DSP Mem Ctrl.S00 36 .

Sonics SOC Integration Architecture DMA CPU DSP Open Core Protocol™ MultiChip Backplane™ C MEM I O SiliconBackplane Agent™ SiliconBackplane™ (patented) { MPEG JR.S00 37 .

S00 38 . and test flows) JR. control.Open Core Protocol Goals • • • • • Bus Independent Scalable Configurable Synthesis/Timing Analysis Friendly Encompass entire core/system interface needs (data.

data. etc. Control. and Test Flows • Data Flow – Signals and protocols associated with moving data – Includes address.Data. high-level flow control. etc. – Similar to services provided by traditional computer buses • Control Flow – Signals and protocols associated with non-data communication – Sideband .not synchronized to data flow (out of band) – Examples include interrupts. • Test Flow – Signals and protocols related to debug and manufacturing test JR.S00 39 . handshaking.

uni-directional. synchronous – easy physical implementation • Master/Slave. simple roles • Extensions – added functionality to support cores with more complex interface requirements • Configurability – pay only for the features needed for a given core JR. request/response – well-defined.S00 40 .OCP Overview • Point-to-point.

Slave IP Core Master Open Core Protocol Slave Initiator Slave Master IP Core Master Slave IP Core Slave Request Target Response Master On-Chip Bus JR.S00 41 .Master vs.

MAddr SCmdAccept SResp.S00 42 MCmd.Basic OCP Clk MCmd [3] MAddr [N] MData [N] SCmdAccept r e sa M t SResp [3] SData [N] eva S l MCmd. Address Command Accept Response. Data Command Accept JR. SData Read: Command. Maddr. MData SCmdAccept . Address. Data Write (posted): Command.

read data) to Master – Only available for read transfers (posted write model) • Datahandshake Phase (Optional) – Allows pipelining request ahead of write data – Only available for write transfers • Phase ordering – Request -> Datahandshake -> Response JR. etc.Protocol Phases • Request Phase (begins Transfer) – Master presents request (command. address.) to Slave • Response Phase (ends Transfer) – Slave presents response (success/fail.S00 43 .

S00 44 .OCP Extensions • Simple Extensions – – – – Byte Enables Bursts Flow Control Data Handshake • Complex Extensions – Threads and Connections • Sideband Signals JR.

The Backplane: Why Not Use a Computer Bus? Transmit FIFO IP Core Arbiter Receive FIFO Address IP Core Data Computer Bus IP Core Time    IP Core    • Expensive to decouple • Not designed for real-time JR.S00 45 .

Communication Buses Decouple and Guarantee Real Time Transmit FIFO IP Core TDMA TDMA Receive FIFO IP Core Data Communications Bus IP Core Time    IP Core    • Connections are expensive • Poor read latency JR.S00 46 .

S00 47 .SiliconBackplane™ Employs Best of Both From Computing • Address-based selection • Write and read transfers • Pipelining From Communications • Efficient BW decoupling • Guaranteed BW & latency • Side-band signaling DMA CPU DSP MPEG C MEM I O JR.

Distributed TDMA .Round robin Current Slot • Provides fine control over system bandwidth Arbitration Command JR.S00 48 .Guaranteed Bandwidth Arbitration • Independent arbitration for every cycle includes two phases: .

Guaranteed Latency • Fixed latency between command/address and data/response phases • Matches pipelined CPU model ensuring high performance access to on-chip resources • Pipelined data routed through SiliconBackplane™ • Latency re-programmable in software • Variable-latency blocks do not tie up the SiliconBackplane JR.S00 49 .

Pipeline Diagram Cycle Arbitration Command Address Data Response WR A1 WR A2 D1 D2 1 2 3 4 5 6 7 8 JR.S00 50 .

Integrated Signaling Mechanism • Dedicated SiliconBackplane™ wires (Flags) support: – Bus-style out-of-band signaling (interrupts) – Point-to-point communications (flow control) – Dynamic point-to-point (retry mechanism) • Same design flow.S00 51 . timing. flexibility as address/data portion of SonicsIA™ JR.

MultiChip Backplane™ Extends SonicsIA™ Between Chips SiliconBackplane CPU-Based ASSP FPGA MultiChip Backplane ASSP Seamless integration of protocols JR.S00 52 .

Validation / Test • SiliconBackplane highly visible for test ™ MultiChip Backplane™ Test Vectors Test Vectors – All subsystems communicate through SiliconBackplane • Test Interfaces: – MultiChip Backplane: 100’s MB/sec. – ServiceAgent: Scan-based • Each subsystem can be tested/validated stand-alone JR.S00 53 .

S00 54 . number of devices.Summary • Busses are an important technique for building large-scale systems – Their speed is critically dependent on factors such as length. – Critically limited by capacitance – Tricks: esoteric drive technology such as GTL – Master: The device that can initiate new transactions – Slaves: Devices that respond to the master – Synchronous: bus includes clock – Asynchronous: no clock. etc. just REQ/ACK strobing – Well-defined and clear communication protocols – Physical layer hidden to designer • Important terminology: • Two types of bus timing: • System-on-a-Chip approach invites new solutions JR.