
PCI Express Basics

Ravi Budruk Senior Staff Engineer and Partner MindShare, Inc.

Copyright 2007, PCI-SIG, All Rights Reserved

PCI Express Introduction


- PCI Express architecture is a high performance IO interconnect for peripherals in computing/communication platforms
- Evolved from PCI and PCI-X architectures
  - Yet PCI Express architecture is significantly different from its predecessors PCI and PCI-X
- PCI Express is a serial point-to-point interconnect between two devices
- Implements packet based protocol for information transfer
- Scalable performance based on number of signal Lanes implemented on the PCI Express interconnect

PCI-SIG Developers Conference


PCI Express Terminology


[Figure: PCI Express Device A and PCI Express Device B connected by a Link; callouts label a Signal, a Wire, a Lane, and the Link]

PCI Express Throughput


Link Width    Aggregate BW (GBytes/s)
x1            0.5
x2            1
x4            2
x8            4
x12           6
x16           8
x32           16

- Assumes 2.5 GT/s signaling in each direction
- 80% BW utilized due to 8b/10b encoding overhead
- Aggregate bandwidth implies simultaneous traffic in both directions
- Peak bandwidth is higher than any bus available
- PCIe 2.0 PHYs may optionally support 5 GT/s signaling, thus doubling the above bandwidth numbers
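The table values follow from the stated assumptions: signaling rate per Lane, 80% usable after 8b/10b encoding, 8 bits per byte, and two directions. A minimal Python sketch of that arithmetic is below; the function name and parameters are illustrative and not taken from the slides.

```python
def aggregate_bandwidth_gbytes(lanes, gt_per_s=2.5, encoding_efficiency=0.8):
    """Aggregate (both directions) Link bandwidth in GBytes/s.

    lanes: Link width (1 for x1, 16 for x16, ...)
    gt_per_s: signaling rate per Lane per direction (2.5, or 5.0 for PCIe 2.0)
    encoding_efficiency: 0.8 because 8b/10b sends 10 bits per 8 data bits
    """
    bits_per_byte = 8
    per_direction = lanes * gt_per_s * encoding_efficiency / bits_per_byte
    return 2 * per_direction  # simultaneous traffic in both directions


if __name__ == "__main__":
    for width in (1, 2, 4, 8, 12, 16, 32):
        print(f"x{width}: {aggregate_bandwidth_gbytes(width):g} GBytes/s")
```

Passing `gt_per_s=5.0` models the optional PCIe 2.0 rate and doubles each figure.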


PCI Express Features


- Point-to-point connection
- Serial bus means fewer pins
- Scalable: x1, x2, x4, x8, x12, x16, x32
- Dual Simplex connection
- 2.5 GT/s transfer rate per direction
- Packet based transaction protocol

[Figure: PCIe Device A and PCIe Device B exchange Packets in both directions over a Link (x1, x2, x4, x8, x12, x16 or x32)]

Differential Signaling
Electrical characteristics of PCI Express signal
- Differential signaling
  - Transmitter Differential Peak voltage = 0.4 - 0.6 V
  - Transmitter Common mode voltage = 0 - 3.6 V
- Two devices at opposite ends of a Link may support different DC common mode voltages

[Figure: differential signal waveform; labels include D+, Vcm, and V Diffp]

Example PCI Express Topology


[Figure: example topology with a CPU, Root Complex, Memory, a Switch, PCIe Endpoints, a Legacy Endpoint, and a PCIe Bridge to PCI/PCI-X, joined by PCIe Links. Legend: PCI Express Device Downstream Port, PCI Express Device Upstream Port]

Example PCI Express Topology Root & Switch


[Figure: the example topology with Root Complex and Switch internals shown. The Root Complex contains the CPU Bus, RCRB, Bus 0, and Virtual PCI Bridges; the Switch contains Virtual PCI Bridges; PCI Express Links join the Root Complex, Switch, PCIe Endpoints, Legacy Endpoint, and PCIe Bridge to PCI/PCI-X. Legend: PCI Express Device Downstream Port, PCI Express Device Upstream Port]

Example Low Cost PCI Express System


[Figure: low cost system. Components shown: Processor on the FSB, Root Complex, DDR SDRAM, PCI Express GFX, IO Controller Hub (ICH), Serial ATA (HDD), USB 2.0, LPC (S IO with COM1 and COM2), IEEE 1394, PCI Slots, GB Ethernet, and PCI Express Add-In slots, joined by PCI Express Links]

Example PCI Express Server System


[Figure: server system. Components shown: two Processors on the FSB, Root Complex, DDR SDRAM, PCI Express GFX, three Switches, InfiniBand Endpoint with an out-of-box InfiniBand Switch, 10Gb Ethernet Endpoints, Gb Ethernet Endpoint, Fiber Channel RAID Disk array, SCSI Endpoint, PCI Express-to-PCI bridge with PCI Slots, IEEE 1394, S IO with COM1 and COM2, and Add-In slots, joined by PCI Express Links]

Transaction Types, Address Spaces


Requests are translated to one of four transaction types by the Transaction Layer:
1. Memory Read or Memory Write. Used to transfer data from or to a memory mapped location. The protocol also supports a locked memory read transaction variant.
2. I/O Read or I/O Write. Used to transfer data from or to an I/O location. These transactions are restricted to supporting legacy endpoint devices.
3. Configuration Read or Configuration Write. Used to discover device capabilities, program features, and check status in the 4KB PCI Express configuration space.
4. Messages. Handled like posted writes. Used for event signaling and general purpose messaging.


PCI Express TLP Types


Abbreviated Name    Description
MRd                 Memory Read Request
MRdLk               Memory Read Request Locked Access
MWr                 Memory Write Request
IORd                IO Read Request
IOWr                IO Write Request
CfgRd0, CfgRd1      Configuration Read Request Type 0 and Type 1
CfgWr0, CfgWr1      Configuration Write Request Type 0 and Type 1
Msg                 Message Request without Data Payload
MsgD                Message Request with Data Payload
Cpl                 Completion without Data (used for IO, configuration write completions and read completion with error completion status)
CplD                Completion with Data (used for memory, IO and configuration read completions)
CplLk               Completion for Locked Memory Read without Data (used for error status)
CplDLk              Completion for Locked Memory Read with Data

Three Methods For Packet Routing


Each request or completion header is tagged as to its type, and each of the packet types is routed based on one of three schemes:
- Address Routing
- ID Routing
- Implicit Routing

Memory and IO requests use address routing. Completions and Configuration cycles use ID routing. Message requests have selectable routing based on a 3-bit code in the message routing sub-field of the header type field.
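The routing rules above can be written as a lookup keyed by the abbreviated TLP names. This is an illustrative Python sketch only: the table and function names are hypothetical, and the meaning of each 3-bit message routing code is deliberately left uninterpreted because the slides do not list the codes.

```python
# Routing scheme per TLP type, following the rules stated on the slide.
ROUTING_BY_TLP_TYPE = {
    # Memory and IO requests use address routing.
    "MRd": "address", "MRdLk": "address", "MWr": "address",
    "IORd": "address", "IOWr": "address",
    # Configuration cycles and Completions use ID routing.
    "CfgRd0": "id", "CfgRd1": "id", "CfgWr0": "id", "CfgWr1": "id",
    "Cpl": "id", "CplD": "id", "CplLk": "id", "CplDLk": "id",
}


def routing_method(tlp_type, msg_routing_code=None):
    """Return the routing scheme used for a TLP type.

    Message requests carry a 3-bit routing code that selects the scheme;
    the code is validated and reported here, not decoded.
    """
    if tlp_type in ("Msg", "MsgD"):
        if msg_routing_code is None or not 0 <= msg_routing_code <= 0b111:
            raise ValueError("message requests need a 3-bit routing code")
        return f"message-selected (code {msg_routing_code:03b})"
    return ROUTING_BY_TLP_TYPE[tlp_type]


if __name__ == "__main__":
    print(routing_method("MWr"))
    print(routing_method("CplD"))
    print(routing_method("Msg", msg_routing_code=0b100))
```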


Programmed I/O Transaction


[Figure: Processors on the FSB, Root Complex with DDR SDRAM, Switches A, B and C, and Endpoints. The MRd travels from the Root Complex down through the Switches to an Endpoint, and the CplD returns to the Root Complex]

Requester:
- Step 1: Root Complex (requester) initiates Memory Read Request (MRd)
- Step 4: Root Complex receives CplD

Completer:
- Step 2: Endpoint (completer) receives MRd
- Step 3: Endpoint returns Completion with data (CplD)

DMA Transaction
[Figure: Processors on the FSB, Root Complex with DDR SDRAM, Switches A, B and C, and Endpoints. An Endpoint sends an MRd up through the Switches to the Root Complex, and the CplD returns to the Endpoint]

Requester:
- Step 1: Endpoint (requester) initiates Memory Read Request (MRd)
- Step 4: Endpoint receives CplD

Completer:
- Step 2: Root Complex (completer) receives MRd
- Step 3: Root Complex returns Completion with data (CplD)

Peer-to-Peer Transaction
[Figure: Processors on the FSB, Root Complex with DDR SDRAM, Switches A, B and C, and Endpoints. One Endpoint sends an MRd through the Switches to another Endpoint, and the CplD returns to the requesting Endpoint]

Requester:
- Step 1: Endpoint (requester) initiates Memory Read Request (MRd)
- Step 4: Endpoint receives CplD

Completer:
- Step 2: Endpoint (completer) receives MRd
- Step 3: Endpoint returns Completion with data (CplD)

PCI Express Device Layers


[Figure: PCI Express Device A and PCI Express Device B connected by a Link. Each device has a Device Core, a PCI Express Core Logic Interface, and a Transaction Layer, Data Link Layer and Physical Layer, each with TX and RX paths]

TLP Origin and Destination


[Figure: layered view of PCI Express Device A and Device B. A TLP is transmitted (TX) from the Transaction Layer of Device A, passes through the Data Link Layer and Physical Layer and across the Link, and is received (RX) at the Transaction Layer of Device B]

TLP Structure
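Information in core section of TLP comes from Software Layer / Device Core

Fields in bit transmit direction:

Field           Size         Added by
Start           1B           Appended by Physical Layer
Sequence        2B           Appended by Data Link Layer
Header          3-4 DW       Created by Transaction Layer
Data Payload    0-1024 DW    Created by Transaction Layer
ECRC            1 DW         Created by Transaction Layer
LCRC            1 DW         Appended by Data Link Layer
End             1B           Appended by Physical Layer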
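The field sizes above determine how many bytes a TLP occupies on the Link and how much of that is payload. The Python sketch below adds them up. The function names are hypothetical, the byte counts are taken before 8b/10b encoding, and treating ECRC as optional is an assumption that the slide does not state.

```python
DW = 4  # bytes per doubleword


def tlp_bytes_on_link(header_dw, payload_dw, has_ecrc=False):
    """Total bytes of one framed TLP (before 8b/10b encoding)."""
    if header_dw not in (3, 4):
        raise ValueError("header is 3 or 4 DW")
    if not 0 <= payload_dw <= 1024:
        raise ValueError("payload is 0 to 1024 DW")
    # Created by the Transaction Layer: header, payload, ECRC (assumed optional).
    transaction_layer = (header_dw + payload_dw) * DW + (DW if has_ecrc else 0)
    # Appended by the Data Link Layer: 2B Sequence and 1 DW LCRC.
    data_link_layer = 2 + DW
    # Appended by the Physical Layer: 1B Start and 1B End.
    physical_layer = 1 + 1
    return transaction_layer + data_link_layer + physical_layer


def payload_efficiency(header_dw, payload_dw, has_ecrc=False):
    """Fraction of the framed TLP that is data payload."""
    return payload_dw * DW / tlp_bytes_on_link(header_dw, payload_dw, has_ecrc)


if __name__ == "__main__":
    for payload_dw in (0, 1, 32, 1024):
        total = tlp_bytes_on_link(3, payload_dw)
        print(payload_dw, total, round(payload_efficiency(3, payload_dw), 3))
```

Small payloads are dominated by the fixed framing bytes, which is why efficiency rises with payload size.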



DLLP Origin and Destination


[Figure: layered view of PCI Express Device A and Device B. A DLLP is transmitted from the Data Link Layer of Device A, passes through the Physical Layer and across the Link, and is received at the Data Link Layer of Device B]

DLLP Structure
Fields in bit transmit direction:

Field    Size    Added by
Start    1B      Appended by Physical Layer
DLLP     4B      Data Link Layer
CRC      2B      Data Link Layer
End      1B      Appended by Physical Layer

DLLP types:
- ACK / NAK Packets
- Flow Control Packets
- Power Management Packets
- Vendor Defined Packets


Ordered-Set Origin and Destination


[Figure: layered view of PCI Express Device A and Device B. An Ordered-Set is transmitted from the Physical Layer of Device A, crosses the Link, and is received at the Physical Layer of Device B]

Ordered-Set Structure
General form: COM, Identifier, Identifier, ..., Identifier

- Training Sequence One (TS1)
  - 16 character set: 1 COM, 15 TS1 data characters
- Training Sequence Two (TS2)
  - 16 character set: 1 COM, 15 TS2 data characters
- SKIP
  - 4 character set: 1 COM followed by 3 SKP identifiers
- Electrical Idle (IDLE)
  - 4 characters: 1 COM followed by 3 IDL identifiers
- Fast Training Sequence (FTS)
  - 4 characters: 1 COM followed by 3 FTS identifiers

Quality of Service
[Figure: the server system used to illustrate Quality of Service. Components shown: two Processors, Root Complex, DDR SDRAM, PCI Express GFX, three Switches, InfiniBand Endpoint with an out-of-box InfiniBand Switch, 10Gb Ethernet Endpoints, Fiber Channel RAID Disk array, SCSI Endpoints, Video Camera, PCI Express-to-PCI bridge with PCI Slots, IEEE 1394, S IO with COM1 and COM2, and Add-In Slot, joined by PCI Express Links]

Traffic Classes and Virtual Channels


Quality of Service (QoS) policy through Virtual Channel and Traffic Class tags

[Figure: Transmitter Device A and Receiver Device B joined by one Link. In each device, TC/VC Mapping assigns TC[7:0] to VC buffers: TC[2:0] maps to VC0 and TC[7:3] maps to VC1. VC Arbitration selects between VC0 and VC1 for transmission on the Link]

One physical Link, multiple virtual paths
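The example mapping in the figure (TC[2:0] to VC0, TC[7:3] to VC1) can be sketched as a small lookup that sorts TLPs into per-VC buffers. This is a hypothetical Python illustration of that one configuration; the names are not from the slides, and real devices program the mapping through configuration registers.

```python
# Example TC-to-VC mapping from the figure: TC0-TC2 -> VC0, TC3-TC7 -> VC1.
TC_TO_VC = {tc: (0 if tc <= 2 else 1) for tc in range(8)}


def vc_for_tc(tc):
    """Return the Virtual Channel that carries a given Traffic Class."""
    if tc not in TC_TO_VC:
        raise ValueError("Traffic Class must be 0..7")
    return TC_TO_VC[tc]


def queue_by_vc(tlps):
    """Sort (name, tc) pairs into per-VC buffers, preserving arrival order."""
    buffers = {0: [], 1: []}
    for name, tc in tlps:
        buffers[vc_for_tc(tc)].append(name)
    return buffers


if __name__ == "__main__":
    traffic = [("disk-write", 0), ("video-frame", 5), ("config-read", 2)]
    print(queue_by_vc(traffic))
```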


Port Arbitration and VC Arbitration


[Figure: a switch with two ingress Links and one egress Link. On each Link, TC[2:0] maps to VC0 and TC[7:3] maps to VC1. For egress port 0, the TC/VC Mapping feeds a VC0 Port Arb and a VC1 Port Arb, whose outputs go through VC Arb onto the egress Link]

PCI Express Flow Control


Credit-based flow control is point-to-point based, not end-to-end
[Figure: a Transmitter sends TLPs into the Receiver's VC Buffer; the Receiver reports the buffer space available back to the Transmitter with Flow Control DLLPs (FCx)]

The Receiver sends Flow Control Packets (FCP), which are a type of DLLP (Data Link Layer Packet), to provide the transmitter with credits so that it can transmit packets to the receiver
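The credit exchange can be illustrated with a transmitter-side gate that only sends when enough credits are available and gains credits when a Flow Control DLLP arrives. This is a simplified Python sketch: the class and method names are hypothetical, and it uses one plain counter rather than the detailed credit accounting of the actual protocol.

```python
class CreditGate:
    """Transmitter-side view of credit-based flow control for one VC buffer."""

    def __init__(self, initial_credits):
        # Credits advertised by the receiver for its buffer space.
        self.credits = initial_credits

    def try_send(self, tlp_credits):
        """Send a TLP only if the receiver has room for it."""
        if tlp_credits > self.credits:
            return False  # hold the TLP until more credits arrive
        self.credits -= tlp_credits
        return True

    def on_flow_control_dllp(self, returned_credits):
        """Receiver freed buffer space and reported it in a Flow Control DLLP."""
        self.credits += returned_credits


if __name__ == "__main__":
    gate = CreditGate(initial_credits=4)
    print(gate.try_send(3))       # fits in the advertised space
    print(gate.try_send(2))       # blocked: only 1 credit left
    gate.on_flow_control_dllp(3)  # receiver drained its buffer
    print(gate.try_send(2))       # now allowed
```

Because the gate tracks only the neighbor's buffer, the same logic runs independently on every Link, which matches the point-to-point (not end-to-end) nature of the scheme.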


ACK/NAK Protocol Overview


[Figure: Data Link Layers of Transmit Device A and Receiver Device B. In Device A, TLPs from the Transaction Layer are given a Sequence number and LCRC, stored in the Replay Buffer, and multiplexed with DLLPs onto Tx. In Device B, Rx traffic is de-multiplexed, the TLP (Sequence, TLP, LCRC) goes through the Error Check, good TLPs pass to the Transaction Layer, and an ACK / NAK DLLP is returned across the Link to Device A]

ACK/NAK Protocol: Point-to-Point


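[Figure: Requester, Switch and Completer in a chain; each Link acknowledges independently]

- 1a. Request (Requester to Switch); 1b. ACK (Switch to Requester)
- 2a. Request (Switch to Completer); 2b. ACK (Completer to Switch)
- 3a. Completion (Completer to Switch); 3b. ACK (Switch to Completer)
- 4a. Completion (Switch to Requester); 4b. ACK (Requester to Switch)

ACK is returned for good reception of a Request or Completion. NAK is returned for erroneous reception of a Request or Completion.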
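The transmitter side of this per-Link handshake can be sketched as a replay buffer: each TLP is kept until it is acknowledged, and a NAK triggers retransmission of whatever is still held. The Python below is a simplified illustration with hypothetical names; sequence number wrap-around and replay timers are not modeled.

```python
class ReplayBuffer:
    """Holds transmitted TLPs until the Link partner acknowledges them."""

    def __init__(self):
        self._pending = {}   # sequence number -> TLP, in transmit order
        self._next_seq = 0

    def send(self, tlp):
        """Assign a sequence number, keep a copy, and return the number."""
        seq = self._next_seq
        self._pending[seq] = tlp
        self._next_seq += 1
        return seq

    def on_ack(self, seq):
        """ACK: everything up to and including seq arrived intact."""
        for acked in [s for s in self._pending if s <= seq]:
            del self._pending[acked]

    def on_nak(self):
        """NAK: return the TLPs that must be replayed, oldest first."""
        return [self._pending[s] for s in sorted(self._pending)]


if __name__ == "__main__":
    buf = ReplayBuffer()
    first = buf.send("MRd #1")
    buf.send("MRd #2")
    buf.on_ack(first)
    print(buf.on_nak())  # only the unacknowledged TLP is replayed
```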


Interrupt Model: Three Methods


PCI Express supports three interrupt reporting mechanisms:
1. Message Signaled Interrupts (MSI)
   - Legacy endpoints are required to support MSI (or MSI-X) with 32- or 64-bit MSI capability register implementation
   - Native PCI Express endpoints are required to support MSI with 64-bit MSI capability register implementation
2. Message Signaled Interrupts - X (MSI-X)
   - Legacy and native endpoints are required to support MSI-X (or MSI) and implement the associated MSI-X capability register
3. INTx Emulation
   - Native and Legacy endpoints are required to support Legacy INTx Emulation
   - PCI Express defines in-band messages which emulate the four physical interrupt signals (INTA-INTD) routed between PCI devices and the system interrupt controller
   - Forwarding support required by switches

Native and Legacy Interrupts

[Figure: native and legacy interrupt delivery across PCIe Links; diagram labels not recoverable]

PCI Express Error Handling


All PCI Express devices are required to support some combination of:
- Existing software written for generic PCI error handling, which takes advantage of the fact that PCI Express has mapped many of its error conditions to existing PCI error handling mechanisms
- Additional PCI Express-specific reporting mechanisms

Errors are classified as correctable and uncorrectable. Uncorrectable errors are further divided into:
- Fatal uncorrectable errors
- Non-fatal uncorrectable errors


Correctable Errors
- Errors classified as correctable degrade system performance, but recovery can occur with no loss of information
  - Hardware is responsible for recovery from a correctable error and no software intervention is required.
- Even though hardware handles the correction, logging the frequency of correctable errors may be useful if software is monitoring link operations.
- An example of a correctable error is the detection of a link CRC (LCRC) error when a TLP is sent, resulting in a Data Link Layer retry event.


Uncorrectable Errors
- Errors classified as uncorrectable impair the functionality of the interface, and there is no specification mechanism to correct these errors
- The two subgroups are fatal and non-fatal
1. Fatal Uncorrectable Errors: Errors which render the link unreliable
   - First-level strategy for recovery may involve a link reset by the system
   - Handling of fatal errors is platform-specific
2. Non-Fatal Uncorrectable Errors: Uncorrectable errors associated with a particular transaction, while the link itself is reliable
   - Software may limit recovery strategy to the device(s) involved
   - Transactions between other devices are not affected


Baseline Error Reporting


- Enabling/disabling error reporting
- Providing error status
- Providing error status for Link Training
- Initiating Link Re-training

Registers provide control and status for:
- Correctable errors
- Non-fatal uncorrectable errors
- Fatal uncorrectable errors
- Unsupported request errors

Advanced Error Reporting


- Finer granularity in defining error type
- Ability to define severity of uncorrectable errors
  - Either send ERR_FATAL or ERR_NONFATAL message for a given error
- Support for error logging of error type and TLP header related to error
- Ability to mask reporting of errors
- Enable/disable root reporting of errors
- Identify source of errors
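The interaction of classification, programmable severity, and masking can be sketched as a small decision function. This Python example is illustrative only: the function and parameter names are hypothetical, and the `ERR_COR` message name for correctable errors comes from the PCI Express specification rather than from this slide.

```python
def error_message(correctable, severity_fatal=False, masked=False):
    """Pick the error message a device would send, or None if masked.

    correctable: True for correctable errors (hardware recovers on its own)
    severity_fatal: programmed severity for an uncorrectable error
    masked: True if reporting of this error has been masked
    """
    if masked:
        return None  # masked errors are not reported
    if correctable:
        return "ERR_COR"  # name assumed from the PCIe spec, not the slide
    return "ERR_FATAL" if severity_fatal else "ERR_NONFATAL"


if __name__ == "__main__":
    print(error_message(correctable=True))
    print(error_message(correctable=False, severity_fatal=True))
    print(error_message(correctable=False, severity_fatal=False))
    print(error_message(correctable=False, severity_fatal=True, masked=True))
```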


PCI Express Configuration Space


Summary of Changes for 2.0


- Higher speed (5.0 GT/s), supported by:
  - Selectable de-emphasis levels
  - Selectable transmitter voltage range
- Dynamic speed and link width changes
  - Power savings, higher bandwidth, reliability
- Virtualization support
  - Access Control Services
- Other New Features
  - Completion timeout control
  - Function Level Reset
  - Modified Compliance Pattern for testing


Thank you for attending the PCI-SIG Developers Conference 2007.

For more information please go to www.pcisig.com


