Down to the TLP: How PCI express devices talk
(Part I)
Foreword
While I was writing the Xillybus IP core for PCI express, I quickly
found out that it’s very difficult to start off: Online resources, as well
as the official spec, bombard you with gory details about the nuts
and bolts, but say much less about what the machine is supposed to
do. So once I made the effort to figure that out for myself, I decided
to write this little guide, which will hopefully help others get a softer
start. This is based upon the official PCI Express specification 1.1,
but applies very well to later versions. There is no substitute for
reading the original spec, though. The name of the game is to get the
details right, so that the device works properly in environments that
are not at hand for testing.
Don’t pick on me for not describing the whole picture, or using
inaccurate definitions. Being accurate is what the spec is for. All I’m
trying to do here is to make it more human-readable. I’ve also
published a sample TLP sniff dump of a session, which may help you
understand how the machinery works.
And I rely on other sources to describe form factors, lane counts,
data rates and such. For an overview of these, I suggest Wikipedia's
entry on PCI Express. I also suggest reading about PCI configuration, in
particular the part about enumeration.
So let’s start with some basic insights.
PCI express is not a bus
The first thing to realize about PCI express (PCIe henceforth) is that
it’s not PCI-X, or any other PCI version. The previous PCI versions,
PCI-X included, are true buses: There are parallel rails of copper
physically reaching several slots for peripheral cards. PCIe is more
like a network, with each card connected to a network switch
through a dedicated set of wires. Exactly like a local Ethernet
network, each card has its own physical connection to the switch
fabric. The similarity goes further: The communication takes the
form of packets transmitted over these dedicated lines, with flow
control, error detection and retransmissions. There are no MAC
addresses; instead, a card is identified by its physical (“geographic”)
position until it’s allocated a higher-level means of addressing
(a chunk of the I/O and memory address space).
As a matter of fact, a minimal (1x) PCIe connection merely consists
of four wires for data transmission (one differential pair in each
direction) and another pair of wires to supply the card with a
reference clock. That's it.
On the other hand, the PCIe standard was deliberately made to
behave very much like classic PCI. Even though it’s a packet-based
network, it’s all about addresses, reads, writes and interrupts.
The plug-and-play configuration is still done, and the cards are
accessed in terms of reads and writes to address and I/O space, just
like before. There are still Vendor/Device IDs, and several
mechanisms to mimic old behavior. To make a long story short, the
PCIe standard goes a long way to look like good old PCI to an
operating system unaware of PCIe.
So PCIe is a packet network faking the traditional PCI bus. Its entire
design makes it possible to migrate a PCI device to PCIe without
making any change in software, and/or transparently bridge between
PCI and PCIe without losing any functionality.
A simple bus transaction
In order to get an understanding of the whole thing, let’s see what
happens when a PC’s CPU wants to write a 32-bit word to a PCIe
peripheral. Several details and possibilities are deliberately left out
for the sake of simplicity in the description below.
Since it’s a PC, it’s likely that the CPU itself performs a simple write
operation on its own bus, and that the memory controller chipset,
which is connected to the CPU’s bus, has the direct connection to the
PCIe bus. So what happens is that the chipset (which, in PCIe terms,
functions as a Root Complex) generates a Memory Write packet for
transmission over the bus. This packet consists of a header, which is
either 3 or 4 32-bit words long (depending on whether 32-bit or 64-bit
addressing is used) and one 32-bit word containing the word to be
written. This packet simply says “write this data to this address”.
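To make the shape of such a packet concrete, here is a minimal sketch in Python that assembles the 32-bit words of a single-word Memory Write. The bit positions are my reading of the spec's header layout, so check them against the spec before relying on them. The function name build_mem_write and its defaults (a Requester ID and Tag of zero) are made up for illustration.

```python
def build_mem_write(address, data, requester_id=0, tag=0):
    """Return the 32-bit words of a Memory Write TLP carrying one data word.

    Illustrative only: field positions are assumed from the spec's
    header layout, and optional fields are left at zero.
    """
    if address & 0x3:
        raise ValueError("address must be aligned to 4 bytes")

    use_64bit = address > 0xFFFFFFFF
    fmt = 0b11 if use_64bit else 0b10   # 4-word or 3-word header, with data
    tlp_type = 0b00000                  # memory request
    length = 1                          # payload length in 32-bit words

    # First header word: format, type and payload length.
    dw0 = (fmt << 29) | (tlp_type << 24) | length

    # Second header word: Requester ID, Tag, last-word byte enables
    # (zero for a single-word payload) and first-word byte enables
    # (0xF means all four bytes are valid).
    dw1 = (requester_id << 16) | (tag << 8) | (0x0 << 4) | 0xF

    # Address: one word for 32-bit addressing, two words for 64-bit.
    if use_64bit:
        addr_words = [address >> 32, address & 0xFFFFFFFC]
    else:
        addr_words = [address & 0xFFFFFFFC]

    return [dw0, dw1] + addr_words + [data & 0xFFFFFFFF]


words = build_mem_write(0xFDAFF040, 0x12345678)
print(" ".join(f"{w:08x}" for w in words))
# prints: 40000001 0000000f fdaff040 12345678
```

The header comes out 3 words long for an address below 4 GB and 4 words long above it, followed in both cases by the single data word.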
This packet is then transmitted on the chipset’s PCIe port (or one of
them, if there are several). The target peripheral may be connected
directly to the chipset, or there may be a switch network between
them. One way or another, the packet is routed to the peripheral,
decoded, and executed by performing the desired write operation.
A closer look
This simplistic view ignores several details, for example the
underlying communications mechanism, which consists of three
layers: The Transaction Layer, the Data Link Layer, and the Physical
Layer. The packet described above is a Transaction Layer Packet
(TLP), which relates to PCIe’s uppermost layer.
The Data Link Layer is responsible for making sure that every TLP
arrives at its destination correctly. It wraps TLPs with its own header
and with a Link CRC, so that the TLP’s integrity is assured. An
acknowledge-retransmit mechanism makes sure no TLPs are lost on
the way. A flow control mechanism makes sure a packet is sent only
when the link partner is ready to receive it. All in all, whenever a TLP
is handed over to the Data Link Layer for transmission, we can rely
on its arrival, even if there is a slight uncertainty regarding the time
of arrival. Failing to deliver a TLP is a major malfunction of the bus.
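To make the wrap, check and retransmit idea concrete, here is a toy model in Python. It's a sketch under loud assumptions, not how real hardware is built: zlib.crc32 merely stands in for the Link CRC (the real one has its own bit ordering, defined in the spec), the names wrap_tlp, unwrap_tlp and Transmitter are invented for illustration, and flow control is not modeled at all.

```python
import zlib
from collections import OrderedDict

SEQ_MODULO = 4096  # assumes a 12-bit sequence number


def wrap_tlp(seq, tlp_bytes):
    """Prefix a 2-byte sequence number and append a 4-byte CRC."""
    header = (seq % SEQ_MODULO).to_bytes(2, "big")
    crc = zlib.crc32(header + tlp_bytes)  # stand-in, NOT the real LCRC
    return header + tlp_bytes + crc.to_bytes(4, "big")


def unwrap_tlp(frame):
    """Return (seq, tlp_bytes), or None if the CRC check fails."""
    body, crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(body) != crc:
        return None
    return int.from_bytes(body[:2], "big"), body[2:]


class Transmitter:
    """Keeps every frame until the link partner acknowledges it."""

    def __init__(self):
        self.next_seq = 0
        self.replay = OrderedDict()  # seq -> frame, oldest first

    def send(self, tlp_bytes):
        frame = wrap_tlp(self.next_seq, tlp_bytes)
        self.replay[self.next_seq] = frame
        self.next_seq = (self.next_seq + 1) % SEQ_MODULO
        return frame

    def ack(self, seq):
        """An ack covers seq and every frame sent before it."""
        if seq not in self.replay:
            return
        while self.replay:
            oldest, _ = self.replay.popitem(last=False)
            if oldest == seq:
                break

    def nak(self):
        """Return all unacknowledged frames, oldest first, for resending."""
        return list(self.replay.values())
```

The point of the replay buffer is that the transmitter never forgets a TLP until the other side has confirmed it. A frame that fails the CRC check is dropped by the receiver and simply sent again.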
We'll come back to the Data Link Layer when discussing credits and