Certificate .......... 2
Abstract .......... 4
1 Introduction .......... 5
2 Overview of SoC .......... 5
2.1 Structure of SoC .......... 6
2.2 FPGAs and MPSoCs .......... 7
2.3 Introduction to Zynq UltraScale+ MPSoC .......... 8
5 PCIe Enumeration .......... 33
5.1 Introduction
5.2 Setting up with ARM DEV Studio
5.3 Running a (General) Bare Metal Image on the N1SDP
5.4 Running XSDB with PCIe 4.0 EP
6 Result .......... 33
7 References
Abstract
“Post-silicon validation is a critical part of Integrated Circuits. It is the process of finding the bugs that have
escaped from the pre-silicon phase. According to International Technology Roadmap for Semiconductors
(ITRS), time-to-market is the major constraint for verification and testing. The main challenge of post-
silicon validation is that it has limited observability and controllability of internal signals in the
manufactured chips. On-chip buffers are used to improve the observability and controllability of these
signal states during runtime. Compared to existing techniques Trace-based debug technique has been
widely utilized by the industries for the past few years to overcome the challenges.”
“PCI Express is a high performance, general purpose I/O interconnect defined for a wide variety of future
computing and communication platforms. Key PCI attributes, such as its usage model, load-store
architecture, and software interfaces, are maintained, whereas its parallel bus implementation is
replaced by a highly scalable, fully serial interface. PCI Express takes advantage of recent advances in
point-to-point interconnects, Switch-based technology, and packetized protocol to deliver new levels of
performance and features. Power Management, Quality of Service (QoS), Hot-Plug/ hot-swap support,
data integrity, and error handling are among some of the advanced features supported by PCI Express.”
In post-silicon validation of PCIe 4.0, PCIe Endpoint devices are connected to AXI masters via an
AXI2PCIe bridge. Validation of the PCIe 4.0 master controller uses a standalone test sequence covering all
supported features. To validate a PCI Express transaction while debugging, we need to decode the data
captured at the PCIe Transaction Layer, which gives us an idea of the data transmitted. To decode the
encoded data at the PCI Express layer, we developed a utility that gives end-to-end debug visibility on every
test sequence, saving much of the time spent debugging issues and thereby reducing time to market.
1. Introduction
The primary challenge of PCIe validation is to test system functionality with speed and accuracy so that
the product can go to market. Protocol errors must be detected, analyzed, and corrected in an efficient
manner. Debugging PCIe protocol means capturing at-speed traffic, including power management
transitions. Protocol debug tools need to lock onto traffic quickly, then trigger on a unique protocol
sequence. Debugging lower-level problems, such as power management, requires exceptionally fast lock
times. Once traffic is captured, viewing the data at different levels of abstraction makes it possible to
isolate the problem.
Once users achieve a data-valid state at the physical layer, which requires validation of link signaling,
they test the higher layers at the protocol level with a protocol analyzer and exerciser. Validation of the
PCIe Data Link Layer is performed by specification tests that check for Transaction Layer Packets (TLPs)
being transferred and for correct flow control. Validation teams need robust systems that can recover from
all errors, including intermittent failures.
2. Overview of SoC
SoC stands for System on Chip: an entire system integrated on a single chip. The system consists of I/O
devices, a microprocessor, on-chip memories, external memory interfaces, clock and reset management,
oscillator circuitry, and an interrupt control unit; these are the basic building blocks of any SoC, and many
designs include additional blocks. An SoC is simply an IC, or chip, that integrates all components of a
computer or other electronic system, combining analog, digital, mixed-signal, and radio-frequency
functions. Because the power consumption of an SoC is very low, SoCs are widely used in the mobile
computing market. An SoC may also include a Graphics Processing Unit and co-processors. The figure
shows a generalized SoC architecture.
Three different types of SoC have captured today's market: SoCs built around microcontrollers; SoCs built
around microprocessors, which form the second type; and third, MPSoCs, which are programmable systems
on chip. In the third category, the internal elements are not predefined and can be programmed in much
the same way as an FPGA or CPLD.
Figure 2.1: Architecture of SoC
2.2 FPGAs and MPSOCs
Xilinx's MPSoCs, RFSoCs, and FPGAs come under Xilinx's UltraScale architectures, which are
differentiated by their very high operating frequencies. These families address a vast spectrum of system
requirements, with a focus on lowering total and relative power consumption through different innovative
technological advancements. The following versions are available in the market.
a. Kintex UltraScale FPGAs: High-performance FPGAs with a focus on price and performance,
using both monolithic and next-generation Xilinx-patented stacked silicon interconnect (SSI)
technology. High DSP counts and next-generation-process transceivers, combined with
low-cost packaging, enable an optimum balance of capability and cost.
b. Kintex UltraScale+ FPGAs: Increased performance and on-chip UltraRAM memory to reduce
BOM cost. The ideal mix of high-performance peripherals and cost-effective system
implementation. Kintex UltraScale+ FPGAs have numerous power options that deliver the optimal
balance between the required system performance and the smallest power envelope.
c. Virtex UltraScale FPGAs: High-capacity, high-performance FPGAs enabled using the same process
as the Kintex UltraScale FPGAs. Virtex UltraScale devices achieve the highest system capacity,
performance, and bandwidth to address main market and application requirements through
integration of various system-level functions.
d. Virtex UltraScale+ FPGAs: The highest transceiver bandwidth, highest DSP count, and highest
on-chip and external memory available in the UltraScale architecture. Virtex UltraScale+ FPGAs
also provide various power options that deliver the optimal balance between the
required system performance and the smallest power envelope.
e. Zynq UltraScale+ RFSoCs: Combine an RF data converter subsystem and forward error correction
with industry-leading programmable logic and heterogeneous processing capability. Integrated RF-
ADCs, RF-DACs, and soft-decision FECs (SD-FEC) provide the key subsystems for multiband,
multimode cellular radios and cable infrastructure.
PCI Express is a third-generation high-performance I/O bus used to interconnect peripheral devices in
applications such as computing and communication platforms. The first-generation buses include the
ISA, EISA, VESA, and Micro Channel buses, while the second-generation buses include PCI, AGP, and PCI-
X. PCI Express is an all-encompassing I/O device interconnect bus that has applications in mobile,
desktop, workstation, server, embedded computing, and communication platforms.
PCI Express implements switch-based technology to interconnect many devices. Communication over the
serial interconnect is accomplished using a packet-based communication protocol.
• The AXI-PCIe bridge provides AXI to PCIe protocol translation and vice-versa, ingress/egress address
translation, DMA, and Root Port/Endpoint (RP/EP) mode specific services.
• The integrated block for PCIe interfaces to the AXI-PCIe bridge on one side and the PS-GTR
transceivers on the other. It performs link negotiation, error detection and recovery, and many other
PCIe protocol specific functions. This block cannot be directly accessed.
PCI Express Link
“A Link represents a dual-simplex communications channel between two components. The fundamental
PCI Express Link consists of two, low-voltage, differentially driven signal pairs: a Transmit pair and a
Receive pair.
1. The basic Link – A PCI Express Link consists of dual unidirectional differential Links, implemented as
a Transmit pair and a Receive pair. A data clock is embedded using an encoding scheme to
achieve very high data rates.
2. Signaling rate – Once initialized, each Link must only operate at one of the supported signaling
levels.
a. For the first generation of PCI Express technology, there is only one signaling rate defined,
which provides an effective 2.5 Gigabits/second/Lane/direction of raw bandwidth.
b. The second generation provides an effective 5.0 Gigabits/second/Lane/direction of raw
bandwidths.
c. The third generation provides an effective 8.0 Gigabits/second/Lane/direction of raw
bandwidth.
d. The fourth generation provides an effective 16.0 Gigabits/second/Lane/direction of raw
bandwidth.
3. Lanes – A Link must support at least one Lane – each Lane represents a set of differential signal
pairs (one pair for transmission, one pair for reception). To scale bandwidth, a Link may aggregate
multiple Lanes denoted by xN where N may be any of the supported Link widths. A x8 Link
operating at the 2.5 GT/s data rate represents an aggregate bandwidth of 20 Gigabits/second of
raw bandwidth in each direction. This specification describes operations for x1, x2, x4, x8, x12,
x16, and x32 Lane widths.
4. Initialization – During hardware initialization, each PCI Express Link is set up following a
negotiation of Lane widths and frequency of operation by the two agents at each end of the Link.
No firmware or operating system software is involved.
5. Symmetry – Each Link must support a symmetric number of Lanes in each direction, i.e., a x16
Link indicates there are 16 differential signal pairs in each direction.”
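The per-generation rates and lane scaling above can be sketched numerically. The following Python snippet is illustrative; the 8b/10b and 128b/130b encoding-efficiency figures are an addition from general PCIe knowledge, not stated in the text above:

```python
# Raw per-lane signaling rates (GT/s) for PCIe generations 1-4,
# as listed in the text above.
RATES_GT_S = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0}

# Line-code overhead (assumption, not from the text): Gen1/2 use
# 8b/10b encoding (80% efficient), Gen3/4 use 128b/130b (~98.5%).
ENCODING_EFFICIENCY = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130, 4: 128 / 130}

def raw_bandwidth_gbps(gen: int, lanes: int) -> float:
    """Aggregate raw bandwidth per direction, in Gigabits/second."""
    return RATES_GT_S[gen] * lanes

def effective_bandwidth_gbps(gen: int, lanes: int) -> float:
    """Bandwidth after subtracting line-code overhead."""
    return raw_bandwidth_gbps(gen, lanes) * ENCODING_EFFICIENCY[gen]

# The x8 Gen1 example from the text: 2.5 GT/s * 8 lanes = 20 Gb/s raw.
print(raw_bandwidth_gbps(1, 8))   # 20.0
```

This reproduces the x8 example in the text and scales to any of the supported widths.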
• Support for single x1, x2, x4, x8, x16 or x32 link.
• Endpoint mode supports MSI-X interrupts in addition to MSI and legacy.
• 64-bit AXI3 compliant AXI master and AXI slave interfaces operating at a 250 MHz
clock.
• MSI-X table and PBA implementation at predefined location for Endpoint mode.
• Eight fully-configurable address translation apertures in each direction (egress— AXI to PCIe
and ingress—PCIe to AXI).
• Generation of configuration transactions through the enhanced configuration access
mechanism (ECAM) and messages by the AXI CPU in Root Port mode.
• Receive interrupt controller aggregates and presents legacy and MSI interrupts from PCIe to
the AXI CPU in Root Port mode.
• Each DMA channel controllable from PCIe CPU, AXI CPU, or both.
• Separate source and destination scatter-gather queues with the option to have separate status
scatter-gather queues.”
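As an illustration of how one such translation aperture behaves, here is a minimal Python sketch of egress (AXI to PCIe) address translation. The function and parameter names are hypothetical, not the bridge's actual register interface:

```python
# Hypothetical sketch of one egress (AXI -> PCIe) translation aperture:
# an AXI address falling inside [axi_base, axi_base + size) has its
# offset re-based onto pcie_base. Names are illustrative only.
def translate(addr: int, axi_base: int, pcie_base: int, size: int):
    if axi_base <= addr < axi_base + size:
        return pcie_base + (addr - axi_base)
    return None  # address misses this aperture

# Example: a 1 MiB aperture mapping AXI 0xA000_0000 to PCIe 0x1000_0000.
print(hex(translate(0xA000_0040, 0xA000_0000, 0x1000_0000, 1 << 20)))  # 0x10000040
```

A real bridge holds eight such apertures per direction; hardware would check them in parallel rather than one at a time.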
3.3 PCI Express Layering Overview
“The architecture is described in terms of three discrete logical layers: the Transaction Layer, the Data Link Layer, and
the Physical Layer. Each of these layers is divided into two sections: one that processes outbound (to be
transmitted) information and one that processes inbound (received) information.”
“PCI Express uses packets to communicate information between components. Packets are formed in the
Transaction and Data Link Layers to carry the information from the transmitting component to the
receiving component. As the transmitted packets flow through the other layers, they are extended with
additional information necessary to handle packets at those layers. At the receiving side the reverse
process occurs and packets get transformed from their Physical Layer representation to the Data Link
Layer representation and finally (for Transaction Layer Packets) to the form that can be processed by the
Transaction Layer of the receiving device”
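The layered assembly described above can be sketched as follows. This is an illustrative model only: the field sizes (2-byte sequence number, 4-byte LCRC, STP/END framing symbols) follow the Gen1/2 layering, but the CRC used here is an ordinary CRC-32 stand-in, not the real LCRC polynomial:

```python
# Illustrative sketch of how each layer wraps a packet on transmit.
import zlib

def transaction_layer(header: bytes, payload: bytes) -> bytes:
    # The Transaction Layer assembles header + optional data payload.
    return header + payload

def data_link_layer(tlp: bytes, seq_num: int) -> bytes:
    # The Data Link Layer prepends a sequence number (carried in 2 bytes)
    # and appends a 32-bit LCRC over seq + TLP (stand-in CRC here).
    seq = seq_num.to_bytes(2, "big")
    lcrc = zlib.crc32(seq + tlp).to_bytes(4, "big")
    return seq + tlp + lcrc

def physical_layer(link_packet: bytes) -> bytes:
    # Gen1/2 framing brackets the packet with STP and END symbols.
    STP, END = b"\xfb", b"\xfd"
    return STP + link_packet + END

tlp = transaction_layer(header=b"\x40\x00\x00\x01", payload=b"\xde\xad\xbe\xef")
on_the_wire = physical_layer(data_link_layer(tlp, seq_num=1))
print(len(on_the_wire))  # 4 header + 4 payload + 2 seq + 4 LCRC + 2 framing = 16
```

On receive, the layers peel these fields off in reverse order, exactly as the paragraph above describes.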
Fig 3.3 b) Packet Flow Through the Layers
“The Transaction Layer supports four address spaces: it includes the three PCI address spaces (memory,
I/O, and configuration) and adds Message Space. This specification uses Message Space to support all
prior sideband signals, such as interrupts, power-management requests, and so on, as in-band Message
transactions. You could think of PCI Express Message transactions as “virtual wires” since their effect
is to eliminate the wide array of sideband signals currently used in a platform implementation.”
The transmission side of the Data Link Layer accepts TLPs assembled by the Transaction Layer, calculates
and applies a data protection code and TLP sequence number, and submits them to Physical Layer for
transmission across the Link. The receiving Data Link Layer is responsible for checking the integrity of
received TLPs and for submitting them to the Transaction Layer for further processing. On detection of
TLP error(s), this Layer is responsible for requesting retransmission of TLPs until information is correctly
received, or the Link is determined to have failed.
The Data Link Layer also generates and consumes packets that are used for Link management functions.
To differentiate these packets from those used by the Transaction Layer (TLP), the term Data Link Layer
Packet (DLLP) will be used when referring to packets that are generated and consumed at the Data Link
Layer.”
3.3.4 Layer Functions and Services
“1. Transaction Layer Services:
The Transaction Layer, in the process of generating and receiving TLPs, exchanges Flow Control
information with its complementary Transaction Layer on the other side of the Link. It is also responsible
for supporting both software and hardware-initiated power management.
Ordering rules:
• PCI/PCI-X compliant producer/consumer ordering model
• Extensions to support Relaxed Ordering
• Extensions to support ID-Based Ordering
3.4 PCI Express Topology
A fabric is composed of point-to-point Links that interconnect a set of components. This figure illustrates
a single fabric instance referred to as a Hierarchy - composed of a Root Complex (RC), multiple Endpoints
(I/O devices), a Switch, and a PCI Express to PCI/PCI-X Bridge, all interconnected via PCI Express Links.
An RC denotes the root of an I/O hierarchy that connects the CPU/memory subsystem to the I/O. As
illustrated in Figure, an RC may support one or more PCI Express Ports. Each interface defines a separate
hierarchy domain. Each hierarchy domain may be composed of a single Endpoint or a sub-hierarchy
containing one or more Switch components and Endpoints.
3.4.2 Endpoints
Endpoint refers to a type of Function that can be the Requester or Completer of a PCI Express transaction
either on its own behalf or on behalf of a distinct non-PCI Express device (other than a PCI device or host
CPU), e.g., a PCI Express attached graphics controller or a PCI Express-USB host controller. Endpoints are
classified as either legacy, PCI Express, or Root Complex Integrated Endpoints.
Figure 3.4 Example of PCI Express Topology
3.4.3 Switch
A Switch is defined as a logical assembly of multiple virtual PCI-to-PCI Bridge devices as illustrated in
Figure. All Switches are governed by the following base rules. Switches appear to configuration software
as two or more logical PCI-to-PCI Bridges. A Switch forwards transactions using PCI Bridge mechanisms;
e.g., address-based routing except when engaged in a Multicast. Each enabled Switch Port must comply
with the Flow Control specification within this document.
3.5 PCI Express Transactions
“PCI Express employs packets to accomplish data transfers between devices. A root complex can
communicate with an endpoint. An endpoint can communicate with a root complex. An endpoint can
communicate with another endpoint. Communication involves the transmission and reception of packets
called Transaction Layer Packets (TLPs). These transactions fall into four categories:
1) memory,
2) IO,
3) configuration, and
4) message transactions.
Memory, IO and configuration transactions are supported in PCI and PCI-X architectures, but the message
transaction is new to PCI Express. Transactions are defined as a series of one or more packet transmissions
required to complete an information transfer between a requester and a completer.
For Non-posted transactions, a requester transmits a TLP request packet to a completer. Later, the
completer returns a TLP completion packet back to the requester. The purpose of the completion TLP is
to confirm to the requester that the completer has received the request TLP. In addition, non-posted read
transactions contain data in the completion TLP. Non-Posted write transactions contain data in the write
request TLP.
For Posted transactions, a requester transmits a TLP request packet to a completer. The completer
however does NOT return a completion TLP back to the requester. Posted transactions are optimized for
best performance in completing the transaction at the expense of the requester not having knowledge of
successful reception of the request by the completer. Posted transactions may or may not contain data in
the request TLP.”
3.6 Transaction Layer Packets
“In PCI Express terminology, high-level transactions originate at the device core of the transmitting device
and terminate at the core of the receiving device. The Transaction Layer is the starting point in the
assembly of outbound Transaction Layer Packets (TLPs), and the end point for disassembly of inbound
TLPs at the receiver. Along the way, the Data Link Layer and Physical Layer of each device contribute to
the packet assembly and disassembly.”
“PCI Express uses a packet-based protocol to exchange information between the Transaction Layers of the
two components communicating with each other over the Link.
PCI Express transactions are carried using Requests and Completions. Completions are used only where required, for
example, to return read data, or to acknowledge Completion of I/O and Configuration Write Transactions.
Completions are associated with their corresponding Requests by the value in the Transaction ID field of
the Packet header.
All TLP fields marked Reserved (sometimes abbreviated as R) must be filled with all 0's when a TLP is
formed. Values in such fields must be ignored by Receivers and forwarded unmodified by Switches. Note
that for certain fields there are both specified and Reserved values - the handling of Reserved values in
these cases is specified separately for each case.”
“In the figure above, the leftmost byte is the one being transmitted/received first (byte 0 if one or more optional TLP
Prefixes are present, else byte H). Detailed layouts of the TLP Prefix, TLP Header, and TLP Digest are drawn with the
lower numbered bytes on the left rather than on the right, as has traditionally been depicted in other PCI specifications. The
header layout is optimized for performance on a serialized interconnect, driven by the requirement that
the most time critical information be transferred first. For example, within the TLP header, the most
significant byte of the address field is transferred first so that it may be used for early address decode.”
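For example, the first DWord of any TLP header can be decoded with plain bit operations; the field positions below (Fmt in bits [31:29], Type in [28:24], Length in [9:0]) follow the PCI Express header layout:

```python
# Decode the first DWord of a TLP header:
# bits [31:29] = Fmt, [28:24] = Type, [9:0] = Length (in DWords).
def decode_dw0(dw0: int):
    fmt = (dw0 >> 29) & 0x7
    tlp_type = (dw0 >> 24) & 0x1F
    length = dw0 & 0x3FF
    has_data = bool(fmt & 0x2)        # Fmt bit 1 set => TLP carries data
    four_dw_header = bool(fmt & 0x1)  # Fmt bit 0 set => 4-DW header
    return {"fmt": fmt, "type": tlp_type, "length": length,
            "has_data": has_data, "4dw_header": four_dw_header}

# 0x60000001: Fmt=011 (4-DW header with data), Type=0_0000 (MWr), Length=1 DW.
info = decode_dw0(0x60000001)
print(info["has_data"], info["length"])  # True 1
```

Because Fmt and Type arrive in the very first byte, a receiver can begin classifying the packet before the rest of the header lands, which is the point the paragraph above makes about early decode.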
The write request packet, which contains data, is routed through the fabric of switches using information
in the header portion of the packet. The packet makes its way to a completer, which accepts the
specified amount of data within the packet; the transaction is then over. If the write request is received by the
completer in error, or the completer is unable to write the posted write data to the destination due to an internal error,
the requester is not informed via the hardware protocol. The completer may log an error and generate
an error message notification to the root complex, and error handling software manages the error.
Table 3.6.2 Fmt[2:0] and Type[4:0] Field Encodings
Figure 3.6.2 Memory Write Transaction Protocol
The request TLP is routed through the fabric of switches using information in the header portion of the
TLP. The packet makes its way to a targeted completer. The completer can be a root complex, switches,
bridges or endpoints. When the completer receives the packet and decodes its contents, it gathers the
amount of data specified in the request from the targeted address. The completer creates a single
completion TLP or multiple completion TLPs with data (CplD) and sends it back to the requester. The
completer can return up to 4 KBytes of data per CplD packet. The completion packet contains routing
information necessary to route the packet back to the requester. This completion packet travels through
the same path and hierarchy of switches as the request packet. A requester uses the tag field in a
completion to associate it with the request TLP of the same tag value that it transmitted earlier. Use of a tag in
the request and completion TLPs allows a requester to manage multiple outstanding transactions. If a
completer is unable to obtain the requested data because of an error, it returns a completion packet without
data (Cpl) and an error status indication. The requester determines how to handle the error at the
software layer.
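The tag-matching scheme described above can be sketched as a small lookup table on the requester side; the tag values and request descriptions here are made up for illustration:

```python
# Sketch of requester-side tag tracking: each outstanding non-posted
# request is parked under its Tag until a completion with the same Tag
# arrives (real tags live in the TLP header; this is a simplification).
outstanding = {}

def send_request(tag: int, desc: str):
    outstanding[tag] = desc

def receive_completion(tag: int):
    # The completion's Tag associates it with the earlier request.
    return outstanding.pop(tag, None)

send_request(0x1A, "MRd 64B @ 0x9000_0000")
send_request(0x1B, "MRd 64B @ 0x9000_0040")
print(receive_completion(0x1B))  # matches the second request
print(len(outstanding))          # one request still outstanding
```

Completions may arrive out of order relative to the requests, which is exactly why the tag, not arrival order, does the matching.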
Figure 3.6.3 Non-Posted Read Transaction Protocol
A request packet with data is routed through the fabric of switches using information in the header of the
packet. The packet makes its way to a completer. When the completer receives the packet and decodes
its contents, it accepts the data. The completer creates a single completion packet without data (Cpl) to
confirm reception of the write request; this is the purpose of the completion.
Figure 3.6.4 Non-Posted Write Transaction Protocol
“There are two categories of message request TLPs, Msg and MsgD. Some message requests propagate
from requester to completer, some are broadcast requests from the root complex to all endpoints, some
are transmitted by an endpoint to the root complex. Message packets may be routed to completer(s)
based on the message’s address, device ID or routed implicitly. The completer accepts any data that may
be contained in the packet (if the packet is MsgD) and/or performs the task specified by the message.
Message request support eliminates the need for sideband signals in a PCI Express system. Messages are used
for PCI-style legacy interrupt signaling, power management protocol, error signaling, unlocking a path in
the PCI Express fabric, slot power support, hot-plug protocol, and vendor-defined purposes.
Figure 3.6.5 Posted Message Transaction Protocol
“PCI Express device Functions are required to support D0 and D3 device states; PCI-PCI Bridge structures
representing PCI Express Ports as described in Section 7.1 are required to indicate PME Message passing
capability due to the in-band nature of PME messaging for PCI Express.
The PME_Status bit for the PCI-PCI Bridge structure representing PCI Express Ports, however, is only Set
when the PCI-PCI Bridge Function is itself generating a PME. The PME_Status bit is not Set when the Bridge
is propagating a PME Message but the PCI-PCI Bridge Function itself is not internally generating a PME.”
3.7 Power Management
Power Management states are as follows:
D states are associated with a particular Function:
1. D0 is the operational state and consumes the most power
2. D1 and D2 are intermediate power-saving states
3. D3Hot is a very low power state
4. D3Cold is the power-off state
L states are associated with a particular Link:
1. L0 is the operational state
2. L0s, L1, L1.0, L1.1, and L1.2 are various lower power states
“PCI Express defines Link power management states, replacing the bus power management states that
were defined by the PCI Bus Power Management Interface Specification. Link states are not visible to PCI-
PM legacy compatible software, and are either derived from the power management D-states of the
corresponding components connected to that Link or by ASPM protocols.
• L0 - Active state.
L0 support is required for both ASPM and PCI-PM compatible power management.
• L0s - A low-latency, energy-saving standby state.
L0s support is optional for ASPM unless the applicable form factor specification for the Link
explicitly requires L0s support.
All main power supplies, component reference clocks, and components' internal PLLs must be active at all
times during L0s. TLP and DLLP transmission is disabled for a Port whose Link is in Tx_L0s.
The Physical Layer provides mechanisms for quick transitions from this state to the L0 state. When
common (distributed) reference clocks are used on both sides of a Link, the transition time from L0s to L0
is desired to be less than 100 Symbol Times.
It is possible for the Transmit side of one component on a Link to be in L0s while the Transmit side of the
other component on the Link is in L0.
• L1 - A higher-latency, lower-power standby state.
L1 support is required for PCI-PM compatible power management. L1 is optional for ASPM
unless specifically required by a particular form factor.
When L1 PM Substates is enabled by setting one or more of the enable bits in the L1 PM Substates Control
1 Register this state is referred to as the L1.0 substate.”
All main power supplies must remain active during L1. As long as they adhere to the advertised L1 exit
latencies, implementations are explicitly permitted to reduce power by applying techniques such as, but
not limited to, periodic rather than continuous checking for Electrical Idle exit, checking for Electrical Idle
exit on only one Lane, and powering off of unneeded circuits. All platform-provided component reference
clocks must remain active during L1, except as permitted by Clock Power Management (using CLKREQ#)
and/or L1 PM Substates when enabled. A component's internal PLLs may be shut off during L1, enabling
greater power savings at a cost of increased exit latency.
“The L1 entry negotiation (whether invoked via PCI-PM or ASPM mechanisms) and the L2/L3 Ready entry
negotiation map to a state machine which corresponds to the actions described later in this chapter. This
state machine is reset to an idle state. For a Downstream component, the first action taken by the state
machine, after leaving the idle state, is to start sending the appropriate entry DLLPs depending on the
type of negotiation. If the negotiation is interrupted, for example by a trip through Recovery, the state
machine in both components is reset back to the idle state. The Upstream component must always go to
the idle state and wait to receive entry DLLPs. The Downstream component must always go to the idle
state and must always proceed to sending entry DLLPs to restart the negotiation.”
4. Debug Flow
4.1 Xilinx System Debugger
Xilinx® System Debugger uses the Xilinx hw_server as the underlying debug engine. SDK translates each
user interface action into a sequence of TCF commands. “It then processes the output from system
Debugger to display the current state of the program being debugged. It communicates to the processor
on the hardware using Xilinx hw_server.”
• Executable ELF File: “To debug your application, you must use an Executable and Linkable Format
(ELF) file compiled for debugging. The debug ELF file contains additional debug information for the
debugger to make direct associations between the source code and the binaries generated from that
original source.”
• Debug Configuration: “To launch the debug session, you must create a debug configuration in SDK.
This configuration captures options required to start a debug session, including the executable name,
processor target to debug, and other information.”
• SDK Debug Perspective: “Using the Debug perspective, you can manage the debugging or running of
a program in the Workbench. You can control the execution of your program by setting breakpoints,
suspending launched programs, stepping through your code, and examining the contents of
variables.”
The debug logic for each processor enables program debugging by controlling processor execution.
The debug logic on soft MicroBlaze processor cores is configurable and can be enabled or disabled by the
hardware designer when building the embedded hardware. Enabling the debug logic on MicroBlaze
processors provides advanced debugging capabilities such as hardware breakpoints, read/write memory
watchpoints, safe-mode debugging, and more visibility into MicroBlaze processors.
4.3 PCIe Decode Utility Flow
1. The System Debugger starts generating a log file containing all PCI Express transaction data
exchanged between the Root Complex and Endpoint devices. The log data is not in a readable
format, because it is encoded as it passes through the different PCI Express layers during a transaction.
2. The generated log file is given as input to the PCIe Decoder, which reads each transaction state.
If a TLP is present in the transaction, the decoder analyzes whether the TLP is valid.
3. Whenever a transaction starts, the Transaction Layer assembles the Header and then the Payload.
The Header is the part of the packet that carries all the fields required to classify the data: it holds
the Address, Requester ID, Completer ID, Device ID, Bus number, Tag, Length, and so on, which
the utility emits in decoded format.
4. The PCIe Decode Utility decodes each Transaction Layer Packet to obtain the TLP Length, Address,
TLP Format, TLP Type, Header, Data (if applicable), Requester ID, Completer ID, Device ID, Bus
number, Tag, and the other details carried in the Header.
5. A decoded Memory or I/O Read Request TLP carries no data payload, because the Root Complex
is only requesting to read memory; a Memory Write, in contrast, carries a data payload.
6. After a Memory Read Request, the Root Complex expects a Completion TLP from the Endpoint,
whose data payload is assembled by the Endpoint device.
7. The PCIe Decoder extracts the data payload from a requester transaction only for Memory Write
and I/O Write Requests.
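The steps above can be sketched as a minimal decoder. The log format assumed here (hex DWords on a line) and the field handling are simplifications for illustration; the real utility decodes many more header fields:

```python
# Minimal sketch of the decode step: scan a captured log for hex DWords,
# decode each TLP's first DWord, and pull the payload only for TLPs
# that carry data. The log format here is an assumption.
import re

def decode_log(lines):
    tlps = []
    for line in lines:
        dws = [int(x, 16) for x in re.findall(r"0x([0-9a-fA-F]{8})", line)]
        if not dws:
            continue
        fmt = (dws[0] >> 29) & 0x7
        length = dws[0] & 0x3FF
        hdr_len = 4 if fmt & 0x1 else 3  # 4-DW or 3-DW header
        tlp = {"fmt": fmt, "length": length}
        if fmt & 0x2:  # Fmt bit 1 set => TLP carries a data payload
            tlp["payload"] = dws[hdr_len:hdr_len + length]
        tlps.append(tlp)
    return tlps

log = ["TLP: 0x40000001 0x000000ff 0x90000000 0xdeadbeef"]
print(decode_log(log))
```

The sample line decodes as a 3-DW Memory Write header followed by a one-DWord payload, matching steps 5 and 7: only write-style TLPs yield a payload.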
5. PCIe Enumeration
5.1 Introduction
“PCI Express (PCIe) utilizes a point to point interconnect and uses switches to fan out and expand the
number of PCIe connections in a system. Upon system boot up a critical task is the discovery or
enumeration process of all the devices in the PCIe tree so they can be allocated by the system software.
During the enumeration process the system software discovers all of the switch and endpoint devices that
are connected to the system, determines the memory requirements and then configures the PCIe devices.
The PCIe switch devices represent a special case in this process as their configuration is unique and
separate from that of PCIe endpoints. In the simulation testbench environment however, only their
configuration is required; the discovery process is not strictly necessary as the number of PCIe devices are
known ahead of time. This paper will elucidate the process of switch configuration using Xilinx' PCI Express
simulation class libraries.
First, while the discovery process is not needed within the testbench environment, the testbench must
still select the bus numbers and memory address of all the devices.”
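The discovery loop itself reduces to probing every Bus/Device/Function for a valid Vendor ID. The sketch below stubs the configuration-read hook with a fake two-device topology, since the real access mechanism is platform-specific:

```python
# Sketch of the discovery loop: probe every bus/device/function for a
# valid Vendor ID (0xFFFF means "nothing there"). `cfg_read16` is a
# hypothetical platform hook, stubbed with a tiny fake topology.
FAKE_DEVICES = {(0, 0, 0): 0x10EE, (1, 0, 0): 0x8086}  # (bus,dev,fn) -> VID

def cfg_read16(bus, dev, fn, offset):
    if offset == 0x00:  # Vendor ID register lives at offset 0
        return FAKE_DEVICES.get((bus, dev, fn), 0xFFFF)
    return 0xFFFF

def enumerate_devices(max_bus=2):
    found = []
    for bus in range(max_bus):
        for dev in range(32):     # up to 32 devices per bus
            for fn in range(8):   # up to 8 functions per device
                vid = cfg_read16(bus, dev, fn, 0x00)
                if vid != 0xFFFF:
                    found.append((bus, dev, fn, vid))
    return found

print(enumerate_devices())  # two devices found
```

In a simulation testbench, as the quoted passage notes, this scan can be skipped because the topology is known, but the bus-number and memory assignments it drives must still be made.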
5.2 Setting up with ARM DEV Studio
Gather Equipment:
▪ Windows PC
Download Software:
▪ Python
▪ Pip
▪ PuTTY
▪ Connect the Arm DSTREAM's USB output to the PC, and connect the PC to the DBG USB
port of the N1SDP.
Open Arm Development Studio and click “New Debug Connection”, selecting the following
options:
For the N1SDP, the PCIe root configuration space, endpoint configuration space, and endpoint
memory-mapped I/O are not all contiguous. To remedy this, the SCP (System Control Processor,
which boots before the application processor) builds a BDF (Bus/Device/Function) table. A
"segment" is a term used to refer to a PCIe root: if there are 2 roots, then there are 2 segments
(seg 0 and seg 1). The N1SDP has 2 roots. The first root (segment 0) is for PCIe slots 0 through 3;
segment 1 is for PCIe slot 4 (the CCIX port).
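With segments in place, a configuration register is reached by computing its ECAM address from the BDF. The layout below (bus << 20, device << 15, function << 12, plus a 4 KiB window per function) is the standard ECAM encoding, though the base address used is a made-up example, not the N1SDP's actual memory map:

```python
# Standard ECAM address construction: each function gets a 4 KiB
# configuration window, indexed by (bus, device, function).
def ecam_address(base: int, bus: int, dev: int, fn: int, offset: int) -> int:
    assert dev < 32 and fn < 8 and offset < 4096
    return base | (bus << 20) | (dev << 15) | (fn << 12) | offset

# Example base 0x70000000 is illustrative only.
addr = ecam_address(0x70000000, bus=1, dev=0, fn=0, offset=0x10)
print(hex(addr))  # 0x70100010
```

Each segment gets its own ECAM base, which is how two roots with non-contiguous spaces can still be addressed uniformly.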
6. Results
Encoded Input Data:
Decoded Output:
1. Memory Write Request TLP
2. Memory Read Request TLP
PCIE Enumeration Result:
7. References
1. Xilinx, Zynq UltraScale+ Device Technical Reference Manual (UG1085) -- https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
2. Xilinx, Zynq UltraScale+ MPSoC Software Developer Guide (UG1137) -- https://www.xilinx.com/support/documentation/user_guides/ug1137-zynq-ultrascale-mpsoc-swdev.pdf
3. Interactions of Zynq-7000 devices with general purpose computers through PCI-express: A case study -- https://ieeexplore.ieee.org/document/7495400
4. Xilinx, Generating Basic Software Platforms (UG1138) -- https://www.xilinx.com/support/documentation/sw_manuals/xilinx2014_4/ug1138-generating-basic-software-platforms.pdf
5. Ready PCIe Data Streaming Solutions for FPGAs -- https://ieeexplore.ieee.org/document/6927444