Professional Documents
Culture Documents
S O L U T I O N S F O R A P R O G R A M M A B L E W O R L D
Drone Platform
Soars with the
34
Zynq SoC
www.xilinx.com/xcell
Lifecycle Technology
Software-Defined Radio
from Concept to Production
Product Benefits/Features:
Combines the Analog Devices AD9361 integrated RF Agile Transceiver with the Xilinx Z7035
Zynq-7000 All Programmable SoC
Running Linux on the dual core ARM A9, the PicoZed SDR provides a full software environment
which can be used from prototyping to production
Userspace I/O (UIO), Industrial I/O (IIO) subsystems and drivers
Supports Avnet camera modules with V4L2 drivers for realtime video capture
Frequency-agile 2x2 receive and transmit paths with independent LO from 70 MHz to 6.0 GHz
Handheld form-factor, ideal for a broad range of fixed and mobile SDR applications
Full support in MATLAB and Simulink
The Expanding
PUBLISHER Mike Santarini
mike.santarini@xilinx.com
Xilinx Ecosystem
408-626-5981
This issue of Xcell Journal brings you news of a radical expansion in the ecosystem of
companies supporting systems development based on Xilinx Zynq-7000 SoCs and
EDITOR Jacqueline Damian
Zynq UltraScale+ MPSoCs. Xilinx announced this expansion at the end of February
ART DIRECTOR Scott Blair
at the Embedded World conference, held in Nuremberg, Germany. The Xilinx booth at
the show offered demos of many applications created by Xilinx as well as by Xilinx
DESIGN/PRODUCTION Teie, Gelwicks & Associates
Chances are good that your next design will resemble one or more of these demonstra-
tions, which means that the Xilinx and ecosystem IP used in the demos could easily apply
to your design too. This issues cover story, Xilinx Extends Ecosystem to Reshape the
www.xilinx.com/xcell/
Future of Embedded Vision, IIoT System Design, by Aaron Behman and Dan Isaacs,
starting on page 8, gives you a lot more detail about this important ecosystem expansion.
Xilinx, Inc.
2100 Logic Drive
San Jose, CA 95124-3400
Phone: 408-559-7778
FAX: 408-879-4780
www.xilinx.com/xcell/
Xcell Journal and Xilinx owe a great debt to Mike Santarini, who served
our readers. Xilinx makes no warranties, express,
Kudos and Goodbye to Mike Santarini
implied, statutory, or otherwise, and accepts no liability
as the publisher of this magazine for the past eight years. Mike has
with respect to any such articles, information, or other
significant. Mike, all of us here at Xilinx say Thank you! and wish
information in any way releases and waives any claim
it might have against Xilinx for any loss, damage, or
14 Xcellence in Communications
22
Evaluating an IQ Compression
Algorithm Using Vivado
High-Level Synthesis 22
Xcellence in Wireless
34
Communications
Xilinx FPGA Enables Scalable
MIMO Precoding Core 28
Xcellence in Aeronautics
Drone Platform Soars
with the Zynq SoC 34
Cover
Story
Xilinx Extends Ecosystem
to Reshape the Future
of Embedded Vision, IIoT
System Design
8
FIRST QUARTER 2016, ISSUE 94
Tools of Xcellence
FMC+ Standard Propels
Embedded Design to New Levels 52
46
52
XTRA READING
Xtra, Xtra The latest Xilinx
tool updates and patches,
as of March 2016 58
Dan Isaacs
Director, Industrial IoT,
Corporate Strategy & Marketing
Xilinx, Inc.
dani@xilinx.com
C/C++ users are more accustomed video but also encompasses a long systems often combine video streams
to writing code for CPUs and, more list of additional sensed parameters, from four, five, six or more video
recently, GPUs. With Xilinxs Vivado including acceleration and vibration; cameras to produce a single birds-
High-Level Synthesis (HLS) for soft- acoustic/ultrasonic; chemical and gas; eye view that gives the driver 360
ware-defined hardware and SDx electric/magnetic; flow; force, load, 2D planar or 3D spherical vision.
environment for software-defined torque and strain; humidity and mois- Vision systems drive local displays
systems development, many more ture; leak and level; machine vision; but also send locally processed vid-
system developers can make use of optical; motion, velocity and displace- eo to the cloud for further process-
the software-defined All Programma- ment; position, presence and proximity; ing, for combination with other video
ble devices in the Xilinx Zynq-7000 pressure; and temperature. streams and for storage.
SoC and Zynq UltraScale+ MPSoC IIoT systems may combine video
families. With its ecosystem expan- RISING NEED with additional sensed data to define
sion, Xilinx is making its offerings FOR SENSOR FUSION needed actions. For example, the new
just as easy to use as CPUs and GPUs, Several embedded vision and IIoT CPPS-Gate40 Smart Gateway from Sys-
but with superior performance and systems require sensor fusion, or tem-on-Chip engineering (SoC-e; see
performance/watt. the processing and merging of data article, page 14 ) incorporates a variety
The pipelines for embedded vision from multiple and different types of I/O ports commonly used in industri-
and IIoT systems have much in com- of sensors into actionable intelli- al control systems, combined with local,
mon. Both start with sensing and data gence. For embedded vision sys- high-speed data processing, and plac-
acquisition. For embedded vision tems, multiple video streams may es the resulting data on a dual-redun-
systems, that data takes the form of be combined to produce more- dant optical Ethernet ring using High-
a series of images or a video stream. usable or more-useful video streams. Availability Seamless Redundancy/
Sensed data for IIoT systems includes For example, vehicle-based vision Parallel Redundancy Protocol (HSR/
Vehicle
Sensors
LPDDR4 16nm
Figure 1 This ADAS design leverages the heterogeneous processing capabilities of the ARM Cortex cores in the Zynq UltraScale+ MPSoC.
PRP). A defining characteristic of IIoT Scale+ MPSoCs allow you to create interfaces and protocols. In addition,
systems is the ability to use sensed data reusable, software-defined hardware the Zynq UltraScale+ MPSoCs superior
for high-speed, real-time control not platforms with configurable and ex- hardware-based video-processing per-
possible if relying on cloud-based pro- tensible range both up and down the formance allows it to handle more vid-
cessing and decision-making. end-product family cost curve, from eo channels than competing standard
Of course, there are many alternative low-cost to high-performance sys- devices. Unlike such devices, the Zynq
ways to design such systems using a CPU tems, and to extend a brand into new UltraScale+ MPSoC handles a program-
or GPU, but Xilinx Zynq-7000 SoCs and markets across multifunction prod- mable number of video streams.
Zynq UltraScale+ MPSoCs offer several uct lines. This is not a hypothetical Because of the Zynq UltraScale+
significant advantages and benefits when advantage; many Xilinx customers MPSoCs I/O flexibility and processing
youre designing differentiated systems: are already putting it into practice. power, very little hardware is needed
outside of the MPSoC itself, except for
1. Highest performance/watt. Xilinx
Here are four examples of Chame- the sensors and external memory. The
All Programmable devices combine
leon All Programmable platforms, all performance/watt metric for this sys-
hardware, software and I/O program-
leveraging the Xilinx Zynq UltraScale+ tem is approximately 3x better than for
mability, letting you collapse a two-,
MPSoC to target distinct markets. a comparable system using CPU-based
three- or four-chip design into one
silicon from a leading competitor.
chip to maximize system performance
EXAMPLE 1: ADVANCED
while lowering power consumption.
DRIVER ASSISTANCE SYSTEM EXAMPLE 2:
2. Sensor fusion. Xilinx All Program- An advanced driver assistance system 4K VIDEO SURVEILLANCE
mable devices offer unique abilities to (ADAS) combines video data from A Zynq UltraScale+ MPSoC connects
ingest and process multiple types of several video cameras and additional to multiple sensors, including differ-
information, ranging from low-bit-rate vehicle sensor data, including inertial ent types of video cameras, in the 4K
data, such as temperature and pres- navigation and even GPS map data, to multichannel, multisensor video sur-
sure, to high-bit-rate data, including make decisions about braking, steering veillance system shown in Figure 2.
multiple, simultaneous high-definition and driver alerts. The block diagram in The red boxes in the block diagram
or super-high-definition video streams. Figure 1 shows a typical ADAS design again depict Xilinx interface IP for
based on a Zynq UltraScale+ MPSoC. MIPI-interfaced video cameras and
3. Any-to-any connectivity. The pro-
As Figure 1 shows, this design lever- displays and for different I/O inter-
grammable I/O capabilities of Xilinx
ages the heterogeneous processing faces that connect other sensor types.
Zynq-7000 SoCs and Zynq Ultra-
capabilities of the quad-core ARM Cor- The six all-blue boxes depict process-
Scale+ MPSoCs offer an unmatched
tex-A53 application processor and du- ing IP available from Xilinx ecosystem
ability to adapt to nearly any con-
al-core ARM Cortex-R5 real-time pro- companies. The two red/blue boxes
ceivable sensor I/O requirement,
cessor in the Xilinx Zynq UltraScale+ indicate IP blocks available both from
from multiple video interface stan-
MPSoC. The five red boxes in the di- Xilinx and from companies in its ex-
dards (such as MIPI and HDMI) to
agram depict MIPI video-interface IP panded ecosystem.
intelligent sensor interfaces (such
available directly from Xilinx. The six The performance/watt metric for
as I2C and SPI) and high-speed A/D
blue boxes show high-speed IP process- this Chameleon All Programmable sys-
converters (including JESD204B
ing blocks provided by other companies tem is approximately 5x better than for
and LVDS).
in the Xilinx ecosystem that implement a comparable system designed with
4. Multilevel security and multilayer high-level functions, including pedestri- CPU/DSP/GPU-based silicon from a
safety. The Zynq UltraScale+ MPSoCs an detection, driver monitoring, lane- leading competitor. The safety and se-
quad-core ARM Cortex-A53 appli- departure monitoring, blind-spot detec- curity features of the Zynq UltraScale+
cation processor and dual-core ARM tion and sensor fusion. MPSoC, including ARM TrustZone
Cortex-R5 real-time processor with The depicted ADAS system takes full capabilities and the devices hard-
hardware security features offer unique advantage of the Zynq UltraScale+ MP- ware-based AES encryption, are espe-
abilities to implement security and func- SoCs any-to-any connectivity to com- cially useful in security applications
tional-safety protocols. municate with any sensor interface, like this one.
including MIPI for the video cameras.
5. Chameleon All Programmable
Nonprogrammable devices from com- EXAMPLE 3: SMART-GRID
platforms. The hardware and soft-
peting vendors cannot easily adapt to SUBSTATION AUTOMATION
ware processing and I/O flexibility
new sensor interfaces without adding Our third example, a substation automa-
of Zynq-7000 SoCs and Zynq Ultra-
I/O chips to handle the additional I/O tion system targeting smart-grid design,
First Quarter 2016 Xcell Journal 11
COVER STORY
16nm
DDR4 DDR4
is an IIoT application that deals with 62439 HSR/PRP. It does so through a of the Zynq UltraScale+ MPSoCs six
multiple Ethernet streams from var- compatible industrial Ethernet switch ARM processor cores, depending on
ious sensing components that moni- instantiated in the Zynq UltraScale+ performance requirements.
tor substation parametrics. A system MPSoCs programmable logic using The performance/watt metric for
block diagram for this Chameleon All IP sourced from SoC-e, a company in this system is approximately 1.2x
Programmable system example ap- the Xilinx ecosystem. This Ethernet better than for a comparable system
pears in Figure 3. switch appears as the large blue box based on CPU/DSP silicon from com-
A key feature of this example in the diagram. Data from the various petitors, and theres a 2:1 reduction
IIoT system is its ability to connect sensor sources can be processed in in the number of chips needed in this
to a large number of interface units high-speed IP blocks (represented by design, thanks to the Zynq UltraScale+
throughout the substation over stan- the red/blue box in the diagram) from MPSoCs massive programmability, pro-
dard industrial Ethernet systems Xilinx and from companies in the cessing capacity and superior I/O flexibili-
using standardized IEEE-1588 Preci- Xilinx ecosystem, or the processing ty. Clearly, a security application must pro-
sion Timing Protocol (PTP) and IEC algorithms can run on one or more tect the power grid from malicious attack,
GTP Power, SD
Memory System Safety Connectivity Card
Controller Controller
GTP Ethernet System
Management Peripherals
Switch WiFi
GTP A53 A53 I/O
Configuration
IEEE 1588 Security I/Os
GTP A53 A53
Zynq UltraScale+ Hardened Feature
HSR/PRP GigE
Time
GTP (High Availability) R5 R5 GPU GigE Sync Ecosystem Provided IP
IEC 61850
GTP Xilinx IP
Security/Transport
GTP
Sensor Fusion Imaging Industrial Ethernet External Component
Acceleration
GTP Sensor Engines
Monitoring VCU
(A/D)
Application Accelerators
Figure 3 In this smart-grid substation automation system, an industrial Ethernet switch instantiated in
the Zynq UltraScale+ MPSoCs programmable logic sources IP from SoC-e, a company in the Xilinx IIoT ecosystem.
Power, SD
Functional Safety Memory System Safety Connectivity Card
Controller Controller System
Networking Management Peripherals Display
GTP Fast cycle time, A53 A53 I/O
low latency Configuration
Security I/Os
A53 A53
GTP Motion Control GigE Zynq UltraScale+ Hardened Feature
Time
R5 R5 GPU GigE Sync
Pattern matching Ecosystem Provided IP
GTP Xilinx IP
Custom accelerators HMI Industrial
Sensor Fusion Acceleration Ethernet
Predictive Maintenance External Component
GTP
Productivity Sensor
Monitoring Real-Time Analytics Safety
(A/D) Accelerators Monitoring
Application Accelerators
Figure 4 This industrial automation design for IIoT uses the Xilinx Zynq UltraScale+ MPSoC
to integrate an entire system that might otherwise require four chips.
Intelligent Gateways
Make a Factory Smarter
by Armando Astarloa
Founding Partner and CEO An intelligent
System-on-Chip engineering S.L.
armando.astarloa@soc-e.com gateway powered
by the Zynq SoC
enhances productivity
in a state-of-the-art
manufacturing plant.
T
he Industrial Internet of
Thingsthe idea that all
systems should be con-
nected on a global scale
in order to share informa-
tionis quickly becoming
a reality. Today, a growing
number of companies, es-
pecially in the industrial equipment markets,
are taking IIoT one step further by creating
complex systems that integrate sensors, pro-
cessing and communications to form intelligent
factories, smart energy grids and even smart
cities. These developments increase productivi-
ty and profitability, as well as enrich lives.
New technology implemented on a Xilinx
Zynq-7000 All Programmable SoC is helping to
bring intelligent systems into the manufacturing
sector of the IIoT. The smart gateway, designed by
System-on-Chip engineering S.L. (SoC-e), stream-
lines productivity and helps companies like Micro-
deco become more reliably connected and secure.
To maximize profitability, factories seek more
flexibility in their layouts, more information
about the process and manufactured products,
more intelligence in the processing of this data
and an effective integration of the human expe-
rience/interaction. However, as new technology
is introduced into the factory sector, those cre-
ating it need to respect some rules. The first and
most important is that production cannot stop.
New technologies must be compatible with old
systems and interoperability among vendors
should be facilitated. Furthermore, the solutions
should provide a means of taking the next step
in automation, leading to more autonomous or
decentralized analytics.
First Quarter 2016 Xcell Journal 15
XCELLENCE IN INDUSTRIAL
HOURS
SHIFT
MILLISECONDS, SECONDS
and industrial switches built around ture around the concept of smart gate- used industrial protocols such as Mod-
multiple protocols. As such, a factory ways that combine in the same system bus and Profibus.
can quickly turn into a heterogeneous networking, processing and sensing. Figure 3 shows how each smart gate-
nightmare, lacking the simplicity and One of the top challenges in creating way installed in each machine (CPPS
flexibility that a plug-and-work opera- a smart factory lies in connecting the area) is tied to the next one using a sin-
tion demands. Intelligent gateways like various systems. The factory includes gle fiber-optic link. The infrastructure is
the CPPS-Gate40 from SoC-e (Figure 2) high-speed optical links that intercon- completed by connecting all the devic-
will play a vital role in offering secure nect the various cyber physical produc- es in a single ring that implements the
and transparent operation between tion system (CPPS) areasthat is, each High-Availability Seamless Redundancy
both worlds (machine and IT). production group of machines, sensors (HSR) protocol. This nonproprietary
Microdeco is a company that manu- and actuators. The intelligent gateway (IEC 62439-3 Clause 5) Ethernet zero-de-
factures small metal parts for the auto- is in charge of all the communication lay recovery time solution allows oper-
motive sector. The company is always infrastructure. This includes, the high- ators to disconnect any equipment from
looking for ways to enhance productiv- speed switching for the fiber links and the ring without adversely affecting other
ity and is at the forefront of using intel- flexible, trispeed Ethernet ports to im- nodes or equipment in the factory. This
ligent systems. In the companys pilot plement regular Ethernet or Industrial real plug-and-work operation facilitates
plant, located in Ermua, Spain, Micro- Ethernet protocols in each cell, along plant layout modifications. Furthermore,
deco has built a networking infrastruc- with serial ports to implement widely HSR supports the redundant IEEE 1588v2
First Quarter 2016 Xcell Journal 17
XCELLENCE IN INDUSTRIAL
submicrosecond synchronization proto- mends a cut-through approach for for- stations. Other sectors, such as military
col, which simplifies the synchronization warding the frames in the ring. and aerospace, are also adopting these
of the system to perform precise recon- To avoid circulating frames, for uni- Layer 2 solutions.
struction of the sampled sensor data or cast communications the node that Smart gateways provide hardware
the implementation of control tasks. receives the frames is in charge of re- switching from the Ethernet and seri-
In order to provide seamless redun- moving them from the ring. For multi- al ports to the HSR infrastructure ring.
dancy, each HSR node sends the Ether- cast and broadcast traffic, the sender There are two smart gateways, repre-
net frames through both directions of removes the frames when it sees them sented in the left and in the right of Fig-
the ring. This approach allows hot ca- again in the redundant port. Addition- ure 3, that connect the HSR ring with
ble or equipment plugging and unplug- al rules regarding circulating frames the Ethernet-based enterprise network
ging. Each node is in charge of forward- (such as corrupted frames) are applied working as a redundancy box (Red-
ing both frames, and the IEEE 1588v2 to ensure network stability. Box). Functionally, the access point
support corrects the residence and link HSR, combined in many cases with represented on the right is optional,
delay times to ensure timing accuracy the Parallel Redundancy Protocol as it can be used to avoid the single
in the entire network. Thus, frame hard- (PRP), is the recommended High-Avail- point of failure that would appear in
ware processing is mandatory to ensure ability Ethernet protocol in the standard the case of a network using only one
low and constant latency times in every for the automation of one of the most RedBox. We recommend implement-
node. Indeed, the IEC standard recom- critical sectors worldwide: power sub- ing the dual-box setup in cases where
SENSORS SENSORS
6.6.6a
6.6.3a
6.6.4a
6.6.5a
ISLA
P5 CPPS-Gate40 CPPS-Gate40 CPPS-Gate40 CPPS-Gate40 CPPS-Gate40 P5
192.168.2.10 192.168.2.11 192.168.2.12 192.168.2.13 192.168.2.14
10/100/1000 BaseT
Ethernet
Service port
ISLA
CPPS-Gate4 CPPS-Gate4
RedBox 1 RedBox 2
192.168.2.100 192.168.2.200
(optional)
Enterprise
Servers
P5 CPPS-Gate40 CPPS-Gate40 CPPS-Gate40 CPPS-Gate40
192.168.2.18 192.168.2.17 192.168.2.16 192.168.2.16
ERP/MES Enterprise
Internet
MD 20.
Servers
MD 20.
MD 20.
MD 20.
Access ERP/MES
Point 1 Internet
SENSORS SENSORS SENSORS SENSORS Access Point 2
(optional)
External Memories
SERIAL
INTERFACES ARM CORTEX-A9 High Speed
10/100/1000 Acquisition
BaseT GMAC1 DSP
Logic System;
Service port GMAC0 Vibration
PTP sw stack
Profinet sw stack Sensors
Linux OS
AXI4 PLC
Database/sensor proc
Timestamp Value
IEEE 1588PTBIP
MODBUS/PROFIBUS PORT
C
PORT
LOW SPEED SENSORS A 1GE
10/100/1000 PORT HSR
BaseT D RING
Service port PORT
B
IEEE 1588v2 aware
HSR/PRP/Ethernet Switch IP
high availability is needed, or when it The cybersecurity requirements in works. The trusted embedded system is
is necessary to manage PRP frames these kinds of advanced manufactur- more and more difficult to protect due
(IEC 62439-3 Clause 5) in the critical ing facilities vary widely. Advanced to the increasing number of devices and
nodes in the enterprise network. security is necessary to protect the their heterogeneity.
Additionally, there are internal status of the production itself, avoid- For authentication and for network-
networking ports in the gateway to ing any malicious or accidental inter- ing security, these systems can directly
the processing elements of the SoC ruption generated by any cyber infra- use many of the solutions present in
device. In most cases, a dumb structure (device, network, software the IT world today. Well-known au-
switching approach is useless to join or hardware). It is also necessary to thentication mechanisms like IEEE
plant and IT worlds. The heterogene- authenticate users and devices that are 802.1X combined with RADIUS are a
ity in the data and network formats accessing information or any critical good example. Many embedded sys-
makes straightforward connections operation. Furthermore, this informa- tems with high-level operating systems
difficult. Whats needed is a pow- tion and the control protocols need to can run cryptographic libraries (such
erful integrated processing system be protected in terms of authentication as OpenSSL) to support all the secure
able to talk with local, enterprise or and privacy, because factory networks Layer 3 protocols and applications use-
cloud databases. In addition, such a are connected to larger IT networks in ful for secure data interchange. How-
system would be in charge of trans- an enterprise and outside of it. ever, a big challenge arises when it is
lating protocols, managing HMI sys- These challenges can only be ad- necessary to secure Layer 2 industrial
tems, supporting MES systems and dressed with a layered cybersecurity protocols with strict real-time require-
even running soft PLCs for real-time approach that takes into account each ments. The analysis of these scenarios
control. But that is not all. The cus- plant implementation. A common ele- shows that the software approach of
tomer also expects such a system ment in all the projects is the need to protecting these frames by applying
to perform complex sensor data support secure boot and storage with cryptographic algorithms, even using
preprocessing and filtering in the encryption and authentication. This fea- crypto accelerators, is not straightfor-
equipment, and of course, advanced ture will make credible the implementa- ward, and in many cases custom hard-
cybersecurity operations. tion of secure software and secure net- ware processing is required.
In the presented topology, from the HOW SOC PROGRAMMABLE low-latency networking tasks combined
network and user point of view, it is PLATFORMS DRIVE THE CHANGE with the IEEE 1588v2 hardware support
necessary to secure three network The magic of merging high-end net- units. Figure 4 is a block diagram of
linksthe redundant HSR/PRP, the working, powerful processing and the SoC implementation for the CPPS-
10/100/1G switching port and the ser- sensing capabilities has been obtained Gate40 in the Microdeco implementa-
vice portswith authentication mech- thanks to SoC programmable plat- tion. The networks switching infrastruc-
anisms. Furthermore, due to all the forms. Our product, named CPPS- ture is coordinated by the SoC-e HSR/
plant traffic passing through the intel- Gate40, embeds a Xilinx Zynq-7000 All PRP/Ethernet switch (HPS) IP core,
ligent gateway, the three links will play Programmable SoC device implement- which ensures a constant forwarding
a vital role in monitoring traffic for po- ed on the SoC-e SMARTzynq OEM mod- time of 550 nanoseconds in each node of
tential threats. ule. The dual-core ARM Cortex-A9 the ring and integrates internal and ex-
A final concept is the integration of MPCore on the device is comple- ternal trispeed Ethernet ports.
a sensor interface suite. As discussed, mented with different memory resourc- The internal port is sniffed and
the advances in the technology should es (DDR3, flash, massive storage units, time-stamped by the Precise Time Ba-
help us to simplify the installations, not etc.) and hardware to support multiple sic (PTB) IP core, providing support
make them more complex. To fulfill high-speed networking links. This infra- for the PTP stack. This IEEE 1588v2 in-
this demand, we integrated all the stan- structure offers a huge level of freedom frastructure allows the smart gateway
dard digital and analog interfaces in the to partition hardware and software pro- to work as master, slave, transparent
gateway. Additionally, we also included cessing in order to face the challenges clock and boundary clock. Thus, at the
high-end interfaces for advanced vibra- these applications present. end, in each piece of equipment a syn-
tion sensors and high-speed data acqui- From the hardware perspective, the chronized 64-bit timer can be used for
sition interfaces with direct access to Zynq SoCs programmable logic is the time-stamping, synchronization, con-
the Zynq SoC device. perfect candidate to implement the trol and as a common time reference
IPIPInterface
Interface Time
Time Capture
Capture Sensor
Sensor
library
library driver
driver driver
driver Interface
Interface
SWSW
HW
HW
HPS/Eth. Switch/PTB Sensors
HPS/Eth. Switch/PTB Sensors
Evaluating an IQ Compression
Algorithm Using Vivado
High-Level Synthesis
by Stefan Petko
Design Engineer
Xilinx, Inc.
stefan.petko@xilinx.com
Duncan Cockburn
Design Engineer
Xilinx, Inc.
duncan.cockburn@xilinx.com
W
ireless network are most frequently implemented using
operators face a the Common Public Radio Interface
major challenge (CPRI) protocol over optical fiber. The
in maintaining CPRI protocol requires a constant bit
the bottom line rate and its specification has over the
while increasing years increased the maximum data rate
the capacity and density of their net- to match the increasing bandwidth de-
works. A compression scheme for wire- mands. Network operators are now
less interfaces can help by reducing the looking at technologies that will allow
required fronthaul network infrastruc- them to achieve a significant hike in
ture investment. data rate without increasing the number
We used the Vivado Design Suites of optical fibers in use, thus maintaining
high-level synthesis (HLS) tool to current capex and opex overheads as-
evaluate an Open Radio equipment In- sociated with a cell site.
terface (ORI) standard compression In an effort to provide a long-term
scheme for E-UTRA I/Q data to esti- solution, the network operators are
mate its impact on signal fidelity, the looking at alternative network arrange-
introduced latency and its implemen- ments including rearchitecting the in-
tation cost. We found that Xilinxs terface between the baseband-process-
Vivado HLS offered an efficient plat- ing and radio units in order to reduce
form for evaluating and implementing the fronthaul bandwidth. However,
the selected compression algorithm. functional rearrangements can make it
more difficult to meet the stringent per-
WIRELESS BANDWIDTH PRESSURE formance requirements for some wire-
The ever-increasing demand for wire- less interface specifications.
less bandwidth drives the need for An alternative way to reduce band-
new network capabilities such as high- width is to implement a compression/
er-order MIMO (multiple-input, multi- decompression (codec) scheme for
ple-output) configurations and carrier wireless interfaces that are nearing or
aggregation, for example. The result- exceeding the available throughput. The
ing increase in network complexity achievable compression ratios depend
leads the operators to architectural on the specific wireless-signal charac-
changes such as the centralization of teristics such as noise levels, dynamic
baseband processing to optimize their range and oversampling rates.
network resource utilization. While Lets take a closer look at an ORI
reducing baseband processing costs, standard compression scheme for
the sharing of baseband processing E-UTRA IQ datathe real and imag-
resources increases the complexity of inary components of the transmitted
the fronthaul network. modulation symbols. A simplified ap-
These fronthaul networks, trans- plication, shown in Figure 1, illustrates
porting the modulated antenna carri- the placement of a compression-and-de-
er signals between the baseband units compression module at the CPRI IQ in-
(BBU) and remote radio heads (RRH), put-and-output interface.
IQ COMPRESSION ALGORITHM bandwidth of an OFDMA output for lookup table, I and Q samples can be
The ORI standard is a refinement of a 20-MHz E-UTRA downlink channel quantized independently using their
the CPRI specification aiming to en- sampled at 30.72 MHz is 18.015 MHz, corresponding distribution functions.
able an open BBU/RRH interface. In implying a lossless ideal low-pass filter We compared the ORI IQ compres-
its latest release, ORI specifies a lossy response at 3/4 resampling rate. sion performance to a Mu-Law com-
time-domain E-UTRA data compres- The nonlinear quantization (NLQ) pression algorithm implementation
sion technique for channel bandwidths process transforms 15-bit baseband as specified by the ITU-T Recom-
of 10, 15 or 20 MHz. The combination IQ samples with normal distribution mendation G.711. Also a nonlinear
of a fixed 3/4 rate resampling and non- into 10-bit quantized values. NLQ quantization technique, Mu-Law uses
linear quantization of 15-bit IQ sam- minimizes the quantization error by a logarithmic function to redistrib-
ples achieves a 50 percent reduction in using a cumulative distribution func- ute the quantized values across the
bandwidth requirements, facilitating tion (CDF) with a specified standard available number range. Unlike CDF-
an 8 x 8 MIMO configuration cover- deviation to represent amplitudes oc- based quantization, which considers
ing two sectors over a single 9.8-Gbps curring more frequently with a finer the statistical distribution of the input
CPRI link, for example. granularity than amplitudes occurring samples, Mu-Law quantized output is
The resampling stage involves in- less frequently. As illustrated by our a function of the corresponding input
terpolating the input I and Q streams, results in Figure 2b, the quantized sample value and the specified com-
passing the interpolated data through constellation fills a significantly high- panding value.
a low-pass filter and decimating the er proportion of its reduced number To make the comparison for an
output data stream. The filter design range than the input constellation equivalent compression ratio of 50 per-
process exploits specific channel char- shown in Figure 2a, minimizing the cent, we considered a 16- to 8-bit Mu-
acteristics in order to minimize signal quantization error in comparison Law encoder. Because it does not re-
loss due to downsampling and upsam- with alternative linear quantization quire resampling, Mu-Law compression
pling stages. For example, the effective schemes. Typically implemented in a is a cheaper solution in terms of latency
Downlink
I/Q I/Q
CPRI
L3 MAC PHY
I/Q I/Q
Uplink
I/Q Input Constellation (15-bit I/Q Samples) Rescaled Compressed I/Q Constellation
16384
Quadrature Amplitude
Quadrature Amplitude
0
0 16384
In-Phase Amplitude In-Phase Amplitude
Figure 2 IQ constellation of the reference 20-MHz E-UTRA DL channel input frame (a) and the compressed
IQ data (b) scaled to illustrate the effective number range utilization for each constellation
and implementation resource cost, of- lowed us to ignore the noncontributing levels can be computed independently
fering a trade-off between the design input samples. The output stream be- for I and Q samples using the expected
complexity and the achievable recon- comes a function of the interpolation or observed standard-deviation values.
structed-signal fidelity. rate number of subfilters, each using Alternatively, subsets of channels can
a subset of the FIR coefficients (total be quantized independently based on
SCALING UP THE CODEC number of coefficients/interpolation the actual signal-level measurements
ARCHITECTURE rate). Each of the four parallel filters or higher-level network parameters.
For our prototype configuration, we operates on a subset of channels, mak- For decompression, we used the quan-
aimed to scale up the compression al- ing the overall throughput equal to the tile function (inverse CDF) to compute
gorithm to fully utilize a 9.8304-Gbps required three compressed samples the Inverse NLQ table. Its size is limit-
CPRI link (line bit rate option 7). The per clock cycle. Along with the high ed to 29 entries of 14-bit values.
ORI-compressed E-UTRA sample spec- throughput, the proposed architecture We tested the implemented co-
ification allows us to transport 16 com- reduces the resampling latency, since dec algorithm using a 20-MHz LTE
pressed IQ channels (32 I and Q channels only a fraction of coefficients is used E-UTRA FDD channel stimulus gen-
compressed independently) over a sin- in each subfilter. erated with MATLABs LTE System
gle 9.8G CPRI link. A target throughput On the compression path, we com- Toolbox. We then used Keysight VSA
of three compressed samples per CPRI puted the NLQ quantization table using to demodulate the captured IQ data
clock is sufficient to fully pack the 32-bit the cumulative distribution function and quantify the signal distortion due
Xilinx LogiCORE IP CPRI IQ interface, (CDF). Assuming a symmetric IQ dis- to compression and decompression
giving us the required 737.28-Msample/ tribution, we reduced the size of the stages by measuring the output wave-
second compression IP output. NLQ lookup table to 214 entries of 9-bit form error vector magnitude (EVM).
Targeting a single clock domain, quantized values. Because our design We compared the reported output
we needed to architect the resampling required three parallel lookups per EVM measurementswhich repre-
filter to meet the output rate of three clock cycle, we implemented three par- sent the difference between the ideal
samples per clock cycle. Interpolating allel lookup tables utilizing the same and the measured signalto the refer-
the input sample stream with 0s al- quantization values. The quantization ence input signal EVM.
HIGH-LEVEL MODELING AND the GNU Octave language, utilizing its coefficients and quantization tables.
IMPLEMENTATION FLOW signal-processing and statistics packag- From a high-level mathematical mod-
We started the implementation process es. In addition to providing useful verifi- el, the Vivado HLS tool offers an ob-
by developing single-channel compres- cation reference data points, the model vious transition path to evaluate the
sion and decompression models using output also generated a set of FIR filter proposed architecture in terms of the
Multi-Channel Parallel
(9.8Gbps)
Parallel FIR
16 x 16
CPRI IP
Downlink
LTE PHY
Filter
Multi-Channel
Resampling Filter
Multi-Channel
Resampling Filter
4x15-bit
I/Q Compression IP 3x10-bit
I/Q Samples I/Q Samples
(CPRI Rate) (CPRI Rate)
Multi-Channel Parallel
Resampling Filter Inverse
8 Ch. 4/3 Interpolation NLQ
(CPRI IP I/Q I/F Compressed Samples)
I/Q Data Unpacker
Parallel FIR
(9.8Gbps)
Analog Front End
Digital Front End
CPRI IP
Inverse NLQ Look-up Table
Filter
2^9x14-bit I/O Values
RF
Multi-Channel
Resampling Filter
Multi-Channel
Resampling Filter
Multi-Channel
Resampling Filter
I/Q Decompression IP
Figure 3 IQ codec architecture indicating sample processing rates at the codecs IP interfaces (downlink path only)
potential hardware performance and PERFORMANCE MEASUREMENTS of design and verification tools. The
cost. We set up our C++ testbench to The Keysight VSA measurements re- main advantage of such a testbench is
operate on the input data streams using ported an averaged EVM of 0.29 per- the ability to perform fast C-level sim-
the compress and decompress func- cent for a codec configuration with ulations of a hardware system model.
tions. We synthesized these functions 144 FIR coefficients. Compared with For IQ compression and similar appli-
independently, since they would be the original input data with EVM RMS cations, simulation runs involve fre-
physically placed at the opposite ends of 0.18 percent, the additional EVM quent higher-level parameter or input
of a CPRI link. We implemented all of attributable to the compression-de- data set changes, making a fast feed-
the external and internal function inter- compression processing chain is 0.23 back essential.
faces using HLS streams with the inter- percent. By comparison, the Mu-Law Measurements show that the pro-
leaved channel dataflow managed with compression algorithm operating on posed ORI compression scheme can
simple C++ loops. the equivalent input data set results deliver signal distortion below 0.25
We utilized the Vivado HLS FIR IP in an average EVM of 1.07 percent. percent for a 20-MHz E-UTRA down-
to prototype the resampling filter. To The reduced latency and resource link channel. While the compression
meet the high-throughput requirements utilization cost of the Mu-Law com- performance does depend on the input
of our design, we implemented parallel pression would make it a better choice channel characteristics, the ORI com-
single-rate FIR filters and used a loop- over the ORI-proposed IQ compres- pression scheme provides a scope for
based filter output decimation. sion scheme in situations where you performance tuning by choosing the
Its possible to obtain a more re- can allocate as much as 1 percent of optimal set of filter design and quantiz-
source-efficient resampling filter by the entire LTE downlink signal-pro- er parameters.
implementing a polyphase resampling cessing chains 8 percent EVM budget Our prototype utilizes a common
filter. One such ready-to-use alternative to IQ compression. However, any addi- static set of design parameters for all
supporting the ORI resampling rates tional signal distortion implies tighter 16 antenna-carrier data streams. In a
is the Multi-Channel Fractional Sam- performance targets for the remaining real system, the signal characteristics
ple Rate Conversion Filter described system components. The potential IQ would be either known in advance or
in Xilinx application note XAPP1236, compression cost benefits may there- could be measured and used to tune the
Multi-Channel Fractional Sample Rate- fore be offset by cost increases of the design. Alternatively, the compression
Conversion Filter Design Using Vivado- digital front-end components and pow- performance could be dynamically ad-
High-Level Synthesis. er amplifiers, for example. justed by reconfiguring the quantization
The benefits of a fast C-level simula- Vivado high-level synthesis con- tables to maintain the minimal required
tion become even more evident when firmed the required throughput report- signal fidelity.
the verification data sets are large. This ed in terms of the initiation interval The cost in terms of implementation
is very much the case when evaluating the number of clock cycles before the resources and additional latency re-
an IQ compression algorithm since at top-level task is ready to accept new quired to perform compression and de-
minimum, a full radio frame of data input data. We also verified that the ex- compression would also be considered
(307,200 IQ samples per channel) is ported Vivado IP Integrator cores met in conjunction with the required com-
required to make use of the VSA tools the timing requirements for the target pression performance. The resampling
for EVM measurements. We observed Kintex UltraScale platform. filter size and latency dominate the
a simulation speed-up of two orders of We limited the scope of our investiga- overall codec costs, and a greater EVM
magnitude for C simulation compared tion to a small set of configurations and tolerance would allow for a design with
with C/RTL co-simulation, translating input data vectors. However, once the fewer filter coefficients.
to a nine-hour co-simulation run vs. a system model and the corresponding C With time-to-market considerations
five-minute C simulation for our com- model are in place, it will be possible to in mind, Xilinx has created an ORI-
pression IP test run. customize, implement and evaluate an based IQ codec proof of concept. You
Another significant HLS testbench alternative configuration in minutes. can read more about this scheme and
advantage was the ease of input data use request access to the design files on the
and output data capture utilizing files in DESIGN ALTERNATIVES Xilinx website. Visit http://www.xilinx.
conjunction with the HLS streams. The From the design tool perspective, Viva- com/applications/wireless-communi-
result was to provide an interface for do HLS provides a viable hardware pro- cations/wireless-connectivity.html or
data analysis with VSA tools or direct totyping path. A high-level testbench e-mail Perminder Tumber, wireless
comparison against the Octave model fits well within a design framework that business marketing manager for con-
output in the C++ testbench. requires flow of data among a number nectivity, at Permind@xilinx.com.
by Lei Guan
Member of Technical Staff
Researchers at Bell Labs Ireland have built a
Bell Laboratories, Nokia
lei.guan@ieee.org
frequency-dependent precoding core with
high-performance FPGAs for generalized
MIMO-based wireless communications systems.
temporary complex linear convolution function at each branch to provide HIGH-SPEED, LOW-LATENCY
operations and corresponding combin- reasonable precoding performance. PRECODING (HLP) CORE DESIGN
ing operations before radiating NTX RF Essentially, scalability is a key feature
3. The precoding coefficients needed
signals to the air. that must be addressed before you begin
to be updated frequently.
A straightforward low-latency im- a design of this nature. A scalable design
plementation of complex linear convo- 4. The designed core must be easily up- will enable a sustainable infrastructure
lution is a FIR-type complex discrete dated and expanded to support dif- evolution in the long term and lead to an
digital filter in the time domain. ferent scalable system architectures. optimal, cost-effective deployment strat-
egy in the short term. Scalability comes
5. Precoding latency should be as low
SYSTEM FUNCTIONAL from modularity. Following this philos-
as possible with given resource
REQUIREMENTS ophy, we created a modularized generic
constraints.
Under the mission to create a low-latency complex FIR filter evaluation platform in
precoding piece of IP, my team faced a Moreover, besides attending to the Simulink with Xilinx System Generator.
number of essential requirements. functional requirements for a particular Figure 1 illustrates the top-level sys-
design, we had to be mindful of hard- tem architecture. Simulink_HLP_core
1.
We had to precode one data
ware resource constraints as well. In describes multibranch complex FIR fil-
stream into multiple-branch paral-
other words, creating a resource-friendly ters with discrete digital filter blocks in
lel data streams with different sets
algorithm implementation would be ben- Simulink, while FPGA_HLP_core re-
of coefficients.
eficial in terms of key-limited hardware alizes multibranch complex FIR filters
2. We needed to place a 100-plus tap- resources such as DSP48s, a dedicated with Xilinx resource blocks in System
length complex asymmetric FIR hardware multiplier on Xilinx FPGAs. Generator, as shown in Figure 2.
PrecData_ref_1 Data_ref_B1
Din
PrecData_ref_2 Data_ref_B2
Complex Random Source
PrecData_ref_3 Data_ref_B3
Coeff_c128GI To Coeff_G
Sample
PrecData_ref_4 Data_ref_B4
Simulink_HLP_Core
Data_Out_B1
PrecData_1 Data_Out_B2
Din
PrecData_2 Data_Out_B3
PrecData_3 Data_Out_B4
Coeff_c128GIr Coeff_G
PrecData_4 Error Calculation
FPGA_HLP_Core
++ Sel Designer Comments: This HCP IP-core provides a filter-type function: 1 input data
stream will be precoded into 4 data streams by a 4-branch Linear Convolution core,
each of which is complex 128-tap asymmetric FIR filter.
Re In dz-1q DataIn_I
1 Data1_I Input and output of this core is running at 30.72MHz while internal processing is System
performed at 16 times faster rate, i.e., 491.52 MHz. Generator
Din Im In dz-1q DataIn_Q
Data1_Q
Data_pipelined
Table 1 Complex multiplier (CM) utilization comparison for a 128-tap complex asymmetric FIR filter
Different FIR implementation ar- es by sharing the same CM unit with 128 FCLK = FDATANTAPNSS
chitectures lead to different FPGA re- operations in a time-division multiplexing
where NTAP and NSS represent the
source utilizations. Table 1 compares (TDM) manner, but runs at an impossible
length of the filters and number of se-
the complex multipliers (CM) used in clock rate for the state-of-the-art FPGA.
quential stages.
a 128-tap complex asymmetric FIR fil- A practical solution is to choose a
Then we created three main modules:
ter in different implementation archi- partially parallel implementation archi-
tectures. We assume the IQ data rate is tecture, which splits the sequential long 1. Coefficients storage module: We
30.72 Msamples/second (20-MHz band- filter chain into several segmental par- utilized high-performance dual-port
width LTE-Advanced signal). allel stages. Two examples are shown in Block RAMs to store IQ coefficients
The full parallel implementation archi- Table 1. We went for plan A due to its that need to be loaded to the FIR co-
tecture is quite straightforward accord- minimal CM utilization and reasonable efficient RAMs. Users can choose
ing to its simple mapping to the direct-I clock rate. We can actually determine when to upload the coefficients to
FIR architecture, but it uses a lot of CM the final architecture by manipulating this storage and when to update the
resources. A full serial implementation the data rate, clock rate and number of coefficients of the FIR filters by wr
architecture uses the fewest CM resourc- sequential stages thus: and rd control signals.
2. Data TDM pipelining module: stage; and a segmental accumula- time cycles to derive the corresponding
We multiplexed the incoming IQ tion-and-downsample stage. linear convolution results of a 128-tap FIR
data at a 30.72-MHz sampling rate In order to minimize the I/O numbers filter, and to downsample the high-speed
to create eight pipelined (NSS = 8) for the core, our first stage involved streams back to match the data-sampling
data streams at a higher sampling creating a sequential write operation rate of the systemhere, 30.72 MHz.
rate of 30.721288 = 491.52 MHz. to load the coefficients from storage to
We then fed those data streams to the FIR cRAM in a TDM manner (each DESIGN VERIFICATION
a four-branch linear convolution cRAM contains 16 = 128/8 IQ coeffi- We performed the IP verification in two
(4B-LC) module. cients). We designed a parallel read op- steps. First, we compared the outputs of
3. 4B-LC module: This module con- eration to feed the FIR coefficients to the FPGA_HLP_core with the referenced
tains four independent complex the CM core simultaneously. double-precision multibranch FIR core
FIR filter chains, each implement- In the complex multiplication stage, in Simulink. We found we had achieved a
ed with the same partially parallel in order to minimize the DSP48 utiliza- relative amplitude error of less than 0.04
architecture. For example, branch tion, we chose the efficient, fully pipe- percent for a 16-bit-resolution version. A
1 is illustrated in Figure 3. lined three-multiplier architecture to wider data width will provide better per-
perform complex multiplication at a formance at the cost of more resources.
Branch 1 includes four subpro- cost of six time cycles of latency. After verifying the function, it was
cessing stages isolated by registers Next, the complex addition stage ag- time to validate the silicon perfor-
for better timing: a FIR coefficients gregates the outputs of the CMs into a mance. So our second step was to syn-
RAM (cRAM) sequential-write and single stream. Finally, the segmental ac- thesize and implement the created IP in
parallel-read stage; a complex mul- cumulation-and-downsample stage accu- the Vivado Design Suite 2015.1 target-
tiplication stage; a complex addition mulates the temporary substreams for 16 ing the FPGA fabric of the Zynq-7000
Sequential cRAM WR and Parallel cRAM RD Stage Complex Multiplication Stage Complex Addition Stage Segment Accumulation and Downsample Stage
+
4-Branch OUT-B2 OUT-B2
Linear Convolution
DOUT-2 ing clock rate to do a faster job. For
OUT-B3 OUT-C2
CG-2 Compact Core OUT-B4 OUT-D2
the latter case, it would be helpful to
identify the bottleneck and critical
path of the target devices regarding
DIN-3 OUT-C1 OUT-A3
the actual implementation architec-
+
4-Branch OUT-C2 OUT-B3
Linear Convolution
DOUT-3 ture. Then, co-optimization of hard-
OUT-C3 OUT-C3
CG-3 Compact Core OUT-C4 OUT-D3 ware and algorithms would be a good
approach to tune the system perfor-
DIN-4 OUT-D1 OUT-A4
mance, such as development of a
more compact precoding algorithm
+
4-Branch OUT-D2 OUT-B4
DOUT-4
Linear Convolution OUT-D3 OUT-C4 regarding hardware utilization.
CG-4 Compact Core OUT-D4 OUT-D4 Initially, we focused on a precoding
solution with the lowest latency. For our
next step, we are going to explore an
alternative solution for better resource
Figure 4 Example of scaled precoding application based on the 4B-LC core utilization and power consumption. For
more information, please contact the
author by e-mail: lei.guan@ieee.org.
4 x 4 4 128 16 4
4 x 8 8 256 32 8
4 x 16 16 512 64 16
4 x 32 32 1024 128 32
Table 2 Example resource utilization in different application scenarios based on suggested HLP core
Drone Platform
Soars with
the Zynq SoC
Aerotenna achieved its first
ArduPilot-compatible UAV flight
by leveraging the processing
power and I/O capabilities
of Xilinxs Zynq SoC device.
by Zongbo Wang
Founder
Aerotenna
zongbo@aerotenna.com
Bruno Camps
Founder
Aerotenna
Yan Li
Embedded Design Engineer
Aerotenna
Joshua Fraser
Embedded Design Engineer
Aerotenna
Lianying Ji
CTO
Muniu Technology
Jie Zhang
Software Engineer
Muniu Technology
T
he unmanned aerial vehicle (UAV) and
drone industry is quickly growing and reach-
ing new commercial and consumer markets.
The horizon of what is possible with UAVs
continues to push forward into creative new
applications like 3D modeling, military aid
and delivery services.
The problem is that the applications are becoming increas-
ingly more complex, requiring more and more processing pow-
er and I/O (input/output) interfaces, while the UAV platforms
available are not improving at the same pace. The limits on the
capabilities of most UAV platforms are being reached as the
software and hardware needed for flight continue to advance.
Our team at Aerotenna has successfully flown a UAV with
a board we built based on the Xilinx Zynq-7000 All Program-
mable SoC. This flight marks the beginning of our plans to re-
lease microwave sensing products that are computationally
heavy. We turned to the Zynq SoC because the needed process-
ing power was unavailable with other solutions of its class.
With this new platform (Figure 1), we plan to improve the un-
manned flying experience by deploying our microwave-based
collision avoidance systems.
most popular demands such as a camera our powerful platform. The dual-core that places some of the timing-critical
or GPS for navigation. A single platform ARM Cortex-A9 APU within the processing tasks in the programmable
that is compatible with a very wide range Zynq SoC chip allows for unparalleled logic. The I/O peripherals and memory
of sensors and external interfaces cur- processor speed. Nothing on the market interfaces are more versatile than the
rently does not exist on the market. for affordable UAV platform solutions ones provided by MCU-based platforms.
Here at Aerotenna, we believed the compares with the Zynq SoC in chip Another reason we chose the Zynq
way to overcome these limitations structure, multiprocessor capabilities SoC is because it is able to easily han-
was to create a new board design and I/O access speeds. Thus, the Zynq dle the complexities of flight control
from scratch. We have been working SoC is the perfect candidate for the programs, which can be enormous and
to perfect a new UAV platform that next-generation platform. require very fast CPUs. And the device
will excel in the areas where the other Most of the flight control platforms has plenty of power left over, which
platforms fail. We used the Zynq SoC currently in the market are based on a leaves a lot of room for expansion in the
device provided by Xilinx to achieve microcontroller unit (MCU). That archi- flight control programs.
this goal. Its superior design will offer tecture limits the potential for sensor There are many types of flight soft-
the greatly increased processor speed fusion due to the limited processing ware programs, and they all differ in be-
and I/O capabilities needed for the power and I/O extension capabilities. havior and complexity. The flight con-
next generation of UAVs. The advantage of the Zynq SoC is trol software we have decided to use
clear in both processing power and is called ArduPilot, provided by APM
WHY THE ZYNQ SOC? I/O capability: the combination of dual (autopilot machine) on Dronecode. It
We chose the All Programmable Zynq ARM cores plus FPGA logic enables a is more complex than most, but pro-
SoC as the foundation on which to build hardware/software co-design approach vides a lot of functionality not included
Figure 1 Conceptual drone that Aerotenna has been developing has advanced sensing
technology and a powerful processing platform powered by the Zynq SoC.
Figure 2 The OcPoC system will be the first commercially available flight control platform powered by the Zynq SoC chip.
OcPoC versatile
I/Os
PW
s
al
gn
M
si
si
M
gn
PW
al
A/D Converter
s
Analog input
Comm
I/Os
GPS Zynq SoC DDRs
Receiver
GPS
Inertial Measurement
-2
SI
Unit Sensors
C
SD
rd
AG
ca
JT
USB Ethernet
T
he lifetime of any elec- view of some concepts of temperature,
tronic device depends including junction temperature, ther-
on its operating tempera- mal resistance and other phenomena.
ture. At elevated tem- We will look into the causes of tempera-
peratures, devices age ture increase in a device and will list
faster and their lifetime our solutions to prevent it. We will also
is reduced. Yet some applications require address the possible hot-spot issues
the electronics to operate beyond the and propose solutions to them.
specified maximum operating junction In this particular project, the use of
temperature of the device. An example thermoelectric cooling was limited and
from the oil-and-gas industry illustrates we had to find other solutions.
the problem and ways to solve it.
A customer asked our team at Aphesa TEMPERATURE VARIATION
to design a high-temperature camera that Electronic devices are usually specified
will operate inside oil wells (Figure 1). with their maximum junction tempera-
The device required a rather large FPGA ture. Unfortunately, what the system
and had temperature requirements up designers are concerned with is the
to at least 125 degrees Celsiusthe sys- ambient temperature. The difference
tems operating temperature. As a con- between the ambient and the junction
sultancy that develops custom cameras temperatures will depend on the capa-
and custom electronics including FPGA bility of the package to transfer heat
code and embedded software, we have and the cooling system to dissipate this
experience with high-temperature oper- heat outside of the systems enclosure.
ation. But for this project, we had to go Thermal resistance is a heat property
the extra mile. and a measurement of how much a giv-
The product is a down-hole, dual, en material resists heat flow. Because of
color camera designed for use in oil thermal resistance, temperatures inside
well inspection (Figure 2). It performs and outside of a component through
embedded image processing, color re- which heat flows will differ, in just the
construction and communication. The same way that the voltage on each side
system has memory, LED drivers and of a resistor is not the same because of
a high-dynamic-range (HDR) imaging current flow. For a temperature differ-
capability. For this project we chose to ence of 20 degrees between the inside
use the XA6SLX45 device from Xilinx and the outside of a body, a device
(Spartan-6 LX45 automotive) because with 125 degrees of maximum junction
of its wide temperature range, robust- temperature can operate in an environ-
ness, small package, large embedded ment of up to 105 degrees. The thermal
memory and large cell count. resistance is expressed in degrees per
This project was quite challenging watt; for 1 W of heat to dissipate, the
also lots of fun to do. Lets examine how temperature difference between the in-
we put it all together, starting with a re- side and the outside will be equal to the
thermal resistance. This relationship is device. The capability of air to absorb In our case, airflow was not a solu-
formalized in the Figure 3 equation. heat depends on the temperature differ- tion because the quantity of air in the
The energy dissipated as heat de- ence between the air and the device, as enclosure is limited and the air quick-
pends on the device, the circuit, the well as on the pressure of the air. Other ly equalizes in temperature. Water
clock frequency and the code running in solutions include liquid cooling, where cooling was not possible either, be-
the device. The temperature difference the liquid, usually water, replaces air for cause of the long distance between a
between the inside of the device (the more efficiency. The capability of a mass water source and the tool. So for us,
junction temperature) and its environ- of air or fluid to absorb heat is given by the Peltier effect was the only cool-
ment (the ambient temperature) there- the heat absorption equation in Figure 4. ing option. Since the ambient tem-
fore depends on the device, the code The final approach that designers perature is fixed (we cant heat up
and the schematic. often use is thermoelectric cooling, the large fluid quantity as in Figure
where the Peltier effecta temperature 3s formula for a very large value of
USUAL COOLING SOLUTIONS difference created by applying a voltage mass), the thermoelectric cooler will
In most designs where cooling is re- between two electrodes connected to a actually reduce the temperature of
quired, designers use either passive sample of semiconductor materialis the electronics. Unfortunately, as a
cooling (a heat sink that helps dissipate used to cool one side of a cooling plate high current is required for the cool-
heat into the air by increasing the sur- while heating the other. Although this ing devices and a very long conductor
face area in contact with air) or active phenomenon helps to move heat away is used to connect from the surface
cooling. Active-cooling solutions typical- from the device to be cooled, Peltier to the tool, only limited current is ac-
ly force an airflow in order to help renew cooling has one big disadvantage: It re- tually available for cooling and only
the cold air that absorbs heat above the quires significant external power. a small temperature drop is reached.
Wellbore camera
Front field of view
of camera and light
Drilled well
Figure 1 The design of a high-temperature camera to operate inside oil wells (as shown at left) required extending the
operating temperature beyond the qualified maximum temperature of the device. A close-up of the camera is at right.
Figure 2 The high-temperature camera and high-temperature processing board are equipped with Xilinx Spartan-6 FPGAs.
Moreover, our device is a camera charge and discharge route capacitance that area will heat up more than in the
and image quality decreases exponen- when a gate toggles; it is proportional other places, causing a hot spot.
tially with temperature. Therefore, we to the clock rate and the total capaci- As the devices junction temperature
had to optimize our cooling strategies tance. Static power consumption is a is limited, it actually means that the
to cool down as much as possible the function of the devices type, core volt- temperature of the hottest spot should
image sensors and not the other devic- age and technology. The power can be not exceed the maximum junction tem-
es such as FPGAs, memories, LED driv- consumed by the core or by the I/Os. perature. Knowing the power consump-
ers or power supply circuits. When heat is generated at a point in tion of a device and the temperature of
Since cooling of the FPGA was space, it will spread around and cause the package, all we can estimate is the
almost impossible due to the limit- the surrounding area to heat up. If the average temperature of the junction.
ed choice of Peltier cooling on the surrounding area is not a heat source, The last source of heat is related to
image sensors only, our only option then heat will diffuse and the tempera- the current flow in conductors resulting
was to reduce the peak temperature ture rise will be limited. The tempera- in Joule effect, .
inside the FPGA. ture will eventually become uniform
in the whole device if one waits long WHAT HAPPENS IF THE MAXIMUM
CAUSE OF HOT SPOTS enough. If the surrounding area is made TEMPERATURE IS EXCEEDED?
AND RISING TEMPERATURES of other heat sources, then there will be As operating temperatures rise, the life-
There are three sources of power con- a net increase in temperature as each time of a device decreases and the part
sumption in a digital device: dynamic, source will bring heat to the other. will age faster. Some aging processes,
static and Joule effect. Dynamic power If many heat sources are grouped in such as electromigration and corrosion,
consumption is the power required to a small area, then the temperature of only take place at higher temperatures.
Electromigration occurs in the
presence of humidity and an electric
field when atoms of a conductor move
Tj = Ta +(Rth ,package+Rth ,ambient). P in ionic form from their original loca-
tion to reposition themselves some-
where else, leaving a gap behind. The
gap will reduce the effective width of
Figure 3 Relationship between ambient and junction temperatures where Tj is the junction the conductor at that location and will
temperature, Ta is the ambient temperature, Rth , package is the thermal resistance between the
junction and the outside surface of the package, Rth , ambient is the thermal resistance between
cause an increase of the local electric
the outside surface of the package and the ambient air (it is zero if there is no heat sink field, hence inducing more electromi-
or air flow) and P is the power dissipated by the device. gration. This chain reaction will result
Enclustra
Everything FPGA.
1. MARS ZX2
Zynq-7020 SoC Module
Xilinx Zynq-7010/7020 SoC FPGA
Up to 1 GB DDR3L SDRAM
64 MB quad SPI flash
USB 2.0
Gigabit Ethernet
Up to 85,120 LUT4-eq
108 user I/Os
3.3 V single supply
67.6 30 mm SO-DIMM
from $127
2. MERCURY ZX5
Zynq-7015/30 SoC Module
Xilinx Zynq-7015/30 SoC
1 GB DDR3L SDRAM
64 MB quad SPI flash
PCIe 2.0 4 endpoint
4 6.25/6.6 Gbps MGT
USB 2.0 Device
Gigabit Ethernet
Up to 125,000 LUT4-eq
178 user I/Os
5-15 V single supply
56 54 mm
3. MERCURY ZX1
Zynq-7030/35/45 SoC Module
Xilinx Zynq-7030/35/45 SoC
Figure 5 To avoid using all the I/Os and logic cells (top), the design uses two Spartan-6 FPGAs 1 GB DDR3L SDRAM
rather than one. This means the heat is dissipated at two separate locations. 64 MB quad SPI flash
PCIe 2.0 8 endpoint1
8 6.6/10.3125 Gbps MGT2
USB 2.0 Device
Gigabit & Dual Fast Ethernet
Another important step was to opti- es in the design in order to recover, Up to 350,000 LUT4-eq
174 user I/Os
mize our code to reduce the clock rate. or at least detect, bit errors in memo- 5-15 V single supply
64 54 mm
Reducing the clock rate reduces the ry cells or communication. State ma- 1, 2: Zynq-7030 has 4 MGTs/PCIe lanes.
power consumption but also allows the chines also recover if they end up in
device to run at a higher temperature. an unused state. 4. FPGA MANAGER
IP Solution
As an example, we evaluated the trade- We found that using the Xilinx Power
off between slow parallel design and Estimator (XPE) was a good first step in
C/C++
fast pipelined design. executing our design. The Vivado De- C#/.NET
To improve the design, we ensured sign Suite has power estimation tools MATLAB
LabVIEW
the components were dried before their for designs on newer devices. But the
final assembly and applied a protective measurement of power consumption USB 3.0
coating to repel moisture. on the real devices and comparisons be- PCIe Gen2
Gigabit Ethernet
At elevated temperatures, devices tween versions of the code proved to be
will age faster. A product qualification the best and most accurate practice.
can be used to measure the actual life-
Streaming, FPGA
time of the device vs. temperature. NO THERMOELECTRIC COOLING
We also employed a burn-in process Combining all of the above techniques,
made simple.
in production to pre-age the device and we came away with a camera that is One tool for all FPGA communications.
Stream data from FPGA to host over USB 3.0,
reject the parts that seemed to age fast- able to work at 125 degrees of ambi- PCIe, or Gigabit Ethernet all with one
simple API.
er than others (early fails), therefore ent temperature with SDRAM man-
only keeping the best. agement, communication buses and
Also important to our design pro- image processing, even though it is by Design Center FPGA Modules
cess was the use of cyclic redundan- specification limited to 125 degrees of Base Boards IP Cores
cy checking (CRC) and other types junction temperature. Moreover, we We speak FPGA.
of error detection and correction. We managed to reach 125 degrees without
used these techniques in various plac- thermoelectric cooling.
Mitchell Manar
Student
Washington University in St. Louis
Jeremy Tang
Student
Washington University in St. Louis
Xilinx FPGA
CLK CLK
LVCMOS33
(sigout) R
Figure 1 A one-channel delta modulator ADC uses one external resistor and one external capacitor.
Figure 2 This oscilloscope plot shows the 3.75-MHz input signal produced by the Agilent 33250A function generator
in yellow (channel 1) and the feedback signal in blue (channel 2) for the case where F = 240 MHz, R = 2K and C = 47 pF.
The Fourier transform of the input signal as computed by the Tektronix DPO 3054 oscilloscope is shown in red (channel M).
R/C networks and input connectors, 240-MHz sampling frequency are clear- the free, online TFilter FIR filter de-
allowing the prototype system to digi- ly visible, along with a peak at 3.75 sign tool [5]. This filter had 36 dB or
tize up to eight signals simultaneously. MHz associated with the input signal. more of attenuation outside the 2.5- to
Each channel was parallel-terminated 5-MHz passband and 0.58 dB of ripple
with 50 ohms to ground to properly ter- LARGE NUMBER OF TAPS between 3 and 4.5 MHz.
minate the coaxial cable from the signal By applying a bandpass finite impulse The ADC output signal shown in Fig-
generator. It is important to note that to response (FIR) filter to the bitstream, ure 4 has a resolution of approximate-
achieve this performance, we set the it is possible to produce an N-bit bi- ly 5 bits. This is ultimately a function
drive strength of the LVCMOS33 buf- nary representation of the analog in- of the oversampling rate, and you can
fers to 24 mA and the slew rate to FAST, put signal: the ADC output. Since the achieve higher resolution with designs
as documented in the example VHDL digital bitstream is at a much higher optimized for lower input frequencies.
source in Figure 5. frequency than the analog input sig- The ADC output signal shown in
The custom prototype board nal, however, you need to use FIR fil- Figure 4 is also severely oversampled
also supported the use of an FTDI ters with a large number of taps. The at 240 MHz and can be decimated to
FT2232H USB 2.0 Mini-Module [4] that data being filtered, however, only has reduce the ADC output bandwidth.
we used to transfer packetized serial values of zero (0) and one (1), so mul- In a hardware implementation of the
bitstreams to a host PC for analysis. tipliers are not needed (only adders to bandpass filter and decimation blocks,
Figure 3 shows the magnitude of the add the FIR filter coefficients). it would be possible to only compute
Fourier transform of the bitstream the The ADC output shown in Figure every 16th filter output value when dec-
prototype board produced when fed 4 was produced on the host PC using imating by a factor of 16 down to an
the analog signal of Figure 2. Peaks an 801-tap bandpass filter centered effective sample rate of 15 MHz (three
associated with subharmonics of the at 3.75 MHz that we designed using times faster than the highest frequency
First Quarter 2016 Xcell Journal 49
XPLANANTION: FPGA 101
60
50
40
30
20
10
0
0 10 20 30 40 50 60 70 80 90 100 110 120
-10
F (MHz)
-20
Figure 3 This graph shows the Fourier transform of the bitstream produced by the configuration associated with Figure 2.
0.4
0.2
0
56000 56500 57000 57500 58000 58500
-0.2
-0.4
-0.6
Figure 4 The ADC output was produced using an 801-tap bandpass FIR filter centered at 3.75 MHz.
in the band-limited input signal), reduc- tiated directly and connected to the Xilinx ISE Webpack tools to imple-
ing the hardware requirements. analog input and feedback signals, ment the project [6].
Figure 5 shows the VHDL source sigin_p and sigin_n, respectively. Figure 5 shows the VHDL code and
used with the Digilent Cmod S6 devel- The internal signal sig is driven by the the portion of the UCF file associated
opment module to produce the feed- output of the LVDS_33 buffer and sam- with the circuitry of Figure 1.
back signal shown in Figure 2, along pled by the implied flip-flop to produce
with the bitstream data associated sigout. The signal sigout is the serial LOW COMPONENT COUNT
with the Fourier transform of Figure bitstream that is filtered to produce The ADC architecture we have de-
3. An LVDS_33 input buffer is instan- the N-bit ADC output. We used the free scribed has been inaccurately referred
by Jeremy Banks
Product Manager
Curtiss-Wright
Jeremy.banks@curtisswright.com
Jim Everett
Product Manager
Xilinx, Inc.
jeverett@xilinx.com
A
A new mezzanine card standard called
FMC+, an important development for
embedded computing designs using
FPGAs and high-speed I/O, will extend
the total number of gigabit transceiv-
ers (GTs) in a card from 10 to 32 and
increase the maximum data rate from
10 to 28 Gbits per second while main-
taining backward compatibility with
the current FMC standard.
These capabilities mesh nicely with
new devices such as those using the
JESD204B serial interface standard,
as well as 10G and 40G fiber optics
and high-speed serial memory. FMC+
addresses the most challenging I/O
requirements, offering developers the
best of two worlds: the flexibility of a
mezzanine card with the I/O density of
a monolithic design.
The FMC+ specification has been
developed and refined over the last
year. The VITA 57.4 working group has
approved the spec and will present it
for ANSI balloting in early 2016. Lets
take a closer look at this important new
standard to see its implications for ad-
vanced embedded design.
THE MEZZANINE
CARD ADVANTAGE
Mezzanine cards are an effective and
widely used way to add specialized
functions to an embedded system. Be-
cause they attach to a base or carrier
card, rather than plugging directly into
First Quarter 2016 Xcell Journal 53
TOOLS OF XCELLENCE
a backplane, mezzanine cards can be have more I/O capacity than their XMC control interfaces, memory or additional
easily changed. For system designers, counterparts. As with the PMC and XMC processing. But analog I/O is the most
this means both configuration flexibili- specification, FMC and FMC+ define op- common use for FMC technology. The
ty and an easier path to technology up- tions for both air and conduction cool- FMC specification affords a great deal
grades. However, this flexibility usually ing, thereby serving both benign and of scope for fast, high-resolution I/O, but
comes at the cost of functionality due to rugged applications in commercial and there are still trade-offsespecially with
either connectivity issues or the extra defense markets. high-speed parts using parallel interfaces.
real estate needed to fit on the board. The anatomy of the FMC specification For example, Texas Instruments
For FPGAs, the primary open stan- is simple. The standard allows for up to ADC12D2000RF dual-channel, 2-Gsample/
dard is ANSI/VITA 57.1, otherwise known 160 single-ended or 80 differential paral- second (Gsps) 12-bit ADCs use a 1:4 mul-
as the FPGA Mezzanine Card (FMC) lel I/O signals for high-pin-count (HPC) tiplexed bus interface, so the bus speed is
specification. A new version dubbed designs or half that number for low-pin- not too fast for the host FPGA. The digi-
FMC+ (or, more formally, VITA 57.4) ex- count (LPC) variants. Up to 10 full-du- tal data interface alone requires 96 sig-
tends the capabilities of the current FMC plex GT connections are specified. The nals (48 LVDS pairs). For a device of this
standard with a major enhancement to GTs are useful for fiber optics or other class, FMC can support only one of these
gigabit serial interface functionality. serial interfaces. In addition, the FMC ADCs, even if there is sufficient space for
FMC+ addresses many of the draw- specification defines key clock signals. more, because it is limited to 160 signals.
backs of mezzanine-based I/O, com- All of this I/O is optional, though most Lower-resolution devices, even at higher
pared with monolithic solutions, simul- hosts now support the full connectivity. speeds, such as those with 8-bit data paths,
taneously delivering both flexibility The FMC specification also defines may allow more channels even with the in-
and performance. At the same time, the a mix of power inputs, though the pri- creased requirements of the front-end an-
FMC+ standard stays true to the FMC mary power supply, defined by the mez- alog coupling of the baluns or amplifiers,
history and its installed base by support- zanine, is supplied by the host. This ap- clocking and the like.
ing backward compatibility. proach works by partially powering up The FMC specification starts to run out
The FMC standard defines a small-for- the mezzanine such that the host can of steam with analog interfaces delivering
mat mezzanine card, similar in width interrogate the FMC, which responds more than 8 bits of resolution at around 5
and height to the long-established XMCs by defining a voltage range for the VADJ. or 6 Gsps (throughputs of > 50 Gbps) us-
or PMCs, but about half the length. This Assuming the host can provide this ing parallel interfaces. From a market per-
means FMCs have less component real range, then all should be well. Not hav- spective, leading FMCs based on channel
estate than open-standard formats. ing the primary regulation on the mezza- density, speed and resolution are in the 25-
However, FMCs do not need bus inter- nine saves space and reduces mezzanine to 50-Gbps throughput range. This func-
faces, such as PCI-X, which often take power dissipation. tionality results from a trade-off between
a considerable amount of board real physical package sizes and available con-
estate. Instead, FMCs have direct I/O to FMCS FOR ANALOG I/O nectivity to the host FPGA.
the host FPGA, with simplified power Designers can use FMCs for any func- In addition to the parallel connections,
supply requirements. This means that tion that you might want to connect to the FMC specification supports up to 10
despite their size, FMCs could actually an FPGA, such as digital I/O, fiber optics, full-duplex high-speed serial (GT) links.
Clocks 4 4 4
Maximum # GTs 10 24 32
GT clocks 2 6 8
Miscellaneous JTAG, SYNC, power JTAG, SYNC, power JTAG, SYNC, power
good, geographic good, geographic good, geographic
address address address
Power supplies VADJ* (4 pins), 3V3 VADJj* (4 pins), 3V3 VADJ (4 pins), 3V3
(4 pins), 12V (2 pins), (8 pins), 12V (4 pins), (8 pins), 12V (4 pins),
3V3 Aux (1 pin) 3V3 Aux (1 pin) 3V3 Aux (1 pin)
* VADJ: mezzanine defined for voltage level, but provided by host
These interfaces are useful for such er. This dramatic reduction in package due to longer data pipelines. For DRFM
functionality as fiber-optic I/O, Ether- size marries well with evolving FPGAs, applications, latency for data-in to da-
net, emerging technologies like Hybrid which are providing ever more GT links ta-out is a fundamental performance
Memory Cube (HMC) and the Bandwidth at higher and higher speeds. Figure 1 parameter. Although latency is likely to
Engine, and newer-generation analog I/O illustrates an example of package size vary widely between serially connect-
devices that use the JESD204B interface. and FMC/FMC+ board size. ed devices, new generations of devices
Typical high-speed ADCs and DACs will push data through the pipelines
ENTER JESD204B using the JESD204B interface have be- faster and faster, with some promising
Although the JESD204 serial-inter- tween one and eight GT links operating the ability to tune the depth of the pipe-
face standard, currently at revision at 3 to 12 Gbps each, depending on the line. It remains to be seen how much of
B, has been around for a while, only data throughput required based on sam- an improvement is to be realized.
recently has it has gained wider mar- ple rate, resolution and number of ana- Some standard ADC devices sam-
ket penetration and become the serial log I/O channels. pling at >1 Gsps today have latency
interface of choice for newer gener- The FMC specification defines a rel- below 100 nanoseconds. Other appli-
ations of high-sampling data convert- atively small mezzanine card, but with cations can tolerate this latency, or
ers. This wide adoption has been the emergence of JESD204B devices do not care about it, including soft-
stoked by the telecommunications there is room to fit more parts onto the ware-defined radio (SDR), radar warn-
industrys thirst for ever-smaller, low- available real estate. The maximum of ing receivers and other SIGINT seg-
er-power and lower-cost devices. 10 GT links defined by the FMC spec- ments. These applications gain large
As mentioned earlier, a dual-channel ification is a useful quantity; even this advantages by using a new generation
2-Gsps, 12-bit ADC with a parallel inter- limited number of GT links provide 80 of RF ADCs and DACs, a technology
face requires a large number of I/O sig- Gbps or more of throughput while us- driven by the mass-market telecommu-
nals. This requirement directly impacts ing a fraction of the pins otherwise re- nications infrastructure.
the package size, in this case mandating quired for parallel I/O. Outside of the FPGA community,
a 292-pin package measuring roughly 27 The emergence of serially connect- newer DSP devices are also starting to
x 27 mm (though newer-generation pin ed I/O devices, not just those using adopt JESD204B. However, FPGAs are
geometry could shrink the package size JESD204B, does have drawbacks for likely to remain the stronghold in tak-
to something less than 20 x 20 mm). some application segments in electron- ing full advantage of the capabilities
A JESD204B-connected equivalent ic warfare, such as digital radio frequen- of wideband analog I/O devices. Thats
device can be provided in a 68-pin, 10 cy memory (DRFM). Serial interfaces because FPGAs can deal with vast data
x 10-mm packagewith reduced pow- invariably introduce additional latency volumes with better parallelization.
First Quarter 2016 Xcell Journal 55
TOOLS OF XCELLENCE
Connector-limited
8
6 FM
Channels
C
Pa
FM ra
4 C lle
Pa lw
ra ith
lle HS
2 lI
O S
2 4 6 8 10 12 14 16 18 20 22 24
Sample rate per channel (GSPS)
THE EVOLUTION OF FMC+ solutions supporting the different capa- military community can take advantage
To move FMC to the next level, the bilities of FMC and FMC+. of them to meet SWAP-C requirements.
VITA 57.4 working group has created Designers can use the increased
a specification with an increased num- throughput enabled by FMC+ to take OTHER ADVANTAGES
ber of GT links operating at increased advantage of new devices that offer AND USES OF FMC+
speed. FMC+ maintains full FMC back- huge I/O bandwidth. There will still Although FMC+, like FMC, is likely to
ward compatibility by adding to the be trade-offs, such as how many de- be dominated by ADC, DAC and trans-
FMC connectors outer columns for the vices can fit on the mezzanines real ceiver products, the increased GT
additional signals and not changing any estate budget, but for a moderate density provided by FPGAs makes it
of the board profiles or mechanics. number of channels the realizable useful for other functions. Two func-
The additional rows will be part of throughput is a huge leap over to- tions of note are fiber optics and new
an enhanced connector that will min- days FMC specification. serial memories.
imize any impact on available real es- As with JESD204B, there are require-
tate. The FMC+ specification increases NEXT-GENERATION ments for faster, denser fiber optics.
the maximum number of available GT ADCS AND DACS Those based on fiber-optic ribbon ca-
links from 10 to 24, with the option of In the next few years, it is reasonable to bles offer the smallest parts. Because
adding another eight links, for a total expect high-resolution ADCs and DACs the FMC+ footprint readily supports 24
of 32 full duplex. The additional links to break through the 10-Gsps barrier to full-duplex fiber-optic links, this appli-
use a separate connector, referred to support very wideband communications cation is likely where the higher speeds
as an HSPCe (HSPC being the main with direct RF samplings for L-, S- and supported by FMC+ will first be realized.
connector). Table 1 summarizes FMC even C-band frequencies. Below 10 Gsps, Bandwidths of 28 Gbps per fiber will take
and FMC+ connectivity. converters are emerging with 12-, 14- and the throughputs quickly past 100G and
Multiple independent signal integrity even 16-bit resolutions, with some sup- 400G speeds on a single mezzanine. Opti-
teams characterized and validated the porting multiple channels. The majority cal throughput of 100G is emerging today
higher 28-Gbps data rate. The maximum of these devices will be using JESD204B on the current FMC format.
full-duplex throughput can now exceed (or a newer revision) signaling with 12- Another emerging area suitable for
900 Gbps in each direction, when the Gbps channels until newer generations FMC+ is serial memory such as Hy-
parallel interface is included. See Fig- inevitably boost this speed even further. brid Memory Cube and MoSys Band-
ure 2 for an outline of the net through- These fast-moving advances are fueled by width Engine. These novel devices
puts that can be expected for digitizer the telecommunications industry, but the represent an entirely new category of
56 Xcell Journal First Quarter 2016
TOOLS OF XCELLENCE
Support for
Xilinx Zynq Ultrascale+
and Zynq-7000 families
VIVADO DESIGN SUITE The Vivado Design Suite HLx Editions LEARN MORE
HLx EDITIONS are available from the Xilinx Download See the Vivado Design Suite 2015.4
Center at www.xilinx.com/download. Release Notes for more information.
In order to enable ultrahigh produc-
tivity for designing All Programmable Public Access Support for QuickTake Video Tutorials
SoCs and FPGAs, Xilinx has launched 16nm UltraScale+ Family Vivado Design Suite QuickTake vid-
the Vivado Design Suite HLx Editions After announcing the first customer eo tutorials are how-to videos that
with the Vivado Design Suite 2015.4 re- shipment of the Zynq UltraScale+ take a look inside the features of the
lease. The HLx Editions include Vivado MPSoC in September 2015, Xilinx has Vivado Design Suite and UltraFast
High-Level Synthesis (HLS), complete announced the industrys first public- Design Methodology.
with C/C++ libraries; Vivado IP Integra- ly available tools for FinFET-based
tor (IPI); LogicCORE IP subsystems; See all Quick Take Videos here www.
programmable devices, the Ultra-
and the full Vivado implementation xilinx.com/training/vivado.
Scale+ family, in the Vivado Design
tool suite. When coupled with the new Suite 2015.4 release.
UltraFast High-Level Productivity Training
Design Methodology Guide, the tools These devices are supported in the For instructor-led training on the
enable users to realize a 10x to 15x Vivado Design Suite HLx Editions, in Vivado Design Suite, UltraFast
productivity gain over traditional ap- embedded software development tools Design Methodology and more, visit
proaches. This methodology enables: and in the Xilinx Power Estimator. www.xilinx.com/training.
Designers can now validate the Ultra-
Rapid creation of connected plat- Scale+ portfolios 2x to 5x perfor- Download Vivado Design Suite HLx
forms using IPI and connectivity IP mance/watt improvement over 28nm Editions today at http://www.xilinx.
subsystems offerings for their specific designs. com/download.
Rapid development of differenti- The announcement marks the industrys
ated logic using HLS and libraries first publicly available tools for 16nm de-
based on C/C++/OpenCL vices, enabling broad market adoption.
www.techway.eu
T
he Xilinx Alliance Program is a worldwide ecosystem of qualified companies that collaborate with Xilinx
to further the development of All Programmable technologies. Xilinx has built this ecosystem, leveraging
open platforms and standards, to meet customer needs and is committed to its long-term success. Alliance
membersincluding IP providers, EDA vendors, embedded software providers, system integrators and
hardware suppliershelp accelerate your design productivity while minimizing risk. Here are some highlights.
UltraScale VU440 and the prototyping microprocessor core with the con-
PRO-AV BOARD PROVIDES
board has a rated capacity of 30 million siderable processing power and pro-
4K/UHD HDMI TRANSPORT
ASIC gates. This FPGA Prototyping grammable-I/O flexibility of a Xilinx
OVER IP
board offers 10 extension sites with as Virtex-7 FPGA on one ruggedized, air-
The Barco Silex Viper-HV-4K OEM many as 1,327 user I/Os for daughter- cooled, 3U VPX module measuring a
board for 4K HDMI transport over In- boards (e.g. memory boards, interface mere 100 x 160 mm. The module ac-
ternet Protocol, which is built around boards), interconnecting cables or cus- commodates as much as 512 Mbytes
a Zynq-7000 All Programmable SoC, tomer-specific application boards. The of DDR3L-800 SDRAM, 1 Gbyte of
compresses 4K/UHD video using hard- product works in combination with NAND flash memory and 16 Mbytes
ware-based, real-time SMPTE 2042 proFPGAs uno, duo or quad mother- of NOR flash memory connected to
VC-2 LD High Quality (visually lossless) boards, which respectively carry one, the P1010 processor, and as much as
video compression and sends it over two or four members of the growing 4 Gbytes of DDR3L SDRAM and 128
1-Gbps Ethernet. Using it, developers family of proFPGAs FPGA Prototyping Mbtyes of NOR flash memory con-
of professional AV systems can create boards. Thats 120M ASIC gates worth nected to the Virtex-7 FPGA.
point-to-point unicast or multicast dis- of prototyping capacity when you fully
For more information, please visit
tribution networks with low-cost, wide- load a quad motherboard with four Ul-
http://www.xes-inc.com/.
ly available IP networking equipment traScale XCVU440 FPGA Modules.
and standard, low-cost Ethernet cables.
For more information, please visit http://
For more information, please visit www.prodesign-europe.com/profpga/.
CARDSHARP SBC PUTS
http://www.barco-silex.com/.
ZYNQ SOC INTO RUGGED
XMC FORM FACTOR
VPX MODULE MERGES
The Cardsharp single-board comput-
FPGA PROTOTYPING BOARD VIRTEX-7 FPGA, FREESCALE
er (SBC) from Innovative Integration
HAS RATED CAPACITY OF POWER PROCESSOR
packs a Xilinx Zynq-7000 All Program-
30M ASIC GATES
The X-ES (Extreme Engineering Solu- mable SoC (the Z-7045 device) along
As the name suggests, the UltraS- tions) XPedite2470 merges a Freescale with 3 Gbytes of DDR3 SDRAM, 16
cale XCVU440 FPGA Module from P1010 QorIQ processor based on an Mbytes of QSPI flash memory and 32
proFPGA is based on a Xilinx Virtex 800-MHz Power Architecture e500-2 Gbytes of eMMC flash memory into
60 Xcell Journal First Quarter 2016
a 149 x 74-mm XMC form factor for Module offers forty-eight 16.3-Gbps
rugged application. The card, which GTH transceiversalso supplied by Multiple digital bus analyzers (SPI,
has built-in 1-Gbps Ethernet and USB the Xilinx Kintex UltraScale KU115 IC, UART, parallel) are also included,
2.0 ports, runs many of the Zynq SoCs FPGAfor high-speed communica- along with two programmable power
programmable I/O pins including all tion throughput. supplies (0 to +5V, 0 to -5V at 50 to 700
eight of the 12.5-Gbps GTX transceiv- mA, 500 mW max).
For more information, please visit
er ports to an HPC FMC I/O site for
http://www.s2cinc.com/. For more information, please visit
maximum interfacing flexibility.
http://store.digilentinc.com/.
For more information, please visit
http://www.innovative-dsp.com/.
ANALOG DISCOVERY 2 $975 KINTEX ULTRASCALE
PACKS DSO, LOGIC FPGA DEVELOPMENT KIT
DSMART IP CORE LETS ANALYZER, POWER SUPPLY IS A REAL BARGAIN
XILINX DEVICES READ,
Just over a year ago, Digilent rolled Do you want immediate access to
WRITE SMART CARDS
out the extremely versatile Analog the most advanced Xilinx FPGA
Ubiquitous in Europe and now becom- Discovery multifunction instrument, architecture so that you can start
ing a staple in the United States as which is based on a Xilinx Spar- your next system designlike right
well, smart cards based on chip-and- tan -6 FPGA. Now Digilent has come nowbut youre looking for a solu-
PIN cryptosecurity represent a leading out with an encore product, the tion thats easy on your develop-
world standard for secure financial Analog Discovery 2, that implements ment budget? Avnet has a new de-
and related transactions. Digital Core additional features. Among them are velopment kit for you. Its the $975
Designs (DCD) DSMART IP Core a two-channel, 100-Msample/second Xilinx Kintex UltraScale Develop-
compatible with the ISO/IEC 7816 - 3: USB digital oscilloscope with 14 bits ment Kit featuring a 20-nanometer
2006 and EMV 4.1 security standards of resolution and differential inputs; Kintex UltraScale XCKU040-1FB-
makes it possible to implement a con- a two-channel, 100-Msps, 10-MHz ar- VA676 device. Thats a 27 x 27-mm
tact smart-card reader using the pro- bitrary function generator with AM/ device, the smallest Kintex UltraS-
grammable logic in an FPGA. The core FM modulation; and a stereo audio cale device, with 1-mm ball pitch.
has been tested with and implemented amplifier to drive external head-
by a Digital Core Design licensee using phones or speakers. Also onboard For more information, please visit
a Xilinx Artix-7 FPGA. are the following: http://www.avnet.com/.
Xpress Yourself
in Our Caption Contest
ADAM HALE, a computer
engineering student at Texas A&M
University (College Station, Texas),
won a shiny new Digilent Zynq
Zybo board with this caption for
the buggy cartoon in Issue 93
of Xcell Journal:
DANIEL GUIDERA
T
heres a time and a place for selfies, and the workplace may or may Frank now realized why they
not be one of them. You decide as you take a gander at our cartoon asked him if he could troubleshoot
engineer setting himself up for a glamour shot against the backdrop problematic chips on the fly in his
of a server farm. We invite you to Xercise your funny bone and submit an en- recent interview.
gineering- or technology-related caption for this self-centered cartoon. The
image might inspire a caption like En route to finalizing his self-correcting
Congratulations as well to
network design, Sam decided a selfie was in order.
our two runners-up:
Send your entries to xcell@xilinx.com. Include your name, job title,
company affiliation and location, and indicate that you have read the con- Carls old-school manual-
test rules at www.xilinx.com/xcellcontest. After due deliberation, we will debugging skills made quite an
print the submissions we like the best in the next issue of Xcell Journal. impression on the team.
The winner will receive a Digilent Zynq Zybo board, featuring the Xilinx Luis Benites, staff hardware FPGA
Zynq -7000 All Programmable SoC (http://www.xilinx.com/products/ engineer, Spirent (Agoura Hills, Calif.)
boards-and-kits/1-4AZFTE.htm). Two runners-up will gain notoriety and
fame when we print their captions, names and affiliations in the next issue. Joe finally understands all
The contest begins at 12:01 a.m. Pacific Time on March 1, 2016. All entries the buzz around IoT.
must be received by the sponsor by 5 p.m. PT on May 1, 2016.
Daniel C. Kline, CEO,
ParaTechnica (West Bend, Wis.)
NO PURCHASE NECESSARY. You must be 18 or older and a resident of the fifty United States, the District of Columbia, or Canada (excluding Quebec) to enter. Entries must be entirely original. Contest begins on
March 1, 2016. Entries must be received by 5:00 pm Pacific Time (PT) May 1, 2016. Official rules are available online at www.xilinx.com/xcellcontest. Sponsored by Xilinx, Inc. 2100 Logic Drive, San Jose, CA 95124.
Call today :
(800) 493-5551
or e-mail us at
xcelladsales@aol.com
www.xilinx.com/xcell/
n 4 Minutes to Error-Free 100G Ethernet Operation using Xilinx UltraScale+ Integrated 100G IP
n 3 Eyes are Better than One for 56Gbps PAM4 Communications: Xilinx silicon goes 56Gbps for future Ethernet
n Tiny Zynq Board does absolutely nothing, and thats a good thing explains Servaes Joordens
n How I got lost and beer-buzzed in Prague just so I could see Sphericams Zynq-based, 360-degree, 4K video VR camera