
Implementation Scenario for Teaching

Partial Reconfiguration of FPGA


Pierre Leray, Amor Nafkha, Christophe Moy

SUPELEC/IETR
Avenue de la Boulais, CS 47601, 35576, Cesson-Sévigné Cedex, France
pierre.leray@supelec.fr

Abstract – We present in this paper a lab on partial reconfiguration (PR) of FPGAs for a video application. The lab is intended for final-year engineering students. The implementation target is the Xilinx Virtex5 of an ML506 design kit board. The structure of the proposed design, the design steps, and the results obtained are detailed. The lab builds on research carried out by the authors in software radio and cognitive radio over the last decade.

Index terms – partial reconfiguration of FPGA, Virtex, ICAP, education

I. INTRODUCTION

The lab presented in this paper is a legacy of the research done by the authors in the domains of software radio [1] [2] and cognitive radio [3]. Future flexible radio operation implies the use of heterogeneous processing units such as DSP, FPGA, GPP, and ASIC. We have argued, however, that efficiency (in terms of processing power, power consumption, etc.) can only be guaranteed if a management layer dedicated to reconfiguration is added to the radio processing [4][5][6][7]. Moreover, the requirement for local and fast reconfiguration was also identified at that time [8]. As processor reconfiguration is not a breakthrough, a special focus has been placed on the hardware side, namely the FPGA, in order to complete the heterogeneous management capabilities: from first experiments [9], to realistic radio algorithm implementations [10] and system integration [11]. In the reconfigurable hardware domain, this is known as partial reconfiguration (PR) of FPGA [12].

Efficiency in terms of reconfiguration speed is a crucial feature of partial reconfiguration, and even a condition of its relevance for the software radio and cognitive radio community. That is why we studied the possible means of reducing reconfiguration time along two axes: decreasing the partial bitstream size on the one hand, and increasing the transfer speed to the configuration plane on the other hand. This implies, for instance:
- favouring parameterization techniques at design time [13],
- speeding up the ICAP interface to its maximum technological capabilities [14].

Software radio is an application context that does not differ much from any other real-time embedded electronics domain. This makes partial reconfiguration of FPGA (combined with a management architecture) useful for many other application contexts, in particular image and video processing. This has long been a common interest that we have addressed with video processing researchers, both for joint radio and video contexts [15][16] and for video processing alone. For instance, the TransMedi@ project [17] of the Brittany Region pole of excellence Images and Networks (Images et Réseaux) addressed the FPGA PR solution for video transcoding in infrastructure servers.

We now believe it is time to spread PR technology to industrial applications, and consequently time to teach it to future engineers.

The paper is organized as follows. Part II describes the project we propose as a lab to final-year students, just before they graduate with their engineering diploma. Part III explains how reconfiguration management is deployed in the context of partial reconfiguration of FPGA. The design flow for partial reconfiguration is summarized in part IV. Finally, implementation results are given in part V, followed by concluding remarks in the last section.

II. PROJECT DESCRIPTION

The student lab consists in implementing flexible real-time video processing. The video processing is changed on the fly by dynamically reconfiguring part of the FPGA processing area.

A. Hardware platform architecture

Video transcoding is performed in an FPGA. The hardware platform is made of a Xilinx ML506 design kit, a host PC and a screen for display, as shown in Figure 1.

The host PC plays three roles:
- development platform,
- video server,
- highest-level reconfiguration manager.

The screen is directly connected to the kit through a video connector. A serial link connects the host PC and the board for reconfiguration management, in order to:
- load partial bitstreams into configuration memory at initialization,
- send reconfiguration orders for on-the-fly video processing adaptation.

B. Functional architecture

The FPGA processing is made of two distinct pieces. The first is the video processing chain, detailed in this paragraph. The second is the management architecture that must be added to the processing in order to enable dynamic and correct reconfiguration, as defined in our research work on management architectures [7]. It is described in the specific context of FPGA in part III of this paper.

Module “video_128_in” of Figure 1 receives data from the video coder of the design kit and stores it in the embedded memory inside the FPGA (input picture memory) of 128 by 128 pixels.

Then a processing unit (PU) performs the video processing (see next section) and stores the result in the embedded memory inside the FPGA (output picture memory) of 256 by 256 pixels.

Module “video_256_out” of Figure 1 sends data from the FPGA to the DVI (Digital Visual Interface) controller of the design kit.

[Figure 1: block diagram of the Xilinx ML506 board. Labelled blocks: configuration memory; Virtex5-SX50 FPGA containing a reconfiguration manager (MicroBlaze, ICAP controller) and the video processing chain (Video_128_in, reconfigurable processing unit, Video_256_out); video coder; DVI controller; RGB video source; bitstreams.]

Figure 1 – System architecture

The goal is to dynamically change the PU operation without interrupting the video stream.

C. Video application

The goal is to perform video transcoding on a 60 frames per second video stream. This kind of application is found in data infrastructure, where the video stream may be compressed to fit the bandwidth requirements of a given area. Another need is to transcode a given high-quality video stream into several lower-quality streams, for example from an HD TV broadcast stream to a mobile phone format.

We propose here to switch between two different kinds of transcoding: either enhance the quality of the input video stream, or broadcast the input video stream towards 4 lower-quality receivers.

In the first case, the algorithm used to change a 128x128 pixel data stream into a 256x256 pixel data stream is the H-264 semi-pixel upscaling of Figure 2.
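
The two transcoding modes can be illustrated in software. The sketch below is not the students' VHDL processing unit; it is a minimal Python model of the six-tap filter (1, -5, 20, 20, -5, 1) with the rounding and clipping given in Figure 2. It assumes 8-bit samples, edge replication at the picture borders, unrounded intermediate sums for the centre sample (as in the H.264 standard), and a 2x2 tiling for the four-copy mode. All function names are hypothetical.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # six-tap half-pixel interpolation filter


def clip(value, lo=0, hi=255):
    """Clamp a value to the 8-bit sample range (assumed sample depth)."""
    return max(lo, min(hi, value))


def six_tap(samples):
    """Weighted sum of six neighbouring samples (no rounding, no clipping)."""
    return sum(t * s for t, s in zip(TAPS, samples))


def window(seq, i):
    """Six samples around the half position between seq[i] and seq[i + 1].

    Indices outside the picture are replaced by the nearest edge sample
    (an assumption; the paper does not describe border handling).
    """
    n = len(seq)
    return [seq[min(max(i + k, 0), n - 1)] for k in range(-2, 4)]


def half_pel_row(row):
    """Horizontal half-pixel samples: a = Clip((a1 + 16) >> 5)."""
    return [clip((six_tap(window(row, i)) + 16) >> 5) for i in range(len(row))]


def upscale2x(img):
    """Double both dimensions of a picture given as a list of rows."""
    h, w = len(img), len(img[0])
    # Unrounded horizontal sums, reused for the centre samples.
    hsum = [[six_tap(window(row, x)) for x in range(w)] for row in img]
    cols = list(zip(*img))        # columns of the input picture
    hsum_cols = list(zip(*hsum))  # columns of the horizontal sums
    out = [[0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            out[2 * y][2 * x] = img[y][x]                        # full pixel
            out[2 * y][2 * x + 1] = clip((hsum[y][x] + 16) >> 5)  # horizontal
            vsum = six_tap(window(cols[x], y))
            out[2 * y + 1][2 * x] = clip((vsum + 16) >> 5)        # vertical
            csum = six_tap(window(hsum_cols[x], y))
            out[2 * y + 1][2 * x + 1] = clip((csum + 512) >> 10)  # centre
    return out


def duplicate4(img):
    """Second mode: four copies of the input, tiled 2x2 (assumed layout)."""
    top = [list(row) + list(row) for row in img]
    return top + [row[:] for row in top]
```

Both functions map a 128x128 picture to a 256x256 one, which is what lets the two processing units share the same input and output picture memories.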
[Figure 2: grid of full-pixel samples (E, F, G, H, I, J along a row; E, K, L, M, N, O down a column; P, R) and interpolated samples (a, b, c, d, e, f, x, j), annotated with the equations below.]

a1 = E – 5*F + 20*G + 20*H – 5*I + J
a = Clip((a1 + 16) >> 5)

x1 = E – 5*K + 20*L + 20*M – 5*N + O
x = Clip((x1 + 16) >> 5)

j1 = a – 5*b + 20*c1 + 20*d1 – 5*e + f
j = Clip((j1 + 512) >> 10)

Figure 2 – H-264 semi-pixel upscaling schematic view

The second context consists only in duplicating the input data stream 4 times.

III. PR MANAGEMENT ARCHITECTURE

A. PR design approach

Two modes are possible for dynamic reconfiguration, depending on the reconfiguration initiator. Either reconfiguration is done by a processor external to the FPGA, through the JTAG, serial port or SelectMap interface, or partial reconfiguration is performed by a core processor (a MicroBlaze soft core, for instance) embedded in the FPGA to be reconfigured. The latter is called self-reconfiguration. In this case the embedded processor reaches the configuration plane through the ICAP interface, which gives the best reconfiguration speed, as shown in Table 1.

Configuration Mode    Max Clock Rate    Data Width    Max Bandwidth
SelectMap / ICAP      100 MHz           32-bit        3.2 Gbps
Serial Mode           100 MHz           1-bit         100 Mbps
JTAG                  66 MHz            1-bit         66 Mbps

Table 1 – Reconfiguration throughput for Virtex5 family [18]

For this student lab we chose a self-reconfiguring approach for the FPGA (sometimes also called auto-reconfiguration) [5].

B. MicroBlaze and its software drivers

An FPGA is configured by loading binary data, called a bitstream, into the configuration plane. Changing the FPGA operation (partially or totally) means loading a new bitstream. Either it reconfigures the whole FPGA (total bitstream), or only a sub-part of the FPGA, in which case we speak of partial reconfiguration.

In this design we chose to use a MicroBlaze soft core to load the bitstream from memory to the ICAP interface, as shown in Figure 3. Bitstreams are stored in an off-chip memory (external to the FPGA) on the design kit board. Students therefore had to develop the soft driver shown at the top right of Figure 3, to be executed by the MicroBlaze. This driver's task is to control the ICAP interface in order to perform partial reconfiguration.

The MicroBlaze also manages the bitstream table, in order to select the correct bitstream depending on the reconfiguration order it receives. This reconfiguration order typically comes from higher layers of the management architecture, as described in [7] in a cognitive radio context.

[Figure 3: on the left, the MicroBlaze platform (MicroBlaze CPU on the PLB) and the ICAP controller (length register, address register, controller, ICAP primitive) linked to the configuration memory over 32-bit paths at 400 MB/s. On the right, two flowcharts. Soft driver: wait for a reconfiguration order while Config busy is FALSE; read the bitstream attributes in the configuration table; send the bitstream attributes to the ICAP controller. Hard driver: wait on the length register; set Config busy to TRUE; transfer directly from memory to the ICAP; Address++ and Length--; when Length == 0, set Config busy to FALSE.]

Figure 3 – Resources supporting reconfiguration management

C. ICAP controller

The bitstream transfer from the off-chip memory to the ICAP is provided by default in the Xilinx ISE development tools. It is performed in software executed by the MicroBlaze (XPS HWICAP core and API). The performance in terms of data
throughput to the ICAP is far from the theoretical technological capability of 3.2 Gbps (see Table 1). That is why we propose a DMA (Direct Memory Access) interface between the off-chip memory and the ICAP primitive, which makes it possible to reach the theoretical transfer bandwidth [14].

For that purpose, another task for the students is to build in VHDL the hard ICAP controller implementing the state machine shown at the bottom right of Figure 3.

D. Loading procedure

The first implementation step of the lab consists in loading the global bitstream (developed as shown in part IV) in order to implement the MicroBlaze, the interfaces, and the video processing chain with a default PU.

Students then develop and execute a C program to download the partial bitstreams (developed as shown in part IV) from the host PC to the off-chip configuration memory on the design kit board.

The MicroBlaze is then ready and waits for a reconfiguration order. In this lab, the reconfiguration orders come from the host. The user sends an ID tag through a HyperTerminal window to select a configuration. The user plays here the role of the equipment manager, but autonomous decision schemes could also be imagined [7].

IV. DESIGN FLOW

Implementing partial reconfiguration requires a specific design methodology. Students had to carry out their design in 3 steps:
- description and synthesis of the hardware platform,
- design implementation and generation of the configuration bitstreams,
- development of the C code for the MicroBlaze.

The PR design flow relies on two Xilinx design environments, as shown in Figure 4:
- Xilinx ISE Project Navigator and EDK Xilinx Platform Studio for hierarchical/modular design description and synthesis,
- PlanAhead for the floorplan design and the definition of reconfigurable areas, up to the generation of the global and partial bitstreams.

The first step consists in describing and synthesizing all the functional blocks of the design. The PR flow imposes a modular design approach: each configuration of the reconfigurable PU must be synthesized separately. The resulting files are gate-level representations (netlists) of the top-level design and of all the modules present in the global architecture.

The second step consists in building the floorplan of the target device, specifying the FPGA areas where the different modules are allocated. The « Set Reconfigurable » attribute is given to the parts that hold reconfigurable modules (PU). PlanAhead is a graphical tool that assists floorplan design. It also handles final place and route, as well as configuration file (bitstream) generation. In PR mode, PlanAhead manages partial reconfiguration by separating static from dynamic areas. The partial bitstream for each version of a dynamically reconfigurable area is also generated by PlanAhead.

[Figure 4: flowchart. Design Description (Top Module; Static Module: MicroBlaze platform, ICAP Controller, Video Interfaces; Reconfigurable Module PU: Configuration A, Configuration B), then Synthesis, producing Netlists, then Floorplanning (Draw Reconfigurable Partition, Specify any configuration), producing the Floorplan, then Place/Route/Generate Bitstreams (Run implementation of Static and Reconfigurable Modules for each configuration), producing Bitstreams.]

Figure 4 – Hardware design flow steps

The Xilinx SDK environment is used in the last step to develop the C code to be executed by the MicroBlaze soft core. Students must program the MicroBlaze so that it can:
- load partial bitstreams from the host to the configuration memory,
- control reconfiguration when an order is transmitted by the management hierarchy (host).
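
The two MicroBlaze tasks listed above can be modelled in a few lines. The sketch below is a Python simulation, not the lab's C driver or VHDL controller, and it uses no Xilinx API: every class, method and function name is hypothetical. It assumes a word-addressed configuration memory, a bitstream table keyed by ID tag, and an ICAP controller with address and length registers and a busy flag that moves one 32-bit word per clock cycle, following the flow drawn in Figure 3.

```python
from dataclasses import dataclass


@dataclass
class BitstreamEntry:
    address: int  # first word of the bitstream in configuration memory
    length: int   # size of the bitstream in 32-bit words


class IcapController:
    """Model of the hard driver: two registers and a busy flag."""

    def __init__(self, memory, icap_sink):
        self.memory = memory   # configuration memory (list of words)
        self.icap = icap_sink  # words written to the ICAP, in order
        self.address = 0
        self.length = 0
        self.busy = False

    def start(self, address, length):
        """Called by the soft driver to hand over the bitstream attributes."""
        if self.busy:
            raise RuntimeError("ICAP controller is busy")
        self.address, self.length = address, length
        self.busy = length > 0

    def clock(self):
        """One cycle: transfer one word, then Address++ and Length--."""
        if not self.busy:
            return
        self.icap.append(self.memory[self.address])
        self.address += 1
        self.length -= 1
        if self.length == 0:
            self.busy = False


class ReconfigurationManager:
    """Model of the soft driver: bitstream table and order handling."""

    def __init__(self, memory_size_words):
        self.memory = [0] * memory_size_words
        self.table = {}      # ID tag -> BitstreamEntry
        self.next_free = 0
        self.icap_words = []
        self.controller = IcapController(self.memory, self.icap_words)

    def load_bitstream(self, tag, words):
        """Task 1: store a partial bitstream received from the host."""
        end = self.next_free + len(words)
        if end > len(self.memory):
            raise MemoryError("configuration memory is full")
        self.memory[self.next_free:end] = words
        self.table[tag] = BitstreamEntry(self.next_free, len(words))
        self.next_free = end

    def reconfigure(self, tag):
        """Task 2: serve a reconfiguration order; returns the cycle count."""
        entry = self.table[tag]
        self.controller.start(entry.address, entry.length)
        cycles = 0
        while self.controller.busy:
            self.controller.clock()
            cycles += 1
        return cycles


def transfer_time_us(n_words, clock_hz=100e6):
    """Transfer time in microseconds at one 32-bit word per clock cycle."""
    return n_words / clock_hz * 1e6
```

Under these assumptions, a 57 kByte partial bitstream is 14250 words, so `transfer_time_us(14250)` gives about 142.5 µs at 100 MHz, in line with the effective download time reported in part V.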
V. IMPLEMENTATION RESULTS

A. Application

The design kit board comprises one Xilinx Virtex5-SX50T FPGA clocked at 100 MHz, whose main characteristics are:
- 32640 slices,
- 132 block RAMs of 36 kb (4752 kb),
- 288 DSP blocks,
- global bitstream size: 2.5 MBytes.

The Upscaling PU is clocked at 200 MHz for a performance of 200 Mpixels/s. Its complexity is 1241 slices and the corresponding partial bitstream size is 57 kBytes.

B. Reconfiguration

Reconfiguration performance is considered here in terms of reconfiguration speed or reconfiguration time. Depending on the constraints of each application, good performance may correspond to different orders of magnitude. The idea is that the reconfiguration overhead must be negligible compared to the PU's processing duration. In a 60 frames per second video stream, a new frame is displayed about every 17 ms. The reconfiguration of the Upscaling PU has been measured to take 150 µs. Consequently, we can consider that reconfiguration does not add an unacceptable overhead to the processing load. This illustrates the relevance of FPGA PR for ultra-fast adaptation in real-time systems.

A reconfiguration time of 150 µs corresponds to a reconfiguration throughput of 3.04 Gbps:

$$R_{throughput} = \frac{57 \times 10^{3} \times 8}{150 \times 10^{-6}} = 3.04 \times 10^{9} \text{ bps}$$

In fact the maximum technological throughput of 3.2 Gbps on Virtex5 devices is reached, with an initial overhead of 7.5 µs. The effective download time is consequently 142.5 µs out of a total reconfiguration time of 150 µs, so that:

$$R_{throughput} = \frac{57 \times 10^{3} \times 8}{142.5 \times 10^{-6}} = 3.2 \times 10^{9} \text{ bps}$$

This result was published at the ReConFig conference in 2009 [14].

VI. CONCLUSIONS AND FUTURE WORK

This paper shows how a partial reconfiguration design and a real experiment are carried out in a student lab. The lab is based on the research activities and previous research results of the professors in the domain of software and cognitive radio. However, partial reconfiguration technology may be useful in many other domains requiring both high performance (in terms of processing power) and flexibility. Provided that reconfiguration management is considered as important as the processing itself, partial reconfiguration of FPGA opens a new era in reconfigurable computing, combining hardware performance and software flexibility. Students can experiment with this new paradigm in the lab and then disseminate the technology in industry after graduating.

VII. ACKNOWLEDGMENT

The authors thank Xilinx for their support for teaching material (software licenses and tutorials), and for the early access they afforded to PlanAhead in the past [19].

VIII. REFERENCES

[1] Mitola J., "The Software Radio Architecture", IEEE Comms. Mag., vol. 33, no. 5, pp. 26-38, May 1995
[2] Kountouris A., Moy C., and Rambaud L., "Reconfigurability: A Key Property in Software Radio Systems", First Karlsruhe Workshop on Software Radios, Germany, 29-30 Mar. 2000
[3] Mitola J., "Cognitive Radio: An Integrated Agent Architecture for Software Defined Radio", Ph.D. dis. Royal Inst. of Tech., Sweden, 2000
[4] Kountouris A. and Moy C., "Reconfiguration in Software Radio systems", Karlsruhe Workshop on Software Radio, Germany, Mar. 2002
[5] Delahaye J.-P., Palicot J., Leray P., "A Hierarchical Modeling Approach in Software Defined Radio System Design", SIPS 2005, Athens, Greece, Nov. 2005
[6] Godard L., Moy C. and Palicot J., "From a Configuration Management to a Cognitive Radio Management of SDR Systems", CrownCom'06, 8-10 June 2006, Mykonos, Greece
[7] Moy C., "High-Level Design Approach for the Specification of Cognitive Radio Equipments Management APIs", Journal of Network and System Management, vol. 18, n° 1, pp. 64-96, Mar. 2010
[8] Delahaye J.P., Leray P., Moy C. and Palicot J., "Managing Dynamic Partial Reconfiguration on Heterogeneous SDR Platforms", SDR Forum Technical Conference'05, Anaheim (USA), November 2005
[9] Delahaye J.P., Gogniat G., Roland C., Bomel P., "Software Radio and Dynamic Reconfiguration on a DSP/FPGA Platform", 3rd Karlsruhe Workshop on Software Radios, proc. pp. 143-151, Karlsruhe, Germany, March 17-18 2004
[10] Delahaye J.P., Palicot J., Moy C., Leray P., "Partial Reconfiguration of FPGAs for Dynamical Reconfiguration of a Software Radio Platform", IST Mobile and Wireless Communications Summit'07, 1-5 July 2007, Budapest, Hungary
[11] Delorme J., Martin J., Nafkha A., Moy C., Clermidy F., Leray P., Palicot J., "A FPGA partial reconfiguration design approach for cognitive radio based on NoC architecture", IEEE New Circuits and Systems Conference, NEWCAS, 22-25 June 2008, Montréal, Canada
[12] "Virtex Series Configuration Architecture User Guide", Xilinx, Inc., 2100 Logic Drive, San Jose, CA 95124, XAPP151 (v1.6), March 24, 2003
[13] Gul S.T., Alaus L., Noguet D., Moy C. and Palicot J., "The Common Operator Technique: An Optimization Process to Identify and Design a Set of Common Operators to Perform SDR Equipment", ICT Mobile Summit'09, 10-12 June 2009, Santander, Spain
[14] Delorme J., Nafkha A., Leray P., Moy C., "New OPBHWICAP interface for real-time Partial reconfiguration of FPGA", International Conference on ReConFigurable Computing and FPGAs, ReConFig'09, Cancun, Mexico, 9-11 Dec 2009
[15] Raulet M., Urban F., Nezan J.F., Moy C., Deforges O., Sorel Y., "Rapid Prototyping for Heterogeneous Multicomponent Systems: an MPEG-4 Stream Over an UMTS Communication Link", Eurasip Journal on Applied Signal Processing, special issue on Design Methods for DSP Systems, Kluwer Academic Publishers, Volume 2006 (2006), Article ID 64369
[16] Moy C., Raulet M., "High-Level Design for Ultra-Fast Software Defined Radio Prototyping on Multi-Processors Heterogeneous Platforms", Journal on Advances in Electronics and Telecommunications, Radio Communication Series: special issue on Recent Advances and Future Trends in Wireless Communications, Vol. 1, n° 1, pp. 67-85, April 2010
[17] http://hpcas.enstb.org/transmedia
[18] Xilinx tutorial presentation, "Introduction to Partial Reconfiguration Methodology", 2010
[19] Xilinx, Early access partial reconfiguration user guide, UG208, 2006
