ES-MEL-AEL ZG554 - Lec1

Reconfigurable Computing
AEL ZG 554 / ES ZG 554 / MEL ZG 554

Session 1
Pawan Sharma
ps@pilani.bits-pilani.ac.in
BITS Pilani, Pilani Campus
23/07/2022
Today’s Lecture
Course Overview
• Module: Introduction to Reconfigurable
Computing
– General Purpose Computing [T1. Sec 1]
– Domain and Application specific processors [T1. Sec 2
& 3]
– Reconfigurable Computing [T1. Sec 4]
– Fields of Application [T1. Sec 5.1 to 5.4]
BITS Pilani, Pilani Campus
Course Overview
Hardwired Technology
• ASIC based, or a set of individual components forming a board-level solution
• Designed specifically to perform a given computation
• Very fast and efficient when executing the exact computation it was designed for
• Cannot be altered after fabrication
• Any change forces redesign and refabrication
• Very expensive if already deployed in a large number of systems
• The same holds for board-level solutions: they cannot simply be replaced in the event of a change or upgrade in the application

Software Programmed Processors
• General purpose processors
• Far more flexible than an ASIC
• Execute a set of instructions to perform a computation
• Change the instructions to change functionality without changing the hardware
• Poor performance compared to an ASIC
– Each instruction must be read from memory and decoded, and only then executed
– This results in a high execution overhead for each individual operation
– New operations must be built out of existing instructions, as the ISA is determined at fabrication

New Trend in Computation
• Reconfigurable Computing aims at filling this gap between hardware
and software by blending the benefits of both.
• Achieves potentially much higher performance than software, while
maintaining a higher level of flexibility than hardware.
• Emerged as an important organizational structure for implementing
computations.
• It combines the post-fabrication programmability or temporal
computational style of software programmed processors with the
spatial computational style most commonly employed in hardware
designs.
• The result changes traditional “hardware” and “software”
boundaries, providing an opportunity for greater computational
capacity and density within a programmable media.

Reconfigurable Devices
• Reconfigurable devices, including field-programmable gate
arrays (FPGAs), contain an array of computational
elements whose functionality is determined through
multiple programmable configuration bits.
• These elements, sometimes known as logic blocks, are
connected using a set of programmable routing resources.
• In this way, custom digital circuits can be mapped to the
reconfigurable hardware by computing the logic functions
of the circuit within the logic blocks, and using the
configurable routing to connect the blocks together to
form the necessary circuit.

y = Ax^2 + Bx + C
Temporal Implementation
• A small number of more general compute
resources are reused in time, allowing the
computation to be implemented compactly
• Generalized – can perform many functions
well
• Sequential – inherently constrained even
with multiple data paths
• Fixed logic – data sizes, number of
computational units, etc. cannot be
changed
Spatially Configurable Implementation
• Each operator exists at a different point in space, allowing the computation to
exploit parallelism to achieve high throughput and low computational latencies
• Parallelism customized to meet design objectives
• Logic specialization to perform a specific function
• Hardware-level adaptation of functionality to meet changing problem
requirements
With these two options, we implicitly connect spatial processing with hardware computation and temporal processing with software.
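
The contrast between the two styles can be sketched in a few lines of Python for y = Ax^2 + Bx + C. This is only an illustration: the function names are hypothetical, and the step counts assume one operation per time step on a single shared ALU (temporal) versus one dedicated operator per operation with independent operators working at the same time (spatial).

```python
# Sketch: temporal versus spatial evaluation of y = A*x^2 + B*x + C.
# Step counts are illustrative assumptions, not measurements of real hardware.

def temporal_eval(a, b, c, x):
    """Reuse one shared ALU: every operation occupies its own time step."""
    steps = 0
    t1 = x * x      # step 1: x^2
    steps += 1
    t1 = a * t1     # step 2: A * x^2
    steps += 1
    t2 = b * x      # step 3: B * x
    steps += 1
    y = t1 + t2     # step 4: A*x^2 + B*x
    steps += 1
    y = y + c       # step 5: add C
    steps += 1
    return y, steps


def spatial_eval(a, b, c, x):
    """One operator per operation: operators on the same level work in parallel."""
    # Level 1: two multipliers operate at the same time.
    x2, bx = x * x, b * x
    # Level 2: a multiplier and an adder operate at the same time.
    ax2, bxc = a * x2, bx + c
    # Level 3: the final adder.
    y = ax2 + bxc
    levels = 3
    return y, levels


y_t, steps = temporal_eval(2, 3, 4, 5)
y_s, levels = spatial_eval(2, 3, 4, 5)
```

Both functions return the same value of y. The temporal version needs five sequential steps with one compute resource, while the spatial version needs five operators but only three levels of delay.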

Introduction to FPGAs
• Field-Programmable Gate Arrays
– Literally, an array of logic gates that can be programmed with
new functionality in the field.
• Target Applications
– Image/video processing
– Cryptographic ciphers
– Military and aerospace applications
• What are the advantages of FPGA technology?
– Algorithmic agility / upload
– Cost efficiency
– Resource efficiency
– Throughput

• FPGAs can be customised to solve any problem after
device fabrication
• Exploit a large degree of spatially customized
computation in order to perform their computation
• Reconfigurable devices have the obvious benefit of
spatial parallelism, allowing them to perform more
operations per cycle.
• FPGAs contain an array of computational elements
• Functionality is determined through multiple
programmable configuration bits.
LUT based logic element
[Figure: 4-LUT with inputs I1, I2, I3, I4, carry logic (Cout), a D flip-flop (DFF), and output OUT]
• Each LUT operates on four one-bit inputs
• Output is one data bit
• Can perform any Boolean function of four inputs
• 2^(2^4) = 65536 functions (4096 patterns)
• The basic logic element can be more complex (multiplier, ALU, etc.)
• Contains some sort of programmable interconnect
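
A 4-input LUT can be modelled as a 16-bit truth table that is indexed by the four inputs. The sketch below is a software model only: the class name `Lut4`, the helper `config_for`, and the bit ordering of the address are assumptions made for illustration and do not describe any vendor's configuration format.

```python
from itertools import product


class Lut4:
    """Software model of a 4-input LUT: 16 configuration bits, one output bit."""

    def __init__(self, config):
        if not 0 <= config < 2 ** 16:
            raise ValueError("a 4-LUT holds exactly 16 configuration bits")
        self.config = config

    def evaluate(self, i1, i2, i3, i4):
        # The four inputs form a 4-bit address into the truth table.
        address = (i4 << 3) | (i3 << 2) | (i2 << 1) | i1
        return (self.config >> address) & 1


def config_for(func):
    """Build the 16 configuration bits that make a LUT compute func."""
    config = 0
    for i4, i3, i2, i1 in product((0, 1), repeat=4):
        address = (i4 << 3) | (i3 << 2) | (i2 << 1) | i1
        if func(i1, i2, i3, i4):
            config |= 1 << address
    return config


# The same hardware becomes a 4-input XOR or a 4-input AND by changing the bits.
xor4 = Lut4(config_for(lambda a, b, c, d: a ^ b ^ c ^ d))
and4 = Lut4(config_for(lambda a, b, c, d: a & b & c & d))

# 16 configuration bits give 2^(2^4) distinct Boolean functions.
NUM_FUNCTIONS = 2 ** (2 ** 4)
```

Reprogramming the LUT means writing a different 16-bit value, which is why any Boolean function of four inputs fits in the same element.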

• These programmable elements, known as logic
blocks, are connected using a set of routing
resources that are also programmable.
• Custom digital circuits can be mapped to the
reconfigurable hardware by computing the logic
functions of the circuit within the logic blocks
• Using the configurable routing to connect the blocks
together to form the necessary circuit
• Machines based on these FPGAs have achieved
impressive performance, often 100x the
performance of processor alternatives and 10x to 100x
the performance per unit of silicon area

Classify Reconfigurable Systems
• Current reconfigurable computing systems can be
classified by three main design decisions:
• Granularity of programmable hardware
– Low-level components with traditional ASIC design flow?
– More complex base units like multipliers, ALUs, etc.?
• Proximity of the CPU to the programmable hardware
– On the chip? On the bus? On the board? On the network?
• Capacity
– How many equivalent ASIC gates?
– How to allocate resources? Set ratios of memory to
computation to interconnect?
Higher level diagram of FPGA
• FPGAs are composed of the following:
– Configurable Logic Blocks (CLBs)
– Programmable interconnect
– Input/Output Buffers (IOBs)
– Other stuff (clock trees, timers, memory, multipliers,
processors, etc.)
• CLBs contain a number of Look-Up Tables (LUTs) and
some sequential storage.
– LUTs are individually configured as logic gates, or can be
combined into n bit wide arithmetic functions.
– Architecture Specific

• Major players in the FPGA industry:
– Chipmakers – device families
• Xilinx – Spartan, Spartan-II, Spartan-3, Virtex, Virtex-II
• Actel – eX, MX, SX, Axcelerator, ProASIC
• Intel – ACEX, FLEX, APEX, Cyclone, Mercury, Stratix
• Atmel – AT6000, AT40K
– Software developers – CAD tools
• Synopsys – FPGA Compiler
• Mentor Graphics – HDL Designer, ModelSim
• Synplicity – Synplify, Synplify Pro
Text and Reference Books
T1 Christophe Bobda, Introduction to Reconfigurable Computing: Architectures, Algorithms and Applications, Springer, 2007.
T2 Scott Hauck, André DeHon, Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation, The Morgan Kaufmann Series in Systems on Silicon, 2007.
R1 Wayne Wolf, FPGA-Based System Design, Pearson Education, 2004.
R2 Samir Palnitkar, Verilog HDL, Prentice Hall, 2003.
R3 R. Vaidyanathan, Jerry Trahan, Dynamic Reconfiguration: Architectures and Algorithms, Kluwer Academic, 2003.
R4 Xilinx, Altera and Microsemi architecture reference manuals
R5 Giovanni De Micheli, Synthesis and Optimization of Digital Circuits, Tata McGraw-Hill, 2003.
R6 R. Druyer, L. Torres, P. Benoit, P. V. Bonzom and P. Le-Quere, "A survey on security features in modern FPGAs," Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2015 10th International Symposium on, Bremen, 2015, pp. 1-8. doi: 10.1109/ReCoSoC.2015.7238102
https://www.altera.com/en_US/pdfs/literature/wp/wp-01111-anti-tamper.pdf
http://www.xilinx.com/support/documentation/white_papers/wp365_Solving_Security_Concerns.pdf
http://www.microsemi.com/document-portal/doc_view/132850-secure-architecture-in-microsemi-fpgas-and-soc-fpgas-an-overview
Evaluation Components
• Two lab-based assignments (EC-1, online): 10% + 15%
• Mid-semester Examination (EC-2, open book): 30%
• Comprehensive Examination (EC-3, open book): 45%
Lab Details
• Use of Xilinx Vivado software
• Software and hardware to be accessed from central
remote lab facilities, using Internet
• Need high speed internet connection to access, preferably
>4 Mbps
• If available, you may also use your own copy of software
and hardware, e.g., the free ISE WebPACK tool from
Xilinx (does not support hardware interface)
• Assignments to be uploaded on course page within
timeline
• Lab reference material would be available on course page

Pre-requisites
• Students should have a basic knowledge of digital logic design, including basic concepts of logic gates, decoders, multiplexers, flip-flops, and memory, binary number systems, and simple logic optimization techniques like Karnaugh maps (K-maps).
• Knowledge of hardware description languages, such as Verilog or VHDL, is also helpful.
• Basic knowledge of graph theory and computer programming is useful (not essential)
• In sum, this course is appropriate for most readers with a background in electrical engineering, computer science, or computer engineering.

Computing Paradigms
• The Von Neumann Computer
• Pipelining
• Domain specific processors (DSP)
• Application Specific Integrated Circuits (ASIC)
• Application specific instruction set processors (ASIP)
• Reconfigurable Processors (FPGA)

The Von Neumann Computer
A computer could have a simple structure, capable of executing any
kind of program, given a properly programmed control unit,
without the need of hardware modification
• Simplicity in programming
• Follows sequential way of human thinking
Program execution
• Instruction Fetch (IF): The next instruction to be executed is fetched from the memory
• Decode (D): The instruction is decoded to determine the operation
• Read operand (R): The operands are read from the memory
• Execute (EX): The required operation is executed on the ALU
• Write result (W): The result of the operation is written back to the memory
• Each instruction is executed in five cycles (IF, D, R, EX, W)
• In each of those five cycles, only the part of the hardware involved in that step is active. The rest remains idle
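
The five-step cycle can be sketched as a toy interpreter. This is a minimal illustration, not a model of any real processor: the instruction format `(opcode, src1, src2, dst)`, the opcode names, and the memory layout are all assumptions made for the example. Instructions and data share one memory, as in the von Neumann model, and each phase is counted as one cycle.

```python
import operator

# Hypothetical opcodes for the toy machine, mapped to ALU operations.
ALU_OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def run(memory, start, count):
    """Execute `count` instructions from address `start`; return cycles used."""
    pc = start
    cycles = 0
    for _ in range(count):
        instruction = memory[pc]                # IF: fetch from the shared memory
        cycles += 1
        opcode, src1, src2, dst = instruction   # D: decode the operation
        alu_op = ALU_OPS[opcode]
        cycles += 1
        a, b = memory[src1], memory[src2]       # R: read the operands from memory
        cycles += 1
        result = alu_op(a, b)                   # EX: execute on the ALU
        cycles += 1
        memory[dst] = result                    # W: write the result back
        cycles += 1
        pc += 1
    return cycles


# Addresses 0-1 hold the program, addresses 10-13 hold the data.
memory = {
    0: ("ADD", 10, 11, 12),   # d = a + b
    1: ("MUL", 10, 11, 13),   # c = a * b
    10: 3,                    # a
    11: 4,                    # b
    12: 0,                    # d
    13: 0,                    # c
}
cycles = run(memory, start=0, count=2)
```

Two instructions take ten cycles here, and in every cycle only one of the five phases is doing work.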

Advantages:
• Flexibility: any well coded program can be executed.
• Control Unit gets data and instruction in the same way from one memory. It simplifies
design and development of the Control Unit.
• Data from memory and from devices are accessed in the same way.
• Memory organization is in the hands of programmers.
Disadvantages:
• Speed efficiency: not efficient, due to the sequential program execution (temporal resource sharing).
• Resource efficiency: only one part of the hardware resources is required for the execution of an instruction. The rest remains idle.
• The single bus is a bottleneck. Only one item of information can be accessed at a time.
• Memory access: memories are about 1000 times slower than the processor
• Instructions stored in the same memory as the data can be accidentally overwritten by an error in a program.
• These drawbacks are compensated for using high clock speeds, pipelining, caches, instruction pre-fetching, etc.

Pipelining
Sequential execution
• tcycle = cycle execution time
• One instruction needs 5 clock cycles = 5*tcycle
• 3 instructions executed in 15*tcycle

Pipelined execution
• One instruction still needs 5 clock cycles = 5*tcycle
• No improvement in the time per instruction, BUT an increase in throughput
• 3 instructions need 7*tcycle in the ideal case due to overlapped execution of instructions (pipelining)
• 9*tcycle considering pipelining hazards as in Harvard arch
Even with pipelining and other improvements like caches, the execution remains sequential (Amdahl's constraints).
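
The cycle counts above follow from two simple formulas: n instructions on a k-stage machine take n*k cycles sequentially, and k + (n - 1) cycles in an ideal pipeline. The sketch below reproduces the numbers from the slide. The function names are hypothetical, and the two stall cycles are an assumption chosen to match the 9*tcycle figure, since the slide does not say which hazards occur.

```python
def sequential_cycles(num_instructions, stages=5):
    """Each instruction finishes all stages before the next one starts."""
    return num_instructions * stages


def pipelined_cycles(num_instructions, stages=5, stall_cycles=0):
    """The first instruction fills the pipeline; each later one adds one cycle.

    stall_cycles models cycles lost to hazards (an assumed, lumped number).
    """
    return stages + (num_instructions - 1) + stall_cycles


seq = sequential_cycles(3)                      # 3 * 5
ideal = pipelined_cycles(3)                     # 5 + 2
with_hazards = pipelined_cycles(3, stall_cycles=2)
```

The latency of a single instruction stays at five cycles in every case; only the throughput changes.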

Domain specific processors
• Data path is tailored for a class of algorithms
• Overcome the drawback of the von Neumann computer – reduced
memory access
• DSP (Digital Signal Processors)
• To speed-up computation of repetitive, numerically intensive tasks in
signal processing areas such as telecommunication, multimedia,
automobile, image processing, etc
• Signal processing applications are usually multiply accumulate (MAC)
dominated.
• Datapath optimized to execute one or many MACs in only one cycle.
• Instruction fetching and decoding overhead is removed due to
availability of specialized hardware and instructions
• Memory access is limited by directly processing the input dataflow
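
To see why signal processing is MAC dominated, consider a FIR filter, where every output sample is a sum of coefficient-times-sample products. The Python sketch below counts the multiply-accumulate operations. The function name and the sample data are illustrative assumptions; a DSP would execute each `acc += h * x` as a single MAC instruction rather than as separate multiply and add steps.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * samples[n - k]."""
    outputs = []
    mac_count = 0
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * samples[n - k]   # one multiply-accumulate (MAC)
                mac_count += 1
        outputs.append(acc)
    return outputs, mac_count


# Two-tap filter with both coefficients equal to 1 (sum of adjacent samples).
outputs, macs = fir_filter([1, 2, 3, 4], [1, 1])
```

Almost all of the arithmetic in the inner loop is the MAC itself, which is why a datapath that completes one or more MACs per cycle speeds up this class of algorithms.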

Application Specific Instruction Set Processor
• An ASIP is a processor that can be specialized to a particular application domain.
• A compromise between the two extremes, ASIC and GPP (general purpose processor)
• ASIPs and ASICs are both designed for a very specific application, but ASIPs are programmable and therefore almost as flexible as a GPP
• Contains static logic that defines a minimum ISA and configurable logic for designing new instructions
• ASIPs are well suited to integration in embedded systems or SoCs. The goal is to optimize area and power, and the design effort is more at the system level than at the hardware level
• The instruction set of the application is directly implemented in hardware.
• DSPs may also fall under this domain, but parallelism is more pronounced in ASIPs
• Can extend the main processor datapath and act as accelerators, offloading performance-critical or power-critical operations from the main processor
• Act as "activable" accelerators, i.e., ones that can be switched on when needed
• Examples: ASIP for image processing, network processors, cryptographic processors, Cisco Quantum Flow Processor
ASIP
Implementation on a VN computer:
if (a < b)
{
    d = a + b;
    c = a * b;
}
else
{
    d = a + 1;
    c = b - 1;
}
At least 3 instructions in the program
run-time >= 3*tinstruction = 3*5*tcycle = 15*tcycle

ASIP implementation:
The complete execution is done in parallel in one clock cycle
run-time = tmax = delay of the longest path from input to output

• The VN processor can compete with this ASIP only if 15*tcycle < tmax, i.e. tcycle < tmax/15. The VN must be at least 15 times faster than the ASIP to be competitive.
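
The comparison can be made concrete with a small Python sketch. The function names are hypothetical and the timing values in the usage lines are made-up numbers in arbitrary units. `asip_datapath` mimics the spatial implementation: all four candidate results are computed side by side and the comparator output selects between them, like a multiplexer.

```python
def vn_runtime(t_cycle, instructions=3, cycles_per_instruction=5):
    """Lower bound on VN run-time: 3 instructions of 5 cycles each."""
    return instructions * cycles_per_instruction * t_cycle


def asip_datapath(a, b):
    """Spatial version: both branches are evaluated in parallel, then selected."""
    sum_ab, prod_ab = a + b, a * b          # results of the 'if' branch
    a_plus_1, b_minus_1 = a + 1, b - 1      # results of the 'else' branch
    select = a < b                          # comparator drives the multiplexers
    d = sum_ab if select else a_plus_1
    c = prod_ab if select else b_minus_1
    return d, c


def vn_is_competitive(t_cycle, t_max):
    """The VN processor wins only if 15 * t_cycle < t_max."""
    return vn_runtime(t_cycle) < t_max


# Assumed example timings in arbitrary units: the ASIP critical path is 20.
fast_vn = vn_is_competitive(t_cycle=1, t_max=20)   # 15 < 20
slow_vn = vn_is_competitive(t_cycle=2, t_max=20)   # 30 < 20 is false
```

In hardware, the ASIP's run-time is set by the slowest path through the datapath, here most likely the multiplier, rather than by the number of operations.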

Application Specific Integrated Circuit (ASIC)
• Optimize the complete circuit for a given function
• Optimization is done by implementing the inherent parallel
structure on a chip
• The data path is optimized for only one application.
• Instruction fetching and decoding overhead is removed
• Instruction set implemented in hardware
• Memory access is limited by directly processing the input data flow
• Exploitation of parallel computation
• ASIC examples: baseband processor in mobile phones, chipsets in
PCs, MPEG encoder/decoder, DSP functions

When to use RC?
RC devices enable design of digital circuits without fabricating a device
• Therefore, RC can be used anytime a digital circuit is needed
• Examples: ASIC prototyping, ASIC replacement, replacing/accelerating microprocessors
• But, when should RC be used instead of alternative technologies?
Implementation Possibilities
[Figure: Microprocessor, RC (FPGA, CPLD, etc.), and ASIC placed along a performance axis]
Why not use an ASIC for everything?

When to use RC?
1. When it provides the cheapest solution
• Depends on:
• NRE cost (non-recurring engineering cost): cost involved with designing the application
• Unit cost: cost of manufacturing/purchasing a single system
• Volume: number of units
• Total cost = NRE + unit cost * volume
• RC is typically more cost effective for low volume applications
• RC: low NRE, high unit cost
• ASIC: very high NRE, low unit cost

2. When time to market is critical
– Huge effect on total revenue
3. When circuit may have to be modified
– Can’t change ASIC - hardware
– Can change circuit implemented in FPGA
Uses
– When standards change
• Codec changes after devices fabricated
– Allows addition of new features to existing devices
– Fault tolerance/recovery
– “Partial reconfiguration” allows virtual device with arbitrary size - analogous to
virtual memory
Without RC
– Anything that may have to be reconfigured is implemented in software
• Performance loss
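
The cost rule from point 1 (total cost = NRE + unit cost * volume) gives a break-even volume below which RC is cheaper than an ASIC. The sketch below works this out in Python. All cost figures are invented for illustration and do not come from the lecture or from any vendor; the function names are hypothetical as well.

```python
def total_cost(nre, unit_cost, volume):
    """Total cost = NRE + unit cost * volume."""
    return nre + unit_cost * volume


def break_even_volume(nre_rc, unit_rc, nre_asic, unit_asic):
    """Volume at which the RC and ASIC total costs are equal.

    Assumes RC has the lower NRE and the higher unit cost.
    """
    return (nre_asic - nre_rc) / (unit_rc - unit_asic)


# Assumed example figures (arbitrary currency units).
RC = {"nre": 50_000, "unit_cost": 110}
ASIC = {"nre": 1_050_000, "unit_cost": 10}

volume_star = break_even_volume(RC["nre"], RC["unit_cost"],
                                ASIC["nre"], ASIC["unit_cost"])

low_volume_rc = total_cost(volume=1_000, **RC)
low_volume_asic = total_cost(volume=1_000, **ASIC)
high_volume_rc = total_cost(volume=100_000, **RC)
high_volume_asic = total_cost(volume=100_000, **ASIC)
```

With these assumed numbers the two options cost the same at 10,000 units. RC is cheaper below that volume and the ASIC is cheaper above it.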

Reconfigurable Computing
• Reconfigurable computing (RC) is the study of architectures that can
adapt (after fabrication) to a specific application or application domain
– Involves architecture, tools, CAD, design automation, algorithms,
languages, etc.
• Reconfigurable computing can be defined as the study of computations
involving reconfigurable devices.
• The spatial structure of the device is modified so as to use the best
computing approach to speed up that application
• For a new application, the device structure is modified again to
match the new application

• Alternatively, RC is a way of implementing circuits without fabricating a device
• Essentially allows circuits to be implemented as “software”. Circuits are no longer
synonymous with hardware
• RC devices are programmable by downloading bits, just like microprocessors
• Difference is that microprocessor bits specify instructions, whereas RC bits
specify circuit structures
[Figure: Microprocessor binaries are bits loaded into the program memory of a processor; FPGA binaries (bitfile) are bits loaded into the logic blocks, switch matrices, memories, etc. of an FPGA]

Some Fields of Application
• Rapid prototyping
• Post fabrication customization
• Multi-modal computing tasks
• Adaptive computing systems
• Fault tolerance
• High performance parallel computing

Rapid prototyping
• Testing hardware in real conditions before fabrication
• Software simulation
– Relatively inexpensive
– Slow
– Accuracy
• Hardware emulation
– Hardware testing under real operation conditions
– Fast
– Accurate
– Allow several iterations
[Figures: APTIX System Explorer; ITALTEL FLEXBENCH]

In-System customization
• Time to market advantage
o Ship the first version of a product
o Remote upgrading with new product versions
o Remote repairing
• Mars rover vehicle (Mars Pathfinder, landed 4th July 1997)

Multi-modal computation
Systems that handle many different types of inputs. The control unit handles them in a time-multiplexed manner.
[Figure labels: service request, configuration]
• Mobile phones
• Built-in digital camera
• Video phone service
• Games
• Internet Navigation system
• Emergency Diagnostics
• Different standard protocols
• Monitoring
• Entertainment

Adaptive computing systems
• Computing systems that are able to adapt their behavior and structure to changing operating and environmental conditions, time-varying optimization objectives, and physical constraints like changing protocols, new standards, or dynamically changing operation conditions of technical systems.
• Dynamic adaptation to environment and threats for extended mission capabilities

Fault Tolerance
• FPGAs are sensitive to SEUs (single event upsets) and SETs (single event transients) caused by electromagnetic noise and radiation.
• Because the configuration memory of the chip can be affected, an upset can result in a permanent error.
• Particularly in space applications, cosmic rays can hit silicon surfaces, generating a high density of electron-hole pairs that may lead to transient errors.
• Requires duplication or triplication of resources for combinational logic and parity checks for on-chip caches
• Triple Modular Redundancy (TMR) with a voter circuit is a common approach. Three identical hardware modules perform their operations in parallel and a voter selects the majority output.
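
A TMR voter reduces to a bitwise majority function over the three module outputs. The Python sketch below shows the idea. The function names and the fault-injection parameter are illustrative assumptions, and a real design would implement the voter in logic (and often triplicate the voter as well).

```python
def majority_vote(a, b, c):
    """Bitwise majority: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def tmr(module, x, fault_mask=0):
    """Run three copies of `module` and vote on the results.

    fault_mask flips bits in the first copy's output to imitate an upset
    (a hypothetical fault model used only for this example).
    """
    outputs = [module(x), module(x), module(x)]
    outputs[0] ^= fault_mask          # inject the fault into one replica
    return majority_vote(*outputs)


def increment(value):
    return value + 1


clean = tmr(increment, 7)                        # no fault injected
masked = tmr(increment, 7, fault_mask=0b100)     # one replica is corrupted
```

As long as only one of the three replicas is wrong, the voter output matches the fault-free result. Two simultaneous faults in the same bit position would defeat it.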

GPU vs FPGA
• Raw compute power: Xilinx research shows that the Tesla P40 (40 INT8 TOP/s) and the UltraScale+ XCVU13P FPGA (38.3 INT8 TOP/s) have almost the same compute power.
– The large amount of on-chip cache memory in the XCVU13P reduces memory bottlenecks
– FPGAs are flexible in supporting the full range of data type precisions, e.g., INT8, FP32, binary, and any other custom data type
• Efficiency and power: an image classification project showed that the Arria 10 FPGA performs almost 10 times better in power consumption. Xilinx showed that the Xilinx Virtex UltraScale+ performs almost four times better than the NVIDIA Tesla V100 in general purpose compute efficiency.

Analog Domain
• FPAA – Field Programmable Analog Array
• Consists of Configurable Analog Blocks (CABs)
• CABs contain operational amplifiers, programmable capacitor arrays (PCAs), or programmable resistor arrays (PRAs) for continuous time circuits, or configurable switches for switched capacitor circuits
• This means they can operate in one of two modes: continuous time and discrete time.
• Example: AN221E04 device by Anadigm

Some papers on FPGA Applications
• FPGAs in Industrial Control Applications, Eric Monmasson et al.
• Recent Trends in FPGA Architectures and Applications, Philip H.W. Leong
• Why Compete When You Can Work Together: FPGA-ASIC Integration for Persistent RNNs, Eriko Nurvitadhi et al.
