ES-MEL-AEL ZG554 - Lec1
Course Overview
• Module: Introduction to Reconfigurable Computing
– General Purpose Computing [T1. Sec 1]
– Domain and Application specific processors [T1. Sec 2 & 3]
– Reconfigurable Computing [T1. Sec 4]
– Fields of Application [T1. Sec 5.1 to 5.4]
BITS Pilani, Pilani Campus
With these two options, we implicitly connect spatial processing with hardware computation and temporal processing with software.
[Figure: 4-LUT logic element with inputs I1 to I4, carry logic (Cout), a D flip-flop (DFF), and output OUT]
• Each LUT operates on four one-bit inputs
• Output is one data bit
• Can perform any Boolean function of four inputs
• $2^{2^4} = 65536$ functions (4096 patterns)
• The basic logic element can be more complex (multiplier, ALU, etc.)
• Contains some sort of programmable interconnect
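The behaviour of a 4-input LUT can be sketched in a few lines of Python. This is only an illustrative software model, not vendor code: the class name `Lut4`, the method names, and the input-to-address bit ordering are assumptions made for the example. The LUT is modelled as 16 configuration bits addressed by the four inputs, which is why it can realise any of the 2^(2^4) = 65536 Boolean functions of four inputs.

```python
from itertools import product


class Lut4:
    """Hypothetical model of a 4-input LUT: 16 configuration bits, 1 output bit."""

    def __init__(self, config_bits: int):
        if not 0 <= config_bits < 2 ** 16:
            raise ValueError("a 4-LUT holds exactly 16 configuration bits")
        self.config_bits = config_bits

    @staticmethod
    def _address(i1: int, i2: int, i3: int, i4: int) -> int:
        # Assumed ordering: I1 is the most significant address bit.
        return (i1 << 3) | (i2 << 2) | (i3 << 1) | i4

    @classmethod
    def from_function(cls, func):
        """Tabulate func over all 16 input patterns to obtain the configuration."""
        bits = 0
        for i1, i2, i3, i4 in product((0, 1), repeat=4):
            if func(i1, i2, i3, i4):
                bits |= 1 << cls._address(i1, i2, i3, i4)
        return cls(bits)

    def evaluate(self, i1: int, i2: int, i3: int, i4: int) -> int:
        """Look up the stored output bit for one input pattern."""
        return (self.config_bits >> self._address(i1, i2, i3, i4)) & 1


# Two different functions loaded into the same structure.
xor4 = Lut4.from_function(lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = Lut4.from_function(lambda a, b, c, d: a & b & c & d)

# Every 16-bit configuration is a distinct function of four inputs.
NUM_FUNCTIONS = 2 ** (2 ** 4)
```

Changing the function only changes the 16 stored bits; the lookup structure itself stays the same, which is the essence of reconfiguring a logic block.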
Higher level diagram of FPGA
• FPGAs are composed of the following:
– Configurable Logic Blocks (CLBs)
– Programmable interconnect
– Input/Output Buffers (IOBs)
– Other resources (clock trees, timers, memory, multipliers, processors, etc.)
• CLBs contain a number of Look-Up Tables (LUTs) and some sequential storage.
– LUTs are individually configured as logic gates, or can be combined into n-bit-wide arithmetic functions.
– The details are architecture-specific.
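As a rough illustration of combining LUTs into an n-bit-wide arithmetic function, the sketch below builds a ripple-carry adder from two LUT truth tables per bit (one for the sum, one for the carry). The helper names `make_lut`, `lut_eval`, and `ripple_add` are hypothetical, the fourth LUT input is simply tied to 0, and real devices typically route the carry through dedicated carry logic rather than a second LUT.

```python
def make_lut(func):
    """Tabulate a 4-input Boolean function into a 16-entry truth table."""
    return [
        func((addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1) & 1
        for addr in range(16)
    ]


def lut_eval(table, i1, i2, i3, i4):
    """Read the output bit stored for one input pattern."""
    return table[(i1 << 3) | (i2 << 2) | (i3 << 1) | i4]


# One full-adder bit needs two functions of (a, b, carry_in); input 4 is unused.
SUM_LUT = make_lut(lambda a, b, cin, _unused: a ^ b ^ cin)
CARRY_LUT = make_lut(lambda a, b, cin, _unused: (a & b) | (cin & (a ^ b)))


def ripple_add(x, y, width):
    """Add two width-bit integers by chaining one LUT pair per bit position."""
    carry = 0
    result = 0
    for bit in range(width):
        a = (x >> bit) & 1
        b = (y >> bit) & 1
        sum_bit = lut_eval(SUM_LUT, a, b, carry, 0)
        carry = lut_eval(CARRY_LUT, a, b, carry, 0)
        result |= sum_bit << bit
    return result, carry  # carry is the final carry-out
```

For example, `ripple_add(5, 3, 4)` returns `(8, 0)`, and `ripple_add(15, 1, 4)` returns `(0, 1)` because the 4-bit sum overflows into the carry-out.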
Text and Reference Books
T1 Christophe Bobda, Introduction to Reconfigurable Computing: Architectures, Algorithms and Applications, Springer, 2007.
T2 Scott Hauck, André DeHon, Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation, The Morgan Kaufmann Series in Systems on Silicon, 2007.
R3 R. Vaidyanathan, Jerry L. Trahan, Dynamic Reconfiguration: Architectures and Algorithms, Kluwer Academic, 2003.
R5 Giovanni De Micheli, Synthesis and Optimization of Digital Circuits, Tata McGraw-Hill, 2003.
R6 R. Druyer, L. Torres, P. Benoit, P. V. Bonzom and P. Le-Quere, "A survey on security features in modern FPGAs," 2015 10th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), Bremen, 2015, pp. 1-8. doi: 10.1109/ReCoSoC.2015.7238102
https://www.altera.com/en_US/pdfs/literature/wp/wp-01111-anti-tamper.pdf
http://www.xilinx.com/support/documentation/white_papers/wp365_Solving_Security_Concerns.pdf
http://www.microsemi.com/document-portal/doc_view/132850-secure-architecture-in-microsemi-fpgas-and-soc-fpgas-an-overview
Evaluation Components
Lab Details
• Use of Xilinx Vivado software
• Software and hardware are accessed from central remote lab facilities over the Internet
• A high-speed Internet connection is needed for access, preferably >4 Mbps
• If available, you may also use your own copy of the software and hardware, e.g., the free ISE WebPACK tool from Xilinx (does not support hardware interface)
• Assignments are to be uploaded to the course page within the timeline
• Lab reference material will be available on the course page
Program execution
• Instruction Fetch (IF): The next instruction to be executed is fetched from the memory
• Decode (D): The instruction is decoded to determine the operation
• Read operand (R): The operands are read from the memory
• Execute (EX): The required operation is executed on the ALU
• Write result (W): The result of the operation is written back to the memory
• Each instruction executes in a cycle of five phases (IF, D, R, EX, W)
• In each of those five phases, only the part of the hardware involved in that phase is active. The rest remains idle
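The idle-hardware point can be made concrete with a small simulation. The sketch below is a simplified model, assuming each phase takes exactly one clock cycle and that no two phases overlap (no pipelining); the names `PHASES`, `run_sequential`, and `utilization` are hypothetical.

```python
PHASES = ("IF", "D", "R", "EX", "W")


def run_sequential(num_instructions):
    """Return a trace of (cycle, instruction index, active phase) tuples.

    Each instruction passes through all five phases before the next one starts.
    """
    trace = []
    cycle = 0
    for instr in range(num_instructions):
        for phase in PHASES:
            trace.append((cycle, instr, phase))
            cycle += 1
    return trace


def utilization(trace):
    """Fraction of all cycles in which the hardware of each phase is active."""
    total = len(trace)
    return {
        phase: sum(1 for _, _, active in trace if active == phase) / total
        for phase in PHASES
    }


trace = run_sequential(3)
busy = utilization(trace)
```

Under these assumptions, three instructions take 15 cycles and the hardware for each phase is busy in only one cycle out of five, so four fifths of the datapath sits idle at any given time.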
Disadvantages
• Speed efficiency: Not efficient, due to the sequential program execution (temporal resource sharing).
• Resource efficiency: Only one part of the hardware resources is required for the execution of an instruction. The rest remains idle.
• The single bus is a bottleneck: only one item of information can be accessed at a time.
• Memory access: Memories are about 1000 times slower than the processor.
• Instructions stored in the same memory as the data can be accidentally overwritten by an error in a program.
• These drawbacks are compensated for using high clock speeds, pipelining, caches, instruction pre-fetching, etc.
Even with pipelining and other improvements such as caches, the execution remains sequential (Amdahl's constraints).
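The limit that sequential execution places on speedup can be illustrated with the standard form of Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can be accelerated and n is the acceleration factor. The slide names the constraint but does not give the formula, so the function name and the sample values below are assumptions for illustration only.

```python
def amdahl_speedup(parallel_fraction, n):
    """Overall speedup when a fraction of the work is accelerated n times.

    parallel_fraction: share of execution time that benefits (0.0 to 1.0).
    n: speedup factor applied to that share (n >= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)


# Assumed example: 90% of the run time can be accelerated.
speedup_10x = amdahl_speedup(0.9, 10)
speedup_huge = amdahl_speedup(0.9, 1_000_000)
```

With 90% of the work accelerated, the overall speedup stays below 10 no matter how large n becomes, because the remaining 10% still runs sequentially.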
replacing/accelerating microprocessors
But, when should RC be used instead of alternative technologies?
Implementation Possibilities
Performance
Why not use an ASIC for everything?
• Rapid prototyping
• Post fabrication customization
• Multi-modal computing tasks
• Adaptive computing systems
• Fault tolerance
• High performance parallel computing
• FPGAs are sensitive to single-event upsets (SEUs) and single-event transients (SETs). Electromagnetic noise and radiation can corrupt the configuration memory of the chip, resulting in a permanent error. In space applications in particular, cosmic rays can hit silicon surfaces and generate high-density electron-hole pairs, which may lead to transient errors.
• Mitigation requires duplication or triplication of resources for combinational logic and parity checks for on-chip caches.
• Triple Modular Redundancy (TMR) with a voter circuit is a common approach. Three identical hardware modules perform their operations in parallel, and a voter selects the majority output.
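A minimal software sketch of TMR with a bitwise majority voter is shown below. It is a behavioural model, not a hardware description: the function names `majority_vote` and `tmr`, the `fault_masks` parameter used to inject bit flips, and the example module are all hypothetical.

```python
def majority_vote(a, b, c):
    """Bitwise 2-out-of-3 majority: each output bit matches at least two inputs."""
    return (a & b) | (b & c) | (a & c)


def tmr(module, x, fault_masks=(0, 0, 0)):
    """Run three copies of module on x and vote on their outputs.

    fault_masks injects bit flips into each copy's output to mimic upsets.
    """
    outputs = [module(x) ^ mask for mask in fault_masks]
    return majority_vote(*outputs)


def increment8(x):
    """Example module (assumed for illustration): 8-bit incrementer."""
    return (x + 1) & 0xFF


clean = tmr(increment8, 41)
one_fault = tmr(increment8, 41, fault_masks=(0x10, 0, 0))
two_faults = tmr(increment8, 41, fault_masks=(0x10, 0x10, 0))
```

A fault in a single copy is outvoted by the other two, so `one_fault` equals the fault-free result. When two copies flip the same bit, the voter follows the faulty majority, which shows that TMR masks only single-module faults.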
• Raw compute power: Xilinx research shows that the Tesla P40 (40 INT8 TOP/s) and the UltraScale+ XCVU13P FPGA (38.3 INT8 TOP/s) have almost the same compute power. The large amount of on-chip cache memory in the XCVU13P reduces memory bottlenecks, and FPGAs are flexible enough to support the full range of data-type precisions, e.g., INT8, FP32, binary, and any other custom data type.
• Efficiency and power: an image classification project showed that the Arria 10 FPGA performs almost 10 times better in power consumption, and Xilinx showed that the Virtex UltraScale+ performs almost four times better than the NVIDIA Tesla V100 in general-purpose compute efficiency.