
An Introduction to Reconfigurable Computing

Mitch Sukalski and Craig Ulmer Dean R&D Seminar 11 December 2003

Reconfigurable Computing… is computation on a platform with reconfigurable (i.e., modifiable at run-time) hardware capable of implementing application-specific algorithms and functionality on demand.

Computing Spectrum

[Figure: spectrum running from Software (CPU pipeline: Fetch, Decode, Registers, Execute, Memory, Writeback) through Soft-Hardware to Hardware]

General-Purpose CPU
• Easily reprogrammed
• Low cost
• Fundamental bottlenecks

Field Programmable Gate Arrays (FPGAs)
• Reconfigurable hardware
• Medium cost
• Speedup potential

Application-Specific Integrated Circuit (ASIC)
• Not modifiable
• High cost
• Extremely fast

History

1945: Eckert, Mauchly, von Neumann: ENIAC
1945: “von Neumann architecture”
1960: Estrin: Fixed+Variable Structure Computer
1970’s: Simple PLDs
1985: Xilinx introduces first FPGA
1990’s: Custom Computing Machines (CCMs)
1999: FPGAs exceed million logic gates
2002: FPGAs include complex cores

[Figures: ENIAC; Fixed+Variable (users can attach computational circuits to a fixed ALU); CCM: the Teramac multi-chip module of FPGAs; Xilinx Virtex FPGA; Xilinx Virtex II Pro (image courtesy of rapidio.org)]

Reconfigurable Computing in Modern HPC
• Stand-alone platforms
  – OctigaBay 12K
  – SRC-6
  – Starbridge Hypercomputer
• Accelerator cards
  – Timelogic’s DeCypher
  – Nallatech’s BenNUEY
  – Annapolis Micro Systems WILDSTAR II

Example: Computational Fluid Dynamics
William Smith & Austars Schnore at GE Global Research
From: “Towards an RCC-based Accelerator for Computational Fluid Dynamics,” ERSA 2003

And now for some details…
• Field Programmable Gate Arrays (FPGAs)
• Common RC design techniques
• Reported examples

Field-Programmable Gate Arrays (FPGAs)
• FPGAs emulate digital logic circuitry
  – Large array of configurable logic blocks
  – Internal routing through programmable interconnection network
• FPGAs hold hardware configuration in SRAM
  – Change the digital circuitry by loading new configuration
• Design approach:
  – User designs in hardware description language
  – Synthesis tools translate to logic gates
  – Mapping tools target specific FPGA

Simplified Logic Block
• Emulates logic function
  – Thousands per chip
• Lookup Table (LUT)
  – Holds truth table
  – Inputs produce outputs
• 1-bit registers
  – Hold data between cycles
• Note: Greatly simplified

[Figure: logic block containing a LUT and a register]

LUT Example: 1-bit Adder

Truth Table

A  B  Cin | Cout  Sum
0  0  0   |  0     0
0  0  1   |  0     1
0  1  0   |  0     1
0  1  1   |  1     0
1  0  0   |  0     1
1  0  1   |  1     0
1  1  0   |  1     0
1  1  1   |  1     1

[Figure: two logic blocks, each a LUT with inputs A, B, C followed by a register; one produces Cout, the other Sum]
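The truth table above can be modeled in software to show how a LUT "computes" purely by lookup. This is a minimal Python sketch, not FPGA code: the names `make_lut`, `cout_lut`, `sum_lut`, and `full_adder` are illustrative assumptions, and a real LUT is configured by the vendor tools rather than written by hand.

```python
# Hypothetical software model of a 3-input LUT (illustration only).
def make_lut(truth_table):
    """Return a function whose output is read from a stored 8-entry table."""
    def lut(a, b, c):
        index = (a << 2) | (b << 1) | c  # the three inputs select one table row
        return truth_table[index]
    return lut

# Table contents in row order A,B,Cin = 000, 001, ..., 111.
cout_lut = make_lut([0, 0, 0, 1, 0, 1, 1, 1])
sum_lut = make_lut([0, 1, 1, 0, 1, 0, 0, 1])

def full_adder(a, b, cin):
    """1-bit adder built from two LUTs, one per output."""
    return cout_lut(a, b, cin), sum_lut(a, b, cin)
```

Loading different table contents into the same structure yields a different logic function, which is the sense in which the hardware is reconfigurable.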

Routing Data between Logic Blocks
• Need to connect logic blocks
• Wires and Switchboxes
  – LBs connect to local wires
  – Switchboxes route long connections
• Routing set at compile time
  – Performed by tools

[Figure: grid of logic blocks (LB) interleaved with switchboxes (X)]

Reconfiguration
• Modern FPGAs SRAM based
  – Can be loaded with new circuitry
• Full reconfiguration
  – Few megabytes of configuration
  – Milliseconds
• Partial reconfiguration
  – Reprogram only a portion of chip
  – Reduces configuration time
  – Non-trivial, poorly supported

[Figure: Full Configuration Image and Partial Configuration Image being loaded into the FPGA]

Design Techniques
Digital logic design techniques for exploiting FPGAs

FPGAs as Computational Accelerators
• Use FPGAs as soft-hardware
  – Port algorithm to hardware
  – Run inside FPGA
  – Reuse hardware
• Techniques
  – Concurrency, memory, partial evaluation

1. Concurrency
• Load FPGA with multiple computational circuits
  – Hardware state machines are like threads, but…
  – All tasks are always running
• Raw parallelism
  – Units run in parallel
  – Example: Key breaking
• Pipelining
  – Chain units together in series
  – Example: Streaming computations, data-flow
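The pipelining idea can be sketched as a cycle-by-cycle software simulation. This is a hedged illustration in Python, not a hardware design: `run_pipeline` and the three example stages are hypothetical, and `None` is assumed to mark an empty pipeline slot (so real data and stage outputs must not be `None`).

```python
def run_pipeline(stages, inputs):
    """Clock a chain of stages; each stage works on a different item every cycle."""
    regs = [None] * len(stages)          # one output register per stage
    outputs = []
    cycles = 0
    # Extra empty slots push the last items through the remaining stages.
    stream = list(inputs) + [None] * (len(stages) - 1)
    for item in stream:
        # All stages fire together: each reads the previous stage's register.
        sources = [item] + regs[:-1]
        regs = [None if x is None else f(x) for f, x in zip(stages, sources)]
        cycles += 1
        if regs[-1] is not None:
            outputs.append(regs[-1])
    return outputs, cycles

# Hypothetical three-stage streaming computation: ((x + 1) * 2) - 3
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
results, cycles = run_pipeline(stages, [1, 2, 3, 4])
```

With 4 inputs and 3 stages the simulation takes 6 cycles rather than 12, because once the pipeline is full one result emerges per cycle.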

2. Custom Memory Interactions
• Most FPGA cards have multiple memory banks
  – Fetch/store multiple data values at same time
  – Predictable performance (as opposed to caches)
  – Hide address generation

[Figure: FPGA connected to SRAM Banks 0 through 4]
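As a rough illustration of why multiple banks help, the Python sketch below spreads data across independent banks and reads one word from every bank per cycle. The function names, the round-robin layout, and the choice of four banks are assumptions made for the example, not details of any particular FPGA card.

```python
def interleave(data, num_banks):
    """Distribute consecutive words round-robin across independent banks."""
    return [data[b::num_banks] for b in range(num_banks)]

def banked_sum(banks):
    """Sum all words, reading one word from every bank in each cycle."""
    total, cycles = 0, 0
    depth = max(len(bank) for bank in banks)
    for addr in range(depth):            # the same address is sent to every bank
        words = [bank[addr] for bank in banks if addr < len(bank)]
        total += sum(words)              # in hardware, an adder tree would combine these
        cycles += 1
    return total, cycles

data = list(range(16))
total_4, cycles_4 = banked_sum(interleave(data, 4))   # four banks
total_1, cycles_1 = banked_sum(interleave(data, 1))   # single bank
```

Both runs produce the same sum, but the four-bank layout needs a quarter of the memory cycles, and the cycle count is known in advance because no cache is involved.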

3. Partial Evaluation
• Know data constants at design time
  – Apply to circuits and reduce hardware
  – Synthesis tools perform automatically

Example: 4-bit Ripple-Carry Adder

Note: FPGAs unique because we can easily generate new, optimized hardware configurations for each set of constants.
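The ripple-carry adder example can be sketched in Python to show what partial evaluation removes. This is an illustrative model only: `full_adder` and `specialize_adder` are hypothetical names, and in practice the synthesis tools perform this simplification automatically rather than the designer writing it out.

```python
def full_adder(a, b, cin):
    """General 1-bit full adder: both operand bits are unknown."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def specialize_adder(constant, width=4):
    """Build an adder for x + constant, simplifying each stage for its known bit."""
    bits = [(constant >> i) & 1 for i in range(width)]

    def add(x):
        carry, result = 0, 0
        for i, k in enumerate(bits):
            a = (x >> i) & 1
            if k == 0:
                # Known 0 bit: the stage reduces to a half adder (XOR, AND).
                s, carry = a ^ carry, a & carry
            else:
                # Known 1 bit: the stage reduces to XNOR for sum, OR for carry.
                s, carry = 1 ^ a ^ carry, a | carry
            result |= s << i
        return result, carry
    return add

add_five = specialize_adder(5)        # hardware specialized for the constant 5
value, carry_out = add_five(9)        # 9 + 5 = 14, no carry out of 4 bits
```

Each specialized stage uses fewer gates than the general full adder, and a different constant simply yields a different specialized circuit.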

RC Performance Examples
• CFD: 23 GFLOPS sustained
  – “Towards an RCC-based Accelerator for Computational Fluid Dynamics,” Smith & Schnore, 2003
• Adaptive beamforming: 20 GFLOPS
  – Parallel systolic array architecture
  – “20 GFLOPS QR processor on a Xilinx Virtex-E FPGA,” Walke, et al., 2000
• Real-time holographic video display at 30fps
  – “Using field programmable gate arrays to scale up the speed of holographic video computation,” Nwodoh, et al., 2000

In Summary
• Reconfigurable computing uses FPGAs to emulate application-specific hardware
  – Achieve performance gains with dedicated hardware
• It is possible to implement just about any kind of digital hardware in the FPGA
  – Limited by capacity and effort
  – Resurrect application-specific hardware architectures
  – SIMD, MIMD, Systolic Processor Arrays, Data-Flow…