You are on page 1of 48

Reconfigurable Computing

CS G553

Dr. A. Amalin Prince


BITS - Pilani K K Birla Goa Campus
Department of Electrical and Electronics Engineering

‹#›
Lecture – 2,3,4
Introduction: Motivation, Goals, etc…

CS G553 2
Introduction
 Research in computer (processor) architecture
o The investing goals vary according to
• Target applications
• Price of the final equipment
• Programmability of the system
• The environment in which processors will be deployed
• Many others

CS G553 3
Introduction
 Computer anytime anywhere (pervasive and ubiquity) …

PDA Car
PC

Home Networking

Game console Household


Body

Super Computer Medicine


Entertainment

Communication

CS G553 4
Introduction
 ... communication also.

CS G553 5
Introduction
 Explosive growth in
o Computing
o Communication

 Information technology
o ʺHand in hand com“ growth in computing and communication

 Millions of computer systems are produce every year


o PCs, Laptops, Workstations, Mainframes, Server

 Billions of embedded computer systems already deployed


o Household, cars, mobile phone, plane, etc…

CS G553 6
Performance VS Cost

CS G553 7
Computing Paradigms

 The Von Neumann Computer

 Domain specific processors

 Application specific instructionset processors

 Application specific processors or ASICs

 Reconfigurable Processors

CS G553 8
The Von Neumann Computer

Principle
In 1945, the mathematician Von Neumann (VN)
demonstrated in study of computation that a computer
could have a simple structure, capable of executing any
kind of program, given a properly programmed control
unit, without the need of hardware modification

CS G553 9
The Von Neumann Computer

Success story of VN computer


o Even till today
• Simplicity in programming
• Follows sequential way of human thinking

CS G553 10
The Von Neumann Computer

 Structure
o A memory for storing program and data.
• The memory consists of the word with the same length
o A control unit (control path) featuring a program counter for
controlling program execution
o An arithmetic and logic unit (ALU) also called data path for program
execution
Processor or
Memory
Central processing unit

Datapath Data

Registers Data
and
Instructions

Instruction Address
register
PC Address
register
Controllpath

CS G553 11
The Von Neumann Computer

 Coding
A program is coded as a set of instructions to be sequentially
executed

 Program execution
o Instruction Fetch (IF): The next instruction to be executed is
fetched from the memory
o Decode (D): The instruction is decoded to determine the
operation
o Read operand (R): The operands are read from the memory
o Execute (EX): The required operation is executed on the ALU
o Write result (W): The result of the operation is written back to
the memory
o Instruction execution in Cycle (IF, D, R, EX, W)

CS G553 12
The Von Neumann Computer

 Advantage:
o Flexibility: any well coded program can be executed

 Drawbacks
o Speed efficiency: Not efficient, due to the sequential program
execution (temporal resource sharing).
• Resource efficiency: Only one part of the hardware resources is
required for the execution of an instruction. The rest remains
idle.
• Memory access: Memories are about 10 time slower than the
processor
o Drawbacks are compensated using high clock speed,
pipelining, caches, instruction pre-fetching, etc.

CS G553 13
The Von Neumann Computer

 Sequential execution
 tcycle = cycle execution time
➢ One instruction needs tinstrcution = 5*tcycle
➢ 3 instructions are executed in 15*tcycle

 Pipelining:
 One instruction needs tinstrcution = 5*tcycle
o no improvement. In instruction cycle
 3 instructions need 7*tcycle in the ideal
case.
 9*tcycle on a Harvard architecture.

 → Increased throughput

 Even with pipeline and other improvements like cache, the execution remain sequential.

CS G553 14
The Von Neumann Computer
 Conclusion
o Flexible
o Each algorithm can be implemented on a VN machine only if it is coded
according to the VN rules.

ʺ The algorithm much adapt itself to the hardarwe“

 Temporal use of the same hardware for a wide variety of


applications, VN computation is often characterized as
ʺTemporal Computation“
 Can all algorithms be executed with their potential?
o Modification in VN

CS G553 15
Domain specific processors

 Goal: Overcome the drawback of the von Neumann


computer.

 Optimized Datapath for a given class of applications


 Example: DSP (Digital Signal Processors):
Signal processing applications are usually multiply
accumulate (MAC) dominated.
o Datapath optimized to execute one or many MACs in only one
cycle.
o Enhanced instructions, data and control path.
o Memory access is limited by directly processing the input dataflow

CS G553 16
CS G553 17
CS G553 18
Domain specific processors

 DSPs:
 Designed for high-performance,
repetitive, numerically intensive tasks

 In one Instruction Cycle, can do:


many MAC-operations
many memory accesses
special support for efficient looping

 The hardware contains:


One or more MAC-Units
Multi-ported on-chip and off-chip
memories
Multiple on-chip busses
Address generation unit supporting
addressing modes tailored for DSP-
applications

CS G553 19
Domain specific processors
 Conclusion
o Faster than VN (MAC in 1 cycle; but for VN 10 steps required)
o Customised according to the application domain
• If the DSP is for image processing (each pixel 8-bit for
RGB); then it cannot be used again for applications
requiring 32-bit computation

CS G553 20
Application Specific Instructionset
Processor
 An ASIP is a processor that can be specialized to a
particular application domain
o Adding new instructions
o Extending the processor Datapath
 Example
o ASIP for Image processing
o ASIP for mobile phone

CS G553 21
Application specific processors or ASIC

 Optimize the complete circuit for a given function


 Example: ASIC: Application Specific Integrated Circuit.
o Optimization is done by implementing the inherent parallel
structure on a chip
o The data path is optimized for only one application.
o Instruction fetching and decoding overhead is removed
o Memory access is limited by directly processing the input data flow
o Exploitation of parallel computation

CS G553 22
Application specific processors or ASIC

 ASIC Example: ASIC implementation:


The complete execution is done in
 Implementation of a VN computer parallel in one clock cycle
if (a < b) then run-time = tclock= delay longest path
{
d = a+b;
from input to output
c = a*b; a b
}
else
{
d = a+1;
c = b-1; Compare
}

 At least 3 instructions
Mult Add Decr. Incr.

 run-time >= 3*tinstruction


 3×5×tcycle=15 tcycle MUX MUX

The VN computer needs to be clocked


at least 15 times faster c d

CS G553 23
Application specific processors ASIC

 Conclusion
o ASIC uses a spatial approach to implement only one application
o The functional units needed for the computation of all parts of the
application must be available on the surface of the final processor.
o This kind of computation is called ʺSpatial Computation“
o Highly efficient (Parallel computing)
o No flexibility

CS G553 24
Overall Conclusion
 Von Neumann computer:
General purpose, used for any kind of function.
High degree of flexibility.
However, high restrictions on the program coding and execution scheme
the program have to adapt to the machine
 DSPs are Adapted for a class of applications.
Flexibility and efficiency only for a given class of applications.
 ASIPs are Adapted for a class of tailored applications.
Flexibility and efficiency only for a given class of applications.
 ASICs are
Tailored for one application.
Very efficient in speed and resource.
Cannot re-adapt to a new application
Not flexible

CS G553 25
Conclusion

General Domain Application


Purpose Specific Specific

Max Flexibility Min Flexibility


Min Performance Max Performance

CS G553 26
Performance VS Flexibility

CS G553 27
Reconfigurable Computing

 The Ideal device should combine:


o the flexibility of the Von Neumann computer
o the efficiency of ASICs

 The ideal device should be able to


o Optimally implement an application at a given time
o Re-adapt to allow the optimal implementation of a new
application.

 We call such a device a reconfigurable device.

CS G553 28
Flexibility vs Efficiency
Flexibility

Von Neumann

General purpose
computing DSP

Domain specific
computing Reconfigurable
systems
Reconfigurable
computing

ASIC
ASIP
Application
specific
computing

Perfromance

CS G553 29
Temporal vs. spatial based computing

❑ Temporal-based execution ❑ Spatial-based execution


(software) (reconfigurable computing)

❑ Ability to extract parallelism (or concurrency) from


algorithm descriptions is the key to acceleration using
reconfigurable computing

CS G553 30
Methods for executing algorithms
Hardware Reconfigurable Software-programmed
(Application Specific computing processors
Integrated Circuits)

❑ Advantages ❑ Advantages ❑ Advantages


o very high o fills the gap between o software is very
performance and hardware and software flexible to change
efficient o much higher ❑ Disadvantages
❑ Disadvantages performance than o performance can
o not flexible (can’t be software suffer if clock is
altered after o higher level of not fast
fabrication) flexibility than o fixed instruction
o expensive hardware set by hardware
CS G553 31
Reconfigurable Computing

 Ideally, we would like to have the flexibility of the GPP and


the performance of the ASIC in the same device.
o We would like to have a device able ‘to adapt to the application’ on
the fly.
o We call such a hardware device
• a reconfigurable hardware or
• reconfigurable device or
• reconfigurable processing unit (RPU) in analogy the Central
Processing Unit (CPU)

CS G553 32
Reconfigurable Computing

 Definition: Reconfigurable computing can be defined as the


study of computations involving reconfigurable devices.
This includes, architecture, algorithms and applications.
o Spatial structure of the device will be modified such as to use the
best computing approach to speed up that application
o For an application, the device structure will be modified again to
match the new application

 Definition: Configuration respectively reconfiguration is the


process of changing the structure of a reconfigurable device
at star-up-time respectively at run-time

CS G553 33
Some Fields of Application

 Rapid prototyping

 Post fabrication customization

 Multi-modal computing tasks

 Adaptive computing systems

 Fault tolerance

 High performance parallel computing

CS G553 34
Rapid prototyping

 Testing hardware in real conditions


before fabrication
o Software simulation
• Relatively inexpensive
• Slow
• Accuracy ? APTIX System Explorer
o Hardware emulation
– Hardware testing under real operation conditions
– Fast
– Accurate
– Allow several iterations

ITALTEL FLEXBENCH

CS G553 35
Post fabrication customization

 Time to market advantage Manufacturer


o Ship the first version of a product
o Remote upgrading with new product
versions
o Remote repairing
• Mars rover vehicle (Mars Pathfinder
launched 4th July 1997)

functions can be
executed on the fly
during system
debugging

CS G553 36
Multi-modal computing tasks

 Reconfigurable vehicles,
mobile
o phones, etc..
o Built-in Digital Camera
o Video phone service
o Games service request
o Internet
o Navigation system Configuration
o Emergency
o Diagnostics
o Different standard and
protocols
o Monitoring
o Entertainment

CS G553 37
Adaptive computing systems
Computing systems that are able to adapt
their behaviour and structure to changing
operating and environmental conditions,
time-varying optimization objectives, and
physical constraints like changing
protocols,
new standards, or dynamically changing
operation conditions of technical systems.

Dynamic adaptation to environment


Dynamic adaptation to threats (DARPA)
Extended mission capabilities

CS G553 38
Adaptive Distributed Video Processing

 Application in surveillance
o Distributed cameras
• Intelligent
• Adaptive
• Performance

o Each camera covers


a given area (Can be overlapping)

o Communication
• Data exchange
– Knowledge
– Information at boundary
• Wireless

o Self-organization
• Repositioning for better coverage

CS G553 39
Adaptive Distributed Video Processing

 Operation in normal mode


o Image understanding
• Movement detection and tracking
o Fusion of information
• Better coverage of a complete area through self-organization
o Data transmission
• Characteristics of a suspect in the covering range

 Operation on failure
o Failure detection mechanism
o Self healing mechanism
o Failure recovery
o Detection and protection against attacks

CS G553 40
High performance parallel computing
Traditional parallel implementation flow

2 1 2 3 4
Application 3

4 5 6 7

Virtual Topology Physical Topology

Exploiting reconfigurable topology


Application 1

2 3
1
4 5 6 7
2 3

4 5 6 7 Physical Topology
Virtual Topology

CS G553 41
Top View: Field-Effect Transistor

CS G553 42
The Microprocessor

 10 years of Moore’s-law progress led to the microprocessor


 Raised engineer’s productivity
 Problem-solving became programming
 Grew to billions of units/year
 Further speed gains will not be seen anymore due to
unreliability and higher variations of transistor
 Stalled progress in design methods for thirty years

CS G553 43
Microprocessor bottlenecks

CS G553 44
The future of the microprocessor

 Future Multi-Core Designs are already available, but do


have major problems:
o Shared Memory Model does not scale to hundreds of processors on
a chip
o Distributed Memory Model is difficult to program
o Power consumption and temperature are further problems
 Reconfigurable Processors, Networks, and Memories on a
Chip may be the solution…

CS G553 45
The main question

 Since a reconfigurable device is a piece of hardware and


since a hardware can never change after fabrication

How is a reconfigurable device made?

➔ More on this in the coming lecturers

CS G553 46
CS G553 47
The End

 Questions ?

 Thank you for your attention

CS G553 48

You might also like