Reconfigurable Computing

CS G553

Dr. A. Amalin Prince


BITS - Pilani K K Birla Goa Campus
Department of Electrical, Electronics and Instrumentation Engineering

Lecture 1,2,3
Introduction: Motivation, Goals, etc.

Introduction
 Research in computer (processor) architecture
o The goals under investigation vary according to
Target applications
Price of the final equipment
Programmability of the system
The environment in which processors will be deployed
Many others

Introduction
 Computing anytime, anywhere (pervasive and ubiquitous)

[Figure: computers everywhere. Devices: PDA, PC, car, home networking, game console, household, body, supercomputer. Application areas: entertainment, medicine, communication.]

Introduction
 ... communication also.

Introduction
 Explosive growth in
o Computing
o Communication

 Information technology
o Hand-in-hand growth in computing and communication

 Millions of computer systems are produced every year
o PCs, laptops, workstations, mainframes, servers

 Billions of embedded computer systems are already deployed
o Households, cars, mobile phones, planes, etc.

Performance VS Cost

Computing Paradigms
 The Von Neumann Computer
 Domain specific processors
 Application specific instruction-set processors (ASIPs)
 Application specific processors or ASICs
 Reconfigurable Processors

The Von Neumann Computer


Principle
In 1945, the mathematician Von Neumann (VN)
demonstrated in a study of computation that a computer
could have a simple structure, capable of executing any
kind of program, given a properly programmed control
unit, without the need for hardware modification.

The Von Neumann Computer


Success story of the VN computer
o Still successful today
Simplicity in programming
Follows the sequential way of human thinking

The Von Neumann Computer


 Structure
o A memory for storing program and data.
The memory consists of words of the same length
o A control unit (control path) featuring a program counter for
controlling program execution
o An arithmetic and logic unit (ALU), also called the data path, for program
execution
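[Figure: block diagram of the Von Neumann computer. The processor (central processing unit) comprises the datapath with its registers and the control path with the instruction register, PC and address register. The memory holds data and instructions and is connected to the processor by data and address lines.]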

The Von Neumann Computer


 Coding
A program is coded as a set of instructions to be sequentially
executed

 Program execution
o Instruction Fetch (IF): The next instruction to be executed is
fetched from the memory
o Decode (D): The instruction is decoded to determine the
operation
o Read operand (R): The operands are read from the memory
o Execute (EX): The required operation is executed on the ALU
o Write result (W): The result of the operation is written back to
the memory
o Each instruction is executed in the cycle (IF, D, R, EX, W)
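The five phases above can be sketched as a tiny interpreter. This is a minimal illustration, not a model of any real processor: the three-address `Instr` format, the `halt` opcode and the `run` function are assumptions made for the example. Program and data share one memory, as in the VN structure.

```python
from dataclasses import dataclass

# Hypothetical three-address instruction format, chosen only for this sketch.
@dataclass
class Instr:
    op: str          # "add", "sub", "mul" or "halt"
    src1: int = 0    # memory address of the first operand
    src2: int = 0    # memory address of the second operand
    dst: int = 0     # memory address that receives the result

# The operations the ALU of this toy machine can perform.
ALU = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
    "mul": lambda x, y: x * y,
}

def run(memory):
    """Run the program stored in memory, starting at address 0.

    Returns the list of phases performed, in order.
    """
    pc = 0                                   # program counter
    phases = []
    while True:
        instr = memory[pc]                   # IF: fetch the next instruction
        phases.append("IF")
        if instr.op == "halt":
            break
        operation = ALU[instr.op]            # D: decode to find the operation
        phases.append("D")
        x = memory[instr.src1]               # R: read operands from memory
        y = memory[instr.src2]
        phases.append("R")
        result = operation(x, y)             # EX: execute on the ALU
        phases.append("EX")
        memory[instr.dst] = result           # W: write the result to memory
        phases.append("W")
        pc += 1                              # strictly sequential execution
    return phases

# Addresses 0-2 hold the program, addresses 3-5 hold the data.
memory = [Instr("add", 3, 4, 5), Instr("mul", 5, 5, 5), Instr("halt"), 2, 3, 0]
phases = run(memory)
print(memory[5], phases[:5])
```

Every instruction, whatever it computes, passes through the same five phases one after the other, which is the sequential behaviour the slide describes.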

The Von Neumann Computer


 Advantage:
o Flexibility: any well-coded program can be executed

 Drawbacks
o Speed efficiency: Not efficient, due to the sequential program
execution (temporal resource sharing).
o Resource efficiency: Only one part of the hardware resources is
required for the execution of an instruction. The rest remains
idle.
o Memory access: Memories are about 10 times slower than the
processor.
o Drawbacks are compensated for using high clock speeds,
pipelining, caches, instruction pre-fetching, etc.

The Von Neumann Computer


 Sequential execution


tcycle = cycle execution time
One instruction needs tinstruction = 5*tcycle
3 instructions are executed in 15*tcycle

 Pipelining:
One instruction needs tinstruction = 5*tcycle
o No improvement in the instruction cycle
3 instructions need 7*tcycle in the ideal case.
9*tcycle on a Harvard architecture.

 Increased throughput

Even with pipelining and other improvements such as caches, execution remains sequential.
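The cycle counts on this slide follow from simple arithmetic, sketched below for an ideal five-stage pipeline with no stalls. The function names are illustrative assumptions, and the non-ideal 9*tcycle case is not modelled.

```python
def sequential_cycles(n_instructions, n_stages=5):
    """Each instruction runs all its stages before the next one starts."""
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages=5):
    """Ideal pipeline: the first instruction takes n_stages cycles,
    and every further instruction completes one cycle later."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)

print(sequential_cycles(3))   # 3 instructions, 5 cycles each
print(pipelined_cycles(3))    # 5 cycles for the first, 1 more for each other
```

The latency of a single instruction is still five cycles in both cases; only the throughput improves.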

The Von Neumann Computer


 Conclusion
o Flexible
o Each algorithm can be implemented on a VN machine only if it is coded
according to the VN rules.
The algorithm must adapt itself to the hardware

 Because the same hardware is used over time for a wide variety of
applications, VN computation is often characterized as
Temporal Computation
 Can all algorithms be executed at their full potential?
o Modification in VN

Domain specific processors


 Goal: Overcome the drawbacks of the Von Neumann
computer.
 Optimized Datapath for a given class of applications
 Example: DSP (Digital Signal Processors):
Signal processing applications are usually dominated by
multiply-accumulate (MAC) operations.
o Datapath optimized to execute one or many MACs in only one
cycle.
o Enhanced instructions, data path and control path.
o Memory access is limited by directly processing the input data flow
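To see why signal processing is MAC dominated, consider a finite impulse response (FIR) filter, used here as an assumed representative workload; the slides do not name a specific algorithm. In the sketch below, the whole inner loop is a single multiply-accumulate, which is the operation a DSP datapath executes in one cycle. The names `mac` and `fir` are illustrative.

```python
def mac(acc, x, c):
    """One multiply-accumulate step: acc + x * c."""
    return acc + x * c

def fir(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:                 # samples before the start count as 0
                acc = mac(acc, samples[n - k], c)
        out.append(acc)
    return out

print(fir([1, 2, 3, 4], [1, 1]))   # each output is the sum of two neighbours
```

On a VN computer each `mac` call expands into several instructions, each going through the full instruction cycle; a DSP collapses it into one.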

Domain specific processors


 DSPs:
Designed for high-performance, repetitive, numerically intensive tasks
o In one instruction cycle, can do:
many MAC operations
many memory accesses
special support for efficient looping
o The hardware contains:
One or more MAC units
Multi-ported on-chip and off-chip memories
Multiple on-chip busses
Address generation unit supporting addressing modes tailored for DSP applications

Domain specific processors


 Conclusion
o Faster than VN (a MAC takes 1 cycle, whereas a VN computer requires 10 steps)
o Customised according to the application domain
If the DSP is built for image processing (each pixel 8-bit for
RGB), then it cannot be reused for applications
requiring 32-bit computation

Application Specific Instruction-set Processor (ASIP)
 An ASIP is a processor that can be specialized to a
particular application domain
o Adding new instructions
o Extending the processor datapath

 Example
o ASIP for Image processing

Application specific processors or ASIC


 Optimize the complete circuit for a given function
 Example: ASIC: Application Specific Integrated Circuit.
o Optimization is done by implementing the inherent parallel
structure on a chip
o The data path is optimized for only one application.
o Instruction fetching and decoding overhead is removed
o Memory access is limited by directly processing the input data flow
o Exploitation of parallel computation

Application specific processors or ASIC


 ASIC Example:

Implementation on a VN computer:

if (a < b)
{
    d = a+b;
    c = a*b;
}
else
{
    d = a+1;
    c = b-1;
}

 At least 3 instructions
 run-time >= 3*tinstruction
 3*5*tcycle = 15*tcycle

ASIC implementation:
The complete execution is done in parallel in one clock cycle
run-time = tclock = delay of the longest path from input to output

The VN computer needs to be clocked at least 15 times faster
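The contrast between the two implementations can be sketched in software. This is only an analogy: Python itself runs sequentially, so `asic_style` merely mirrors the structure of a circuit in which all four functional units compute at once and multiplexers pick the outputs. The function names and the step counter are assumptions for the illustration.

```python
def vn_style(a, b):
    """Temporal execution: one operation after the other.

    Returns (d, c, steps), where steps counts the instructions executed.
    """
    steps = 1                      # the comparison
    if a < b:
        d = a + b
        steps += 1
        c = a * b
        steps += 1
    else:
        d = a + 1
        steps += 1
        c = b - 1
        steps += 1
    return d, c, steps

def asic_style(a, b):
    """Spatial execution: every functional unit exists side by side.

    All candidate results are produced, then the comparator output
    drives two multiplexers that select d and c.
    """
    select = a < b                 # comparator
    sum_ab = a + b                 # adder
    prod_ab = a * b                # multiplier
    inc_a = a + 1                  # incrementer
    dec_b = b - 1                  # decrementer
    d = sum_ab if select else inc_a    # multiplexer for d
    c = prod_ab if select else dec_b   # multiplexer for c
    return d, c

print(vn_style(2, 5))
print(asic_style(2, 5))
```

Both functions return the same values of d and c; the difference is that the spatial version needs one functional unit per operation, which is the area cost of doing everything in one clock cycle.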

Application specific processors or ASIC


 Conclusion
o ASIC uses a spatial approach to implement only one application
o The functional units needed for the computation of all parts of the
application must be available on the surface of the final processor.
o This kind of computation is called Spatial Computation
o Highly efficient (Parallel computing)
o No flexibility

Overall Conclusion
 Von Neumann computer:
General purpose, used for any kind of function.
High degree of flexibility.
However, strong restrictions on the program coding and execution
scheme:
the program has to adapt to the machine

 DSPs are adapted for a class of applications.
Flexibility and efficiency only for a given class of applications.

 ASIPs are adapted for a tailored class of applications.
Flexibility and efficiency only for a given class of applications.

 ASICs are tailored for one application.
Very efficient in speed and resources.
Cannot re-adapt to a new application
Not flexible

Conclusion

[Figure: spectrum from general purpose (max flexibility, min performance) through domain specific to application specific (min flexibility, max performance)]

Performance VS Flexibility

Reconfigurable Computing
 The ideal device should combine:
o the flexibility of the Von Neumann computer
o the efficiency of ASICs

 The ideal device should be able to
o Optimally implement an application at a given time
o Re-adapt to allow the optimal implementation of a new
application.

 We call such a device a reconfigurable device.

Flexibility vs Efficiency

[Figure: chart with axes flexibility and performance. Labelled regions: Von Neumann (general purpose computing), DSP (domain specific computing), reconfigurable systems (reconfigurable computing), ASIC and ASIP (application specific computing).]

Temporal vs. spatial based computing


 Temporal-based execution
(software)

 Spatial-based execution
(reconfigurable computing)

 The ability to extract parallelism (or concurrency) from
algorithm descriptions is the key to acceleration using
reconfigurable computing
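One way to make "extracting parallelism" concrete is to describe an algorithm as a data-flow graph and compare the number of operations (the steps needed when they run one at a time) with the depth of the graph (the steps needed when independent operations run side by side). The sketch below assumes a hypothetical graph for y = (a*b + c*d) + (e*f + g*h); the function `levels` and the node names are illustrative.

```python
def levels(ops):
    """Assign each operation the earliest step at which it can run.

    ops maps an operation name to the list of its inputs. Names that
    are not keys of ops are primary inputs, available at step 0.
    Every operation is assumed to have at least one input.
    """
    level = {}

    def depth(name):
        if name not in ops:
            return 0                          # primary input
        if name not in level:
            level[name] = 1 + max(depth(src) for src in ops[name])
        return level[name]

    for name in ops:
        depth(name)
    return level

# Data-flow graph of y = (a*b + c*d) + (e*f + g*h)
ops = {
    "p1": ["a", "b"], "p2": ["c", "d"], "p3": ["e", "f"], "p4": ["g", "h"],
    "s1": ["p1", "p2"], "s2": ["p3", "p4"],
    "y": ["s1", "s2"],
}
level = levels(ops)
temporal_steps = len(ops)              # one operation per step
spatial_steps = max(level.values())    # length of the longest dependency chain
print(temporal_steps, spatial_steps)
```

The four products are independent, so a spatial implementation can compute them in the same step; only the dependency chain product, partial sum, final sum has to be serialized.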

Methods for executing algorithms


Hardware (Application Specific Integrated Circuits)
 Advantages
o very high performance and efficiency
 Disadvantages
o not flexible (can't be altered after fabrication)
o expensive

Reconfigurable computing
 Advantages
o fills the gap between hardware and software
o much higher performance than software
o higher level of flexibility than hardware

Software-programmed processors
 Advantages
o software is very flexible to change
 Disadvantages
o performance can suffer if clock is not fast
o instruction set fixed by hardware

Reconfigurable Computing
 Ideally, we would like to have the flexibility of the GPP (general-purpose processor) and
the performance of the ASIC in the same device.
o We would like to have a device able to adapt to the application on
the fly.
o We call such a hardware device
a reconfigurable hardware or
reconfigurable device or
reconfigurable processing unit (RPU), in analogy to the Central
Processing Unit (CPU)

Reconfigurable Computing
 Definition: Reconfigurable computing can be defined as the
study of computations involving reconfigurable devices.
This includes architectures, algorithms and applications.
o For a given application, the spatial structure of the device is modified so as to use the
best computing approach to speed up that application
o For a new application, the device structure is modified again to
match the new application

 Definition: Configuration and reconfiguration are the
processes of changing the structure of a reconfigurable device
at start-up time and at run-time, respectively
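The distinction between the two terms can be illustrated with a toy software model. This is purely an assumption for illustration and not a description of how reconfigurable devices are built: the hypothetical `ToyDevice` computes a two-input Boolean function defined by a four-entry table, and changing the table changes the function without changing the class.

```python
class ToyDevice:
    """Hypothetical device whose function is defined by a loaded table."""

    def __init__(self):
        self._table = None               # no structure loaded yet

    @staticmethod
    def _check(table):
        if len(table) != 4:
            raise ValueError("a 2-input function needs exactly 4 entries")
        return list(table)

    def configure(self, table):
        """Configuration: define the structure at start-up time."""
        if self._table is not None:
            raise RuntimeError("already configured; use reconfigure()")
        self._table = self._check(table)

    def reconfigure(self, table):
        """Reconfiguration: change the structure at run-time."""
        if self._table is None:
            raise RuntimeError("not configured yet; use configure()")
        self._table = self._check(table)

    def evaluate(self, x, y):
        """Look up the output for input bits x and y."""
        if self._table is None:
            raise RuntimeError("device is not configured")
        return self._table[(x << 1) | y]

AND = [0, 0, 0, 1]    # outputs for inputs 00, 01, 10, 11
XOR = [0, 1, 1, 0]

device = ToyDevice()
device.configure(AND)             # start-up: the device behaves as AND
before = device.evaluate(1, 1)
device.reconfigure(XOR)           # run-time: the same device now behaves as XOR
after = device.evaluate(1, 1)
print(before, after)
```

The same object computes two different functions over its lifetime, which is the behaviour the definitions above ask of a reconfigurable device.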

Some Fields of Application


 Rapid prototyping
 Post fabrication customization
 Multi-modal computing tasks
 Adaptive computing systems
 Fault tolerance
 High performance parallel computing

Rapid prototyping
 Testing hardware in real conditions
before fabrication
o Software simulation
Relatively inexpensive
Slow
Accuracy?
o Hardware emulation
Hardware testing under real operating conditions
Fast
Accurate
Allows several iterations

[Figures: APTIX System Explorer; ITALTEL FLEXBENCH]

Post fabrication customization


 Time to market advantage
o Ship the first version of a product
o Remote upgrading with new product
versions
o Remote repairing

Example: Mars rover vehicle (Mars Pathfinder, landed on Mars on 4th July 1997)

[Figure labels: manufacturer; functions can be executed on the fly during system debugging]

Multi-modal computing tasks


 Reconfigurable vehicles, mobile phones, etc.
o Built-in digital camera
o Video phone service
o Games
o Internet
o Navigation system
o Emergency
o Diagnostics
o Different standards and protocols
o Monitoring
o Entertainment

[Figure labels: service request; configuration]

Adaptive computing systems


Computing systems that are able to adapt their behaviour and structure to changing
operating and environmental conditions, time-varying optimization objectives, and
physical constraints like changing protocols, new standards, or dynamically changing
operation conditions of technical systems.

Dynamic adaptation to environment
Dynamic adaptation to threats (DARPA)
Extended mission capabilities

Adaptive Distributed Video Processing


 Application in surveillance
o Distributed cameras
Intelligent
Adaptive
Performance
o Each camera covers a given area (can be overlapping)
o Communication
Data exchange
Knowledge
Information at boundary
Wireless
o Self-organization
Repositioning for better coverage

Adaptive Distributed Video Processing


 Operation in normal mode
o Image understanding
Movement detection and tracking

o Fusion of information
Better coverage of a complete area through self-organization

o Data transmission
Characteristics of a suspect in the covering range

 Operation on failure
o Failure detection mechanism
o Self healing mechanism
o Failure recovery
o Detection and protection against attacks

High performance parallel computing


Traditional parallel implementation flow
[Figure: an application with tasks 1 to 5 (virtual topology) is mapped onto a fixed physical topology]

Exploiting reconfigurable topology
[Figure: the physical topology is configured to mirror the application's virtual topology (tasks 1 to 5)]

Top View: Field-Effect Transistor

The Microprocessor
 10 years of Moore's-law progress led to the microprocessor
 Raised engineers' productivity
 Problem-solving became programming
 Grew to billions of units/year
 Further speed gains are no longer expected, due to the
unreliability and higher variability of transistors
 Stalled progress in design methods for thirty years

Microprocessor bottlenecks

The future of the microprocessor


 Multi-core designs are already available, but they
have major problems:
o Shared Memory Model does not scale to hundreds of processors on
a chip
o Distributed Memory Model is difficult to program
o Power consumption and temperature are further problems

 Reconfigurable Processors, Networks, and Memories on a
Chip may be the solution

The main question


 Since a reconfigurable device is a piece of hardware, and
since hardware can never change after fabrication:

How is a reconfigurable device made?

More on this in the coming lectures

The End
 Questions ?

 Thank you for your attention

