
DSP Algorithms and Architectures

Minimodule 1. Anders Brødløs Olsen, abo@es.aau.dk

MM1 DSP Algorithms and Architectures

P2-18 DSP Algorithms and Architectures


Purpose
The purpose of the course is to help the student understand the concepts needed to map a DSP algorithm onto a real-time architecture, with good interaction between the two.

Objectives
After the course the student should demonstrate:
  Comprehension of basic and advanced concepts in algorithmic and architectural interaction
  Application of methods for designing and optimizing data- and control-paths for DSP algorithms

The course content


Two part course
  1. (ABO) The architectural aspects
  2. (PK) The more algorithmic aspects
Contents
  Design concepts for DSP systems, from specs to prototype
  Cost functions: Area, Time, Power, Numeric, ...
  General DSP architectures
  Algorithmic representation: SFG, DFG, SDF, Precedence Graphs
  Critical path, critical loop
  Timing in algorithms: retiming, unfolding, pipelining
  Look-ahead transformations
  Allocation, Assignment, and Scheduling
  Finite State Machine with Datapath
  Methods for Data- and Controlpath optimization
  Memory management in real-time DSP systems

Course practicalities
Literature:
  Gajski book
  Additional reading in the form of papers will be provided
Course Web:
  http://kom.aau.dk/~abo/Teaching/DSP_alg_arch/index.htm

Topics of today
From functionality to silicon: an introduction
  Motivation for application specific architecture
  Cost functions (Noise, Power, Area, Time, ...)
  Representing a design
  Abstraction levels

The First Computer

The Babbage Difference Engine (1832)
  25,000 parts
  Cost: £17,470

ENIAC - The first electronic computer (1946)


Intel 4004 Micro-Processor

1971
~2000 transistors
<1 MHz operation
~12 mm²

1947

Today's processors
2008
> 300M transistors
> 3000 MHz operation
~150 mm²

The A3 Paradigm
Application: LP-filter (specification)
Algorithm: FIR, IIR (parallel), IIR (cascade)
Architecture: DSP, µ-controller, ASIC/FPGA

Design dedicated architectures that fit our algorithmic demands. CAD tools typically help us, but we need to know why and how.

The A3 Paradigm
Application: LP-filter (specification)
  Attributes: specifications
Algorithm: FIR, IIR (parallel), IIR (cascade)
  Attributes: numerical properties
Architecture: DSP, µ-controller, ASIC
  Attributes: size, execution time
1:many mapping from one level to the next
Focus of this course: algorithm and architecture

Design representation


Design Abstraction Levels


SYSTEM
MODULE
GATE
CIRCUIT
DEVICE (figure: transistor with gate G, source S, drain D, and n+ regions)

The design process


Top-down design strategies
  Refine specification successively
  Decompose each component into smaller components
  Continue down to the lowest-level primitive components
  Over-sold methodology: only works with plenty of experience
Bottom-up design strategies
  Build up from primitive components
  Combine them to form more complex components
  Risk: wrong interpretation of specifications
Mixed strategies
  Mostly top-down, but also bits of bottom-up
  Reality: need to know both top-level and bottom-level constraints

Typical signal processing algorithms


Signal chain: Sampler (quantizer) -> 1001101000 101001 -> Digital Signal Processor (REAL-TIME) -> 1010101011 101011 -> Analog Reconstructor

Typical filter operations, X(n) -> Y(n):
  FIR
  IIR
  Vector, Matrix: y = Mx
All are built from additions and products (plus control)
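The FIR and IIR operations listed above reduce to sums of products. A minimal sketch, assuming the standard difference equations y(n) = sum_k b[k] x(n-k) for FIR and y(n) = sum_k b[k] x(n-k) - sum_{k>=1} a[k] y(n-k) for IIR; the function names and the coefficient values are illustrative and not taken from the slides:

```python
def fir(b, x):
    """FIR filter: y(n) = sum_k b[k] * x(n - k), with x(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]  # one multiply-accumulate (MAC)
        y.append(acc)
    return y


def iir(b, a, x):
    """IIR filter with a[0] == 1 assumed: feedforward sum minus feedback sum."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]  # feedback: depends on earlier outputs
        y.append(acc)
    return y


impulse = [1.0, 0.0, 0.0, 0.0]
fir_out = fir([0.5, 0.25], impulse)         # finite impulse response
iir_out = iir([1.0], [1.0, -0.5], impulse)  # response keeps decaying
```

Both loops consist only of additions, products, and loop control, which is why the multiply-accumulate operation is the natural target for architectural support.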



General architectures
µ-controllers
General Purpose Processors (GPP)
Application Specific Instruction-set Processor (ASIP)
Digital Signal Processors (DSP)
Application Specific Integrated Circuit (ASIC)
Field Programmable Gate Array (FPGA)

µ-controllers and GPPs

Known as a Von Neumann architecture
  Product calculations on an ALU!
  Shared instruction and data bus
Operation cycle:
  C1: Instruction fetch
  C2: Data 1 fetch
  C3: Data 2 fetch
  C4: Operation execution
  C5: Output data storage
Computational capacity
Bus capacity
(Figure: Control, Mem, and ALU blocks)

8-bit PIC controller

µ-controllers and GPPs

Introducing a multiplier (MUL) in the architecture
Precision (N-bit operands, M-bit result)
  M=N: Single precision
  M=2N: Double precision
  M>2N: Overflow precision
Computational capacity
Bus capacity (still using the same bus for instruction and data)
(Figure: Control, Mem, ALU, and MUL blocks)
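The three precision cases can be checked with a little arithmetic. A minimal sketch, assuming signed two's-complement operands of N bits; the function names are illustrative:

```python
def product_bits(n_bits):
    """Bits needed to hold any product of two signed n-bit operands."""
    most_negative = -(1 << (n_bits - 1))
    largest = most_negative * most_negative  # 2^(2N-2), the largest product
    return largest.bit_length() + 1          # +1 for the sign bit


def accumulator_bits(n_bits, n_terms):
    """Bits needed to sum n_terms worst-case products without overflow."""
    worst = n_terms * (1 << (2 * n_bits - 2))
    return worst.bit_length() + 1


single_product = product_bits(16)       # M = 2N: double precision
long_sum = accumulator_bits(16, 256)    # M > 2N: extra bits guard against overflow
```

With N = 16, a single product needs 32 bits, and accumulating 256 worst-case products needs 8 further bits, which is the situation the M>2N case addresses.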

ARM7

Digital Signal Processors [1]


Harvard architecture
  Individual data and instruction busses
  Simultaneous fetch of instruction and data
  Micro parallelism (architectural)
Operation cycle:
  C1: Instruction fetch || Data 1
  C2: Data 2
  C3: Execution || instruction fetch
  C4: Output data storage
Computational capacity
Bus capacity (two operands!)
(Figure: control path and data path with Program Mem, Data Mem, Control, MUL, and ALU blocks)

TMS32010

Digital Signal Processors [2]


Modified Harvard architecture
  Duplicated data busses
  Multiple data memory banks
Operation cycle:
  C1: Instruction fetch || Data 1 || Data 2
  C2: Execution || instruction fetch
  C3: Output data storage
Computational capacity
Bus capacity
(Figure: Program Mem, Data 1 Mem, Data 2 Mem, Control, MUL, and ALU blocks)
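Taking the operation-cycle listings of the three architectures at face value, the cycle counts can be compared directly. A rough sketch, assuming one two-operand operation per listed cycle sequence and no further overlap between consecutive operations; `CYCLES_PER_OP` and `fir_cycles` are illustrative names:

```python
# Cycles per two-operand operation, read off the operation-cycle listings
CYCLES_PER_OP = {
    "von_neumann": 5,       # C1..C5: shared instruction and data bus
    "harvard": 4,           # C1..C4: instruction fetch parallel with Data 1
    "modified_harvard": 3,  # C1..C3: both operands fetched in the same cycle
}


def fir_cycles(n_taps, architecture):
    """Cycles for one FIR output sample with n_taps products (assumed model)."""
    return n_taps * CYCLES_PER_OP[architecture]


cycles_for_32_taps = {arch: fir_cycles(32, arch) for arch in CYCLES_PER_OP}
```

Under these assumptions the gain comes purely from bus capacity: the arithmetic is unchanged, but fewer cycles are spent moving instructions and operands.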

Blackfin architecture

Digital Signal Processors [3]


Utilizing algorithmic and architectural properties

Using the address arithmetic unit, the core of the above algorithm becomes a single line of parallel instructions:
  . .
  A0 += data1*data2 || A1 += data3*data4;
  . .
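The parallel instruction `A0 += data1*data2 || A1 += data3*data4` can be mimicked in software to see its effect on the step count. A minimal sketch, assuming the algorithm is a sum of products and that each loop iteration stands for one instruction cycle; the function names are illustrative and this is not actual DSP code:

```python
def dot_single_mac(coeffs, samples):
    """One accumulator: one product per step."""
    acc, steps = 0, 0
    for c, s in zip(coeffs, samples):
        acc += c * s
        steps += 1
    return acc, steps


def dot_dual_mac(coeffs, samples):
    """Two accumulators (A0, A1): two independent products per step."""
    a0, a1, steps = 0, 0, 0
    for i in range(0, len(coeffs), 2):
        a0 += coeffs[i] * samples[i]              # A0 += data1*data2
        if i + 1 < len(coeffs):
            a1 += coeffs[i + 1] * samples[i + 1]  # || A1 += data3*data4
        steps += 1
    return a0 + a1, steps  # final merge of the two partial sums


single = dot_single_mac([1, 2, 3, 4], [5, 6, 7, 8])
dual = dot_dual_mac([1, 2, 3, 4], [5, 6, 7, 8])
```

Both versions give the same result, but the dual-MAC version needs half the steps because the even and odd products do not depend on each other.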

Dual-core DSP processors


Digital Signal Processors [4]


Question: Is it always possible to utilize two (or more) MACs?
Condition: As long as the inherent algorithmic parallelism is not fully utilized, additional hardware may provide a performance improvement!

ASIC and FPGAs [1]


ASIC
  Customized for a particular use, in silicon
  Specific combination of functional units, routed by busses
FPGA
  Customized for a particular use, using programmable logic components and programmable interconnecting busses
From an algorithmic point of view, the design methodologies are more or less similar for the two

ASIC and FPGA [2]


Mapping of an algorithm onto a custom-designed HW architecture!
Example algorithm:
  1:1 mapping (fully utilizing parallelism)
    Cost: T, A
  Multiplexed (HW sharing)
    Cost: T, A (+ control)
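The trade-off between time T and area A for the two mappings can be made concrete with a toy model. A sketch only, assuming the example algorithm is an N-tap sum of products and using hypothetical unit costs for multipliers and adders; control overhead in the multiplexed case is not modelled:

```python
# Hypothetical unit costs, chosen only to make the trade-off visible
AREA = {"mul": 8, "add": 1}
DELAY = {"mul": 2, "add": 1}


def one_to_one(n_taps):
    """One multiplier per tap and a chain of adders."""
    area = n_taps * AREA["mul"] + (n_taps - 1) * AREA["add"]
    time = DELAY["mul"] + (n_taps - 1) * DELAY["add"]  # parallel products, then adder chain
    return {"area": area, "time": time}


def multiplexed(n_taps):
    """A single multiplier and adder, reused once per tap."""
    area = AREA["mul"] + AREA["add"]  # control logic and registers not counted
    time = n_taps * (DELAY["mul"] + DELAY["add"])
    return {"area": area, "time": time}


parallel_cost = one_to_one(4)
shared_cost = multiplexed(4)
```

In this model the 1:1 mapping buys a short execution time with a large area, while hardware sharing does the opposite and additionally needs control to sequence the shared units.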

Algorithmic parallelism [1]


X[n] -> Y[n]: T1
Time of operation, throughput
X[n] -> Ha -> Hb -> Hc -> Hd -> Y[n]: T2 = Ta+Tb+Tc+Td

The operation time of a given transfer function obviously depends on the algorithmic complexity, but also on the implementation technology used.

Algorithmic parallelism [2]


Factorization
  X[n] -> Ha -> Latch -> Hb -> Hc -> Hd -> Y[n-3]
  The latency is increased
  Can be parallelized
Partial Fraction Expansion
  X[n] -> Ha, Hb, Hc, Hd in parallel -> Y[n]
  The latency is not increased
  Can be parallelized

Algorithmic manipulation is a very important tool when optimizing architecture designs
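The effect of the two manipulations on timing can be sketched numerically. The model below is an assumption for illustration: stage times are hypothetical, latches are placed between all cascade stages (consistent with the output Y[n-3] for four stages), and the parallel branches are summed in an adder tree. The branch transfer functions of a partial fraction expansion differ from the cascade sections, so reusing the same times is a simplification:

```python
def cascade_unpipelined(stage_times):
    """All stages must finish within one sample period: T = Ta+Tb+Tc+Td."""
    return {"sample_period": sum(stage_times), "latency_samples": 0}


def cascade_pipelined(stage_times):
    """Latches between stages: the slowest stage sets the sample period."""
    return {"sample_period": max(stage_times),
            "latency_samples": len(stage_times) - 1}


def parallel_form(branch_times, t_add):
    """Branches run concurrently; an adder tree combines their outputs."""
    tree_levels = (len(branch_times) - 1).bit_length()  # ceil(log2(n)) levels
    return {"sample_period": max(branch_times) + tree_levels * t_add,
            "latency_samples": 0}


times = [3, 2, 4, 1]  # hypothetical Ta, Tb, Tc, Td
plain = cascade_unpipelined(times)
pipelined = cascade_pipelined(times)
parallel = parallel_form(times, t_add=1)
```

Pipelining the factorized form shortens the sample period at the price of three samples of latency, whereas the parallel form shortens it without added latency but needs hardware for all branches at once.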

Representation methods of alg. [1]


Block diagram
  Consists of functional blocks connected by directed edges, each representing data flow from an input block to an output block

Representation methods of alg. [2]


Signal-Flow Graph
  Nodes: represent computations or tasks; a node sums all incoming signals
  Edges: denote a linear transformation from the input to the output

Graphical Representations
Data Flow Graphs (DFG)
Control Flow Graphs (CFG)
Control Data Flow Graphs (CDFG)
State Transition Graphs (STG)
All consist of nodes (or vertices) and edges (or arcs)

Data Flow Graph


Nodes: represent computations (or functions)
Edges: represent data paths (or communications)
Models data dependencies: a node can perform its operation whenever data is present
Data flow forms a directed acyclic graph (DAG):
  x1 = a + b
  y = a * c
  z = x1 + d
  x2 = y - d
  x3 = x2 + c
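The data dependencies in the five assignments above determine the earliest time step at which each operation can fire. A minimal sketch, assuming every operation takes one time step and that a, b, c, d are available at step 0; the dictionary encoding and the name `asap_levels` are illustrative:

```python
# Each operation maps to the operands it depends on
dfg = {
    "x1": ("a", "b"),   # x1 = a + b
    "y":  ("a", "c"),   # y  = a * c
    "z":  ("x1", "d"),  # z  = x1 + d
    "x2": ("y", "d"),   # x2 = y - d
    "x3": ("x2", "c"),  # x3 = x2 + c
}


def asap_levels(graph):
    """Earliest step per operation: one step after its latest operand."""
    levels = {}

    def level(node):
        if node not in graph:
            return 0  # primary input, available from the start
        if node not in levels:
            levels[node] = 1 + max(level(p) for p in graph[node])
        return levels[node]

    for node in graph:
        level(node)
    return levels


levels = asap_levels(dfg)
critical_path = max(levels.values())
```

Here x1 and y can execute in the first step, z and x2 in the second, and x3 in the third, so the critical path is three operations long even though the graph contains five.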

CDFG, DFG, CFG


Cost Functions


Cost Functions
Implementation quality is determined by cost functions
  noise, power, area, time
The weight a_i depends on the importance of the associated cost parameter
  Noise: wordlength
  Power: technology
  Area: circuit
  Time: the three above
Interaction
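One common way to combine the cost parameters with weights a_i is a weighted sum. The sketch below assumes that form, which may differ from the formula on the original slide, and uses hypothetical normalized cost values for two candidate implementations:

```python
def total_cost(costs, weights):
    """Weighted sum: each cost parameter scaled by its weight a_i (assumed form)."""
    return sum(weights[name] * costs[name] for name in weights)


# Hypothetical weights: power matters most, area least
weights = {"noise": 1.0, "power": 2.0, "area": 0.5, "time": 1.0}

# Hypothetical normalized costs for two candidate implementations
candidate_a = {"noise": 2, "power": 4, "area": 8, "time": 1}
candidate_b = {"noise": 2, "power": 2, "area": 4, "time": 4}

cost_a = total_cost(candidate_a, weights)
cost_b = total_cost(candidate_b, weights)
best = "a" if cost_a < cost_b else "b"
```

With these weights the slower candidate wins because of its lower power; changing the weights to favour time would reverse the choice, which is the interaction the slide points at.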

Minimizing the cost function


Choice of algorithm / algorithmic manipulation / wordlength
Extraction and utilization of inherent parallelism
Number and types of execution units
Scheduling
(Figure: Application, Algorithm, Architecture)

Sources of Power Consumption


Dynamic: (figure: supply Vdd, input Vin, current i(t), and output v(t) switching between 0 and 1 from t0 to t1)
Short Circuit: (figure: current I between Vin and Vout)
Leakage: (figure: Ids versus Vgs with Ioff and Vth marked; Vin=0, Vout=Vdd)

Controlling Energy Consumption


What control do you have as a designer?
The largest contributing component to CMOS power consumption is switching power:

$P_{avg} = n_{avg} \, f \, c_{avg} \, V_{dd}^{2}$

What control do you have over each factor?
How does each affect the total energy? (Think about f)

Circuit delay:

Energy and Power


Warning! In everyday language, the term "power" is used incorrectly in place of "energy". Power is not energy. Power is not something you can run out of. Power cannot be lost or used up. Power is not a thing; it is merely a rate. Power cannot be put into a battery any more than velocity can be put in the gas tank of a car.
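The distinction becomes visible when the switching-power expression is applied to a task with a fixed number of clock cycles. A minimal sketch, assuming average switching power of the form n_avg * f * c_avg * Vdd^2 and energy = power * time; all numbers are hypothetical, and the increase in circuit delay at lower supply voltage is ignored:

```python
def switching_power(n_avg, f, c_avg, vdd):
    """Average switching power: n_avg * f * c_avg * Vdd^2."""
    return n_avg * f * c_avg * vdd ** 2


def task_energy(n_avg, f, c_avg, vdd, cycles):
    """Energy of a task that needs a fixed number of cycles: P * (cycles / f)."""
    return switching_power(n_avg, f, c_avg, vdd) * (cycles / f)


N_AVG, C_AVG, CYCLES = 1000, 1e-15, 1_000_000  # hypothetical values

p_full = switching_power(N_AVG, 100e6, C_AVG, 1.2)
p_half = switching_power(N_AVG, 50e6, C_AVG, 1.2)      # half the clock frequency
e_full = task_energy(N_AVG, 100e6, C_AVG, 1.2, CYCLES)
e_half = task_energy(N_AVG, 50e6, C_AVG, 1.2, CYCLES)  # same task, twice as long
e_low_v = task_energy(N_AVG, 100e6, C_AVG, 0.6, CYCLES)
```

Halving f halves the power but leaves the energy for the task unchanged, because the task takes twice as long. Halving Vdd, in contrast, cuts the energy to a quarter in this model.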


Design Representation and Abstraction levels


Design Representation
Behavioral or functional representation
Specifies the behavior or the functions of a design without any implementation information

Structural representation
Specifies the implementation of a design in terms of components and their interactions

Physical representation
Specifies the physical characteristics of the design (Blueprint for manufacturing)

Digital System Design


IDEA
  -> (plain English) -> Behavioral Design
  -> (algorithm) -> Structural Design
  -> (state machine, ALU, regs) -> Logic Design
  -> (gate-level netlist) -> Physical Design
  -> (transistor list) -> Fabrication
  -> Product

Levels of Design Abstractions


Implementation Technologies


HW Design Abstraction
Levels of design abstraction:
  Processor-Memory Level
  RT Level
  Logic Gates
  Transistors
  Polygons of Silicon

Representation and Abstraction


Y-Chart (each axis listed from highest to lowest abstraction level)
  Functional: Algorithm, RT Language, Boolean Eqn, Differential Eqn
  Structural: Proc. Mem. Switch, RT, Gate, Transistor
  Geometric: Floorplan, Standard Cells, Sticks, Polygons

Heterogeneous HW/SW Implementations


Only HW: high cost and high performance.
Mixed HW-SW: medium cost and performance.
Only SW: low cost and low performance.
(Figure: cost versus performance)

Additionally, flexibility and tight time-to-market requirements favour SW implementations.

System-level HW-SW Co-design


Inputs: IDEA, Specification, Constraints
System-level HW-SW Co-design determines:
  Components (HW, SW)
  HW behavior and components
  Memory hierarchy and mapping
  Interconnect and buses
  SW behavior, RTOS, schedule policy and processors

Issues in System-level HW-SW Co-design


Specification of functionality and constraints.
Simulation of functionality.
Components as building blocks
  SW processors: DSP and micro-controllers
  HW co-processors: ASICs, FPGA
  Storage elements: Cache, Scratchpad, SRAM, DRAM
  Interconnection elements: Buses and arbiters
  Interface and I/O units: DMA, UART, D/A, A/D, wireless communication
  Software platform: RTOS and scheduling

Issues in System-level HW-SW Co-design

Performance analysis (timing, power, area)
Design and optimization (timing, power, area)
  Architecture selection: processing elements, memory units and interconnect.
  RTOS and schedule scheme.

Design flow and Abstraction levels


The A3 model and design flows


Application: LP-filter (specification)
Algorithm: FIR, IIR (parallel), IIR (cascade)
Architecture: DSP, µ-controller, ASIC
Design flows

Summary
Algorithms and Architectures
  Data path
  Control path
  Algorithmic properties
Cost functions
Design flows and representations
  Design representations
  Design abstractions
Following courses
  Architectural optimization (mm2-mm3)
  Scheduling concepts (mm4-mm5)

Exercises
Gajski: 1.1, 1.4, and 1.8
Cost functions: Discuss power vs. energy optimization
  Why is there a difference?
  How can you optimize energy, taking only the dynamic contribution into account?

The following exercises are based on the paper by C.H. Wang, "Algorithmic Implementation of Low-Power High Performance FIR Filtering IP Cores" (hint: only sections 1 and 2). Prepare a few notes so that you can present your findings next Thursday (no more than 5 minutes).
  Gr840: Find the various representation forms of the FIR filter used, write them in mathematical form, and make a block-diagram representation.
  Gr841: Discuss or verify that the data-path in figure 2 is reasonable and try to map the algorithms onto it.
  Gr842: Make a 1:1 mapping and propose an architecture for a four-tap FIR filter.