
Purpose

The purpose of the course is to help students understand the concepts needed to map a DSP algorithm onto a real-time architecture, with good interaction between algorithm and architecture.

- Comprehension of basic and advanced concepts in algorithmic and architectural interaction
- Application of methods for designing and optimizing data paths and control paths for DSP algorithms

Two-part course:
1. (ABO) The architectural aspects
2. (PK) The more algorithmic aspects

Contents:
- Design concepts for DSP systems, from specifications to prototype
- Cost functions: area, time, power, numeric
- General DSP architectures
- Algorithmic representation: SFG, DFG, SDF, precedence graphs
- Critical path, critical loop
- Timing in algorithms: retiming, unfolding, pipelining
- Look-ahead transformations
- Allocation, assignment, and scheduling
- Finite state machine with datapath
- Methods for data-path and control-path optimization
- Memory management in real-time DSP systems

MM1 DSP Algorithms and Architectures 3

Course practicalities

Literature:

Gajski book; additional reading in the form of papers will be provided.

Course Web

(http://kom.aau.dk/~abo/Teaching/DSP_alg_arch/index.htm)

Topics:
- Motivation for application-specific architectures
- Cost functions (noise, power, area, time, …)
- Representing a design
- Abstraction levels

1947

Today's processors

2008: > 300M transistors, > 3000 MHz operation, ~150 mm²

The A3 Paradigm

Application: LP-filter (specification)
Algorithm: FIR, IIR (parallel), IIR (cascade)
Architecture: DSP, µ-controller, ASIC/FPGA

Design dedicated architectures that fit our algorithmic demands. CAD tools typically help us, but we need to know why and how.

The A3 Paradigm

Figure: the A3 paradigm, indicating which part of the path from the application (LP-filter specification) via its specifications to an architecture (DSP, µ-controller, ASIC) this course covers.

Design representation

Figure: abstraction levels SYSTEM, MODULE, GATE, CIRCUIT, and DEVICE (a MOS transistor with gate G, source S, and drain D over n+ regions).

Top-down design strategies
- Refine the specification successively
- Decompose each component into smaller components, down to the lowest-level primitive components
- Over-sold methodology: only works with plenty of experience

Bottom-up design strategies
- Build up from primitive components, which are combined to form more complex components
- Risk of misinterpreting the specifications

Mixed strategies
- Mostly top-down, but also bits of bottom-up
- Reality: you need to know both the top-level and the bottom-level constraints

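Figure: a real-time DSP system. An analog signal passes through a sampler (quantizer), the resulting bit stream is processed in real time to produce the output samples y(n), and an analog reconstructor converts them back to an analog signal.

FIR:

$$y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k)$$

where h(k) are the N filter coefficients (taps).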
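As a sketch that is not part of the original slides, the standard direct-form FIR definition y(n) = Σ_{k=0}^{N−1} h(k)·x(n−k) maps onto a loop with one multiply-accumulate (MAC) per tap. The function name `fir` and the convention that samples before the start of the input are zero are assumptions made for illustration.

```python
def fir(x, h):
    """Direct-form FIR filter: y(n) = sum_{k=0}^{N-1} h(k) * x(n - k).

    Samples before the start of x are taken as zero (assumed initial state).
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]  # one multiply-accumulate (MAC) per tap
        y.append(acc)
    return y


# The impulse response of an FIR filter is its coefficient sequence.
print(fir([1, 0, 0, 0, 0], [0.5, 0.25, 0.125]))  # [0.5, 0.25, 0.125, 0.0, 0.0]
```

Each output sample costs N MACs, which is the workload the architectures that follow are designed to speed up.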


General architectures

- µ-controllers
- General Purpose Processors (GPP)
- Application Specific Instruction-set Processors (ASIP)
- Digital Signal Processors (DSP)
- Application Specific Integrated Circuits (ASIC)
- Field Programmable Gate Arrays (FPGA)

Known as a Von Neumann architecture
- Products are calculated on an ALU!
- Shared instruction and data bus
- Operation cycle:
  C1: instruction fetch
  C2: data 1 fetch
  C3: data 2 fetch
  C4: operation execution
  C5: output data storage

Figure: control, memory, and ALU on one shared bus.
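Without a hardware multiplier, a product on this architecture has to be built from ALU shifts and additions. The hypothetical helper below sketches the classic shift-and-add method for unsigned integers and counts ALU operations as a rough, assumed proxy for cycles; real instruction timings differ from processor to processor.

```python
def shift_add_multiply(a, b, width=16):
    """Multiply two unsigned `width`-bit integers using only ALU shifts and adds.

    Returns (product, alu_ops). alu_ops counts one shift per multiplier bit plus
    one add per set bit: a rough, hypothetical proxy for ALU cycles.
    """
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must be unsigned width-bit integers")
    product, alu_ops = 0, 0
    addend = a
    for bit in range(width):
        if (b >> bit) & 1:
            product += addend  # add the shifted multiplicand
            alu_ops += 1
        addend <<= 1           # shift the multiplicand one position left
        alu_ops += 1
    return product, alu_ops


product, ops = shift_add_multiply(1234, 5678)
print(product == 1234 * 5678, ops)  # True 23
```

One 16-bit product costs 16 shifts plus one addition per set multiplier bit, which motivates adding a dedicated multiplier to the data path.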


Introducing a multiplier in the architecture

Precision (N-bit operands, M-bit result):
- M = N: single precision
- M = 2N: double precision
- M > 2N: overflow precision

Computational capacity increases, but bus capacity does not (still using the same bus for instruction and data).

Figure: control, memory, MUL, and ALU.
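One reading of the precision levels, sketched under the assumption of a 16-bit Q15 fractional format (N = 16): the exact product of two N-bit words needs 2N bits (double precision); keeping only N bits truncates it (single precision); and an accumulator wider than 2N bits (here 8 hypothetical guard bits, M = 40) absorbs the overflow that repeated accumulation can cause. All names are illustrative.

```python
N = 16  # data word length in bits (assumed Q15 fractional format)


def to_q15(x):
    """Quantize a real value in [-1, 1) to a signed N-bit Q15 integer (with saturation)."""
    return max(-(1 << (N - 1)), min((1 << (N - 1)) - 1, round(x * (1 << (N - 1)))))


def fits(value, bits):
    """True if value fits in a signed two's-complement word of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def mul_single(a, b):
    """M = N: keep only the upper Q15 part of the 2N-bit product (truncation)."""
    return (a * b) >> (N - 1)


def mac(xs, hs, acc_bits):
    """Accumulate exact 2N-bit products in an acc_bits-wide accumulator.

    Returns (acc, overflowed); overflowed is True if any partial sum left the range.
    """
    acc, overflowed = 0, False
    for x, h in zip(xs, hs):
        acc += x * h  # exact double-precision (M = 2N) product
        overflowed = overflowed or not fits(acc, acc_bits)
    return acc, overflowed


x = [to_q15(0.99)] * 4
print(mul_single(to_q15(0.5), to_q15(0.5)) == to_q15(0.25))  # True
print(mac(x, x, 2 * N)[1], mac(x, x, 2 * N + 8)[1])          # True False
```

Four products of 0.99 × 0.99 already exceed the range of a 32-bit (M = 2N) accumulator, whereas the 40-bit accumulator holds the sum.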

ARM7


Harvard architecture: individual data and instruction buses
- Instruction and data are fetched simultaneously: (architectural) micro-parallelism
- Operation cycle:
  C1: instruction fetch || data 1 fetch
  C2: data 2 fetch
  C3: execution || instruction fetch
  C4: output data storage
- Computational capacity vs. bus capacity: the two operands still share one data bus!

Figure: control path (control, program memory) and data path (data memory, MUL, ALU).

TMS32010


Modified Harvard architecture
- Duplicated data buses
- Multiple data memory banks
- Operation cycle:
  C1: instruction fetch || data 1 fetch || data 2 fetch
  C2: execution || instruction fetch
  C3: output data storage
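The operation cycles listed for the von Neumann, Harvard, and modified Harvard architectures can be checked with a small model: each cycle is a set of bus transfers, and a bus may carry only one transfer per cycle. The bus names are hypothetical, and the sketch ignores any overlap between successive operations, so it simply multiplies the cycles of one operation by the number of operations.

```python
from collections import Counter

# Operation cycles as listed on the slides. Each cycle is a list of
# (transfer, bus) pairs; a cycle without transfers is pure execution.
SCHEDULES = {
    "von Neumann": [
        [("instruction fetch", "shared bus")],
        [("data 1 fetch", "shared bus")],
        [("data 2 fetch", "shared bus")],
        [],                                            # execution
        [("output data storage", "shared bus")],
    ],
    "Harvard": [
        [("instruction fetch", "program bus"), ("data 1 fetch", "data bus")],
        [("data 2 fetch", "data bus")],
        [("instruction fetch", "program bus")],        # overlaps execution
        [("output data storage", "data bus")],
    ],
    "modified Harvard": [
        [("instruction fetch", "program bus"),
         ("data 1 fetch", "data bus 1"), ("data 2 fetch", "data bus 2")],
        [("instruction fetch", "program bus")],        # overlaps execution
        [("output data storage", "data bus 1")],
    ],
}


def conflict_free(cycles):
    """True if no bus carries more than one transfer in any cycle."""
    return all(max(Counter(bus for _, bus in c).values(), default=0) <= 1
               for c in cycles)


def cycles_for_ops(arch, n_ops):
    """Cycles for n_ops operations run back to back (no overlap between operations)."""
    cycles = SCHEDULES[arch]
    if not conflict_free(cycles):
        raise ValueError(f"bus conflict in the {arch} schedule")
    return n_ops * len(cycles)


for arch in SCHEDULES:
    print(arch, cycles_for_ops(arch, 4))  # 20, 16 and 12 cycles for 4 MACs
```

Under these assumptions one operation takes 5, 4, and 3 cycles respectively; merging the von Neumann instruction and data fetches into one cycle would make `conflict_free` return False.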

Blackfin architecture


Utilizing algorithmic and architectural properties

Using the address arithmetic unit, the core of the above algorithm becomes a single line of parallel instructions:

    ...
    A0 += data1*data2 || A1 += data3*data4;
    ...
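A hypothetical Python model of the dual-MAC line splits the taps of one FIR output between two accumulators, even taps to A0 and odd taps to A1, and adds them at the end. In a real DSP the address arithmetic unit would update the data pointers in parallel; here list indexing stands in for it.

```python
def fir_output_dual_mac(x_window, h):
    """One FIR output using two accumulators, mimicking
    A0 += data1*data2 || A1 += data3*data4.

    x_window[k] holds x(n - k). Even taps go to A0 and odd taps to A1; each
    loop iteration stands for one parallel instruction.
    Returns (y(n), number of parallel instructions).
    """
    a0 = a1 = 0
    steps = 0
    for k in range(0, len(h), 2):
        a0 += h[k] * x_window[k]              # MAC unit 0
        if k + 1 < len(h):
            a1 += h[k + 1] * x_window[k + 1]  # MAC unit 1, same instruction
        steps += 1
    return a0 + a1, steps


print(fir_output_dual_mac([1, 2, 3, 4], [4, 3, 2, 1]))  # (20, 2)
```

A 4-tap output then needs 2 parallel MAC instructions plus one final addition instead of 4 sequential MACs.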


Question: Is it always possible to utilize two (or more) MACs?
Condition: as long as the inherent algorithmic parallelism is not fully utilized, additional hardware may improve performance!

ASIC

Customized in silicon for a particular use: a specific combination of functional units, connected by buses.

FPGA

Customized for a particular use by means of programmable logic components and programmable interconnect.

From an algorithmic point of view, the design methodologies for the two are more or less the same.

Mapping an algorithm onto a custom hardware architecture (example algorithm in figure):
- 1:1 mapping (fully utilizing the parallelism). Cost: T, A
- Multiplexed (hardware sharing). Cost: T, A (+ control)
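To make the time/area trade-off between a 1:1 mapping and a multiplexed mapping concrete, the sketch below uses a first-order cost model for an N-tap FIR data path. The unit areas and delays are illustrative assumptions, not figures from the lecture, and the control cost of the multiplexed version is deliberately left out.

```python
def fir_datapath_cost(taps, mult_area, add_area, mult_delay, add_delay, shared):
    """Hypothetical first-order (area, time per output sample) of an N-tap FIR.

    shared=False: 1:1 mapping, one multiplier per tap and a chain of taps-1 adders.
    shared=True:  one multiplier and one adder reused for every tap (the extra
                  accumulator register and control logic are not counted).
    """
    if shared:
        area = mult_area + add_area
        time = taps * (mult_delay + add_delay)
    else:
        area = taps * mult_area + (taps - 1) * add_area
        time = mult_delay + (taps - 1) * add_delay
    return area, time


# Illustrative unit costs only: multiplier area 10, adder area 2, delays 5 and 1.
print(fir_datapath_cost(4, 10, 2, 5, 1, shared=False))  # (46, 8)
print(fir_datapath_cost(4, 10, 2, 5, 1, shared=True))   # (12, 24)
```

The 1:1 mapping is fast but large; the shared version is small but needs N cycles per output, plus control logic to sequence them.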

Figure: the transfer function from X[n] to Y[n] implemented as one block with operation time T1, and as the cascade Ha → Hb → Hc → Hd with operation time T2 = Ta + Tb + Tc + Td.

The operation time of a given transfer function obviously depends on the algorithmic complexity, but also on the implementation technology used.

Factorization

Figure: the cascade X[n] → Ha → Hb → Hc → Hd with latches between the blocks produces Y[n−3]; without latches, the same blocks produce Y[n].
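The timing of a cascade Ha → Hb → Hc → Hd with and without latches can be sketched as follows. The block delays are hypothetical values, and the latches' own setup and propagation times are ignored.

```python
def cascade_timing(block_delays, pipelined):
    """Clock period and output latency (in samples) of a cascade Ha -> Hb -> ...

    Without latches the block delays add up: T2 = Ta + Tb + Tc + Td.
    With a latch between every pair of blocks the slowest block sets the
    period, and each of the len(block_delays) - 1 latches adds one sample of delay.
    Latch setup and propagation times are ignored.
    """
    if pipelined:
        return max(block_delays), len(block_delays) - 1
    return sum(block_delays), 0


delays = [2, 3, 1, 4]  # hypothetical Ta, Tb, Tc, Td in ns
print(cascade_timing(delays, pipelined=False))  # (10, 0): output Y[n]
print(cascade_timing(delays, pipelined=True))   # (4, 3): output Y[n-3]
```

Pipelining shortens the clock period from the sum of the block delays to the largest one, at the cost of a three-sample latency for four blocks.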

Block diagram

Consists of functional blocks connected by directed edges; each edge represents data flow from its input block to its output block.

Signal-Flow Graph

Nodes: represent computations or tasks; each node sums all incoming signals.
Edges: each edge denotes a linear transformation from its input to its output.

Graphical Representations

- Data Flow Graphs (DFG)
- Control Flow Graphs (CFG)
- Control Data Flow Graphs (CDFG)
- State Transition Graphs (STG)

Each graph consists of nodes (or vertices) and edges (or arcs).

Data Flow Graph (DFG)
- Nodes: represent computations (or functions)
- Edges: represent data paths (or communications)
- Models data dependencies: a node can perform its operation whenever its input data is present
- The data flow forms a directed acyclic graph (DAG), for example:

  x1 = a + b
  y = a * c
  z = x1 + d
  x2 = y - d
  x3 = x2 + c
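Because a DFG node can fire as soon as its inputs are present, the example graph (x1 = a + b, y = a * c, z = x1 + d, x2 = y − d, x3 = x2 + c) can be scheduled as soon as possible (ASAP). The sketch below assumes every operation takes one time step and that a, b, c, and d are primary inputs available at step 0; the dictionary encoding of the graph is an illustrative choice.

```python
# The example DFG: each computed node maps to (operation, list of inputs).
DFG = {
    "x1": ("+", ["a", "b"]),
    "y":  ("*", ["a", "c"]),
    "z":  ("+", ["x1", "d"]),
    "x2": ("-", ["y", "d"]),
    "x3": ("+", ["x2", "c"]),
}


def asap_levels(dfg):
    """ASAP schedule: each node runs one step after its latest input is ready.

    Names that no node produces (a, b, c, d) are primary inputs, ready at step 0.
    Assumes the graph is acyclic (a DAG) and every operation takes one step.
    """
    level = {}

    def ready(name):
        if name not in dfg:
            return 0  # primary input
        if name not in level:
            _, inputs = dfg[name]
            level[name] = 1 + max(ready(i) for i in inputs)
        return level[name]

    for node in dfg:
        ready(node)
    return level


print(asap_levels(DFG))  # {'x1': 1, 'y': 1, 'z': 2, 'x2': 2, 'x3': 3}
```

Three steps are needed: x1 and y can run in parallel in step 1, z and x2 in step 2, and x3 in step 3, so the chain y → x2 → x3 is the critical path.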


Cost Functions

Implementation quality is determined by cost functions: noise, power, area, and time.
- Noise: word length
- Power: technology
- Area: circuit
- Time: depends on all three above

The cost functions interact.
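The link between noise and word length can be illustrated by quantizing a near-full-scale sine wave to B bits and measuring the signal-to-quantization-noise ratio (SQNR). The rounding quantizer, the test frequency, and the sample count are assumptions for this sketch; the well-known rule of thumb for a full-scale sine is about 6.02·B + 1.76 dB.

```python
import math


def quantize(x, bits):
    """Round x to a signed `bits`-bit fixed-point grid on [-1, 1), with saturation."""
    step = 2.0 ** (1 - bits)
    q = round(x / step) * step
    return max(-1.0, min(1.0 - step, q))


def sqnr_db(bits, n_samples=20000):
    """SQNR in dB for a sine of amplitude 0.99 quantized to `bits` bits."""
    freq = (math.sqrt(5) - 1) / 20  # cycles per sample; irrational, so phases spread evenly
    signal = [0.99 * math.sin(2 * math.pi * freq * n) for n in range(n_samples)]
    noise = [quantize(s, bits) - s for s in signal]
    return 10 * math.log10(sum(s * s for s in signal) / sum(e * e for e in noise))


for bits in (8, 12, 16):
    print(bits, round(sqnr_db(bits), 1))
```

Each extra bit buys roughly 6 dB less noise, at the price of wider buses, registers, and arithmetic units, that is, more area and power.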


Design choices linking application, algorithm, and architecture (figure):
- Choice of algorithm, algorithm manipulation, word length
- Extraction and utilization of inherent parallelism
- Number and types of execution units
- Scheduling

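Figure: components of CMOS power consumption.
- Dynamic: when the output switches between 0 and 1 (from t0 to t1), a current i(t) is drawn from Vdd to charge the load, with output voltage v(t).
- Short circuit: while Vin switches, a current I flows directly from Vdd to ground through both transistors.
- Leakage: with Vin = 0 and Vout = Vdd, an off-current Ioff still flows although Vgs is below the threshold Vth (Ids vs. Vgs curve).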

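What control do you have as a designer? The largest contributor to CMOS power consumption is switching power:

$$P_{switch} = \alpha \, C_L \, V_{dd}^2 \, f$$

where $\alpha$ is the switching activity, $C_L$ the switched load capacitance, $V_{dd}$ the supply voltage, and $f$ the clock frequency.

What control do you have over each factor? How does each affect the total energy? (Think about f.)

Circuit delay: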
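A small sketch can separate power from energy, assuming the usual switching-power model P = α·C_L·V_dd²·f, so that the energy per clock cycle is E = P/f = α·C_L·V_dd². The delay model t_d ∝ V_dd/(V_dd − V_th)² is an assumed first-order (long-channel) approximation, not the lecture's own equation, and all parameter values are hypothetical.

```python
def switching_power(alpha, c_load, vdd, f):
    """Dynamic switching power P = alpha * C_L * Vdd**2 * f (watts)."""
    return alpha * c_load * vdd ** 2 * f


def energy_per_cycle(alpha, c_load, vdd):
    """Switching energy per clock cycle E = P / f = alpha * C_L * Vdd**2 (joules)."""
    return alpha * c_load * vdd ** 2


def relative_delay(vdd, vth):
    """Assumed first-order gate delay model: t_d proportional to Vdd / (Vdd - Vth)**2."""
    return vdd / (vdd - vth) ** 2


alpha, c_load, vth = 0.1, 1e-9, 0.3  # hypothetical activity, capacitance (F), threshold (V)

# Halving f halves the power (ratio about 0.5) but leaves energy per cycle unchanged.
print(switching_power(alpha, c_load, 1.2, 50e6) / switching_power(alpha, c_load, 1.2, 100e6))
# Halving Vdd cuts the energy per cycle to about a quarter...
print(energy_per_cycle(alpha, c_load, 0.6) / energy_per_cycle(alpha, c_load, 1.2))
# ...but, under the assumed delay model, makes each gate about 4.5 times slower.
print(relative_delay(0.6, vth) / relative_delay(1.2, vth))
```

Lowering f alone reduces power but not the energy per operation; lowering V_dd reduces energy quadratically but slows the circuit, which in turn limits f.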


Warning! In everyday language, the term power is used incorrectly in place of energy. Power is not energy. Power is not something you can run out of. Power cannot be lost or used up. Power is not a thing; it is merely a rate. Power cannot be put into a battery any more than velocity can be put into the gas tank of a car.

Design Representation

Behavioral or functional representation

Specifies the behavior or the functions of a design without any implementation information

Structural representation

Specifies the implementation of a design in terms of components and their interactions

Physical representation

Specifies the physical characteristics of the design (Blueprint for manufacturing)

Design flow (figure): IDEA → behavioral design → structural design → logic design → physical design → fabrication → product.

Representations along the flow: plain English, algorithm, state machine/ALU/registers, gate-level netlist, transistor list.

Implementation Technologies


HW Design Abstraction

- Processor-memory level
- RT level
- Logic gates
- Transistors
- Polygons of silicon

Figure: Y-chart.
- Functional domain: algorithm, RT language, Boolean equations, differential equations
- Structural domain: gate, transistor
- Physical domain: floorplan, standard cells, sticks, polygons

Figure: cost vs. performance of implementation choices.
- Only SW: low cost and low performance
- Mixed HW-SW: medium cost and performance
- Only HW: high cost and high performance

Figure: system-level HW-SW co-design. Starting from the IDEA, its specification, and its constraints, co-design decides on:
- Components (HW, SW)
- HW behavior and components
- Memory hierarchy and mapping
- Interconnect and buses
- SW behavior, RTOS, schedule policy, and processors

- Specification of functionality and constraints
- Simulation of functionality
- Components as building blocks:
  - SW processors: DSPs and micro-controllers
  - HW co-processors: ASICs, FPGAs
  - Storage elements: cache, scratchpad, SRAM, DRAM
  - Interconnection elements: buses and arbiters
  - Interface and I/O units: DMA, UART, D/A, A/D, wireless communication
  - Software platform: RTOS and scheduling

- Performance analysis (timing, power, area)
- Design and optimization (timing, power, area)
- Architecture selection: processing elements, memory units, and interconnect
- RTOS and scheduling scheme

Figure: the A3 paradigm as a design flow. Application: LP-filter (specification) → Algorithm: FIR, IIR (parallel), IIR (cascade) → Architecture: DSP, µ-controller, ASIC.

Summary

- Algorithms and architectures
  - Data path, control path, algorithmic properties
- Design representations, design abstractions

Following courses:
- Architectural optimization (mm2-mm3)
- Scheduling concepts (mm4-mm5)

Exercises

Gajski: 1.1, 1.4, and 1.8

Cost functions: discuss power vs. energy optimization.

Why is there a difference? How can you optimize energy, taking only the dynamic contribution into account?

Take as a starting point the paper by C.H. Wang, "Algorithmic Implementation of Low-Power High Performance FIR Filtering IP Cores" (hint: only sections 1 and 2). For these exercises, prepare a few notes so that you can present your findings next Thursday (no more than 5 minutes).
- Gr840: Find the various representation forms of the FIR filter used, write them in mathematical form, and make a block-diagram representation.
- Gr841: Discuss or verify that the data path in figure 2 is reasonable, and try to map the algorithms onto it.
- Gr842: Make a 1:1 mapping and propose an architecture for a four-tap FIR filter.
