
Analysis of computation required for FIR filter

Can you write the expression for an 8-tap FIR filter?

Y[n] = a0 X[n] + a1 X[n-1] + a2 X[n-2] + ... + a7 X[n-7]

The most frequently recurring computation is multiplication followed by accumulation (MAC).
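The 8-tap expression can be sketched as a loop of multiply-accumulate steps. This is a minimal illustration, not an optimized filter: the coefficient values and the impulse input are hypothetical, and samples before the start of the input are assumed to be zero.

```python
def fir_output(coeffs, samples, n):
    """Return Y[n] = sum over k of coeffs[k] * X[n-k], taking X[n-k] = 0 when n-k < 0."""
    acc = 0.0
    for k, a in enumerate(coeffs):
        if n - k >= 0:
            acc += a * samples[n - k]  # one multiply-accumulate (MAC) per tap
    return acc


# Hypothetical coefficients a0..a7 and a unit-impulse input (illustration only).
coeffs = [1, 2, 3, 4, 5, 6, 7, 8]
samples = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

outputs = [fir_output(coeffs, samples, n) for n in range(len(samples))]
print(outputs)
```

Feeding a unit impulse makes the output reproduce the coefficients in order, which is a quick way to check that each tap is applied to the right delayed sample.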
My voice signal is to be processed using an 8-tap digital filter. Assume the voice signal
frequency is 7 kHz.

Find the number of samples to be processed per second?

Find the number of multiplications and additions required to obtain one output sample?

Find the number of multiplications and additions required to obtain the output for a
one-second duration?
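One way to work through these three questions is sketched below. It assumes the signal is sampled at the Nyquist minimum of twice the 7 kHz signal frequency (a practical system would sample somewhat faster), and it counts a direct-form filter where 8 products are combined with 7 additions.

```python
SIGNAL_HZ = 7_000  # voice signal frequency from the problem statement
TAPS = 8           # filter length from the problem statement

# Assumption: sampling at exactly the Nyquist minimum rate (2 x signal frequency).
sample_rate = 2 * SIGNAL_HZ

mults_per_output = TAPS      # one multiplication per tap
adds_per_output = TAPS - 1   # 8 products need 7 additions to be summed

mults_per_second = mults_per_output * sample_rate
adds_per_second = adds_per_output * sample_rate

print("samples per second:", sample_rate)
print("per output sample :", mults_per_output, "multiplications,", adds_per_output, "additions")
print("per second        :", mults_per_second, "multiplications,", adds_per_second, "additions")
```

Under these assumptions the answers are 14,000 samples per second, 8 multiplications and 7 additions per output sample, and 112,000 multiplications and 98,000 additions per second. On a processor with a MAC unit that accumulates from zero, this is 8 MAC operations per output sample.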
DSP vs. GPP (General Purpose Processor)

DSP:
• Real-time throughput requirement.
• Used in embedded applications.
• Special features are provided to support DSP computations such as FFT and convolution.
• Has a MAC unit.

GPP:
• No real-time throughput needed.
• Used in desktop computing.
• No special features.
What is the best suitable architecture for DSP?

Architectural evolution:
Von Neumann

Known as the Von Neumann architecture.
Designed by: John von Neumann, a Hungarian-American mathematician.
Single memory shared by both the program instructions and data.
Most computers today are of the Von Neumann design.
How many cycles are needed for a MAC instruction on two
numbers that reside in external memory?
1. Get the opcode of the instruction.
2. Get data1.
3. Get data2.
4. Multiply, accumulate, and store the result.
(Assume that CPU computation takes very little time in
comparison to memory access.)
So four cycles are needed.
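The four-step count can be turned into a small cycle estimate. This is a rough sketch under the same assumption as the slide (one step per cycle, memory access dominating); the function name is hypothetical.

```python
# The four sequential steps listed above, one per cycle on a single shared memory bus.
VON_NEUMANN_MAC_STEPS = [
    "fetch opcode",
    "fetch data1",
    "fetch data2",
    "multiply-accumulate and store result",
]


def von_neumann_mac_cycles(num_macs):
    """Cycles for num_macs MAC instructions when no step can overlap another."""
    return num_macs * len(VON_NEUMANN_MAC_STEPS)


print(von_neumann_mac_cycles(1))  # one MAC
print(von_neumann_mac_cycles(8))  # one output sample of an 8-tap FIR filter
```

With nothing overlapped, one output sample of an 8-tap filter costs 32 cycles in this model.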
Harvard architecture

•Developed at Harvard University (1940)

•Program instructions and data can be fetched at the same time.

•Increasing overall processing speed

•Most present day DSPs use this dual bus architecture.

•Ex: ADSP-21xx and AT&T's DSP16xx.


Cycles needed for a MAC instruction in the Harvard
architecture?

1. Instruction 1 is fetched.
2. Instruction 1 is decoded; data1 is read from DM and the coefficient
from PM.
3. The MAC operation is performed and the result stored in DM, while
Instruction 2 is fetched from PM.
4. Instruction 2 is decoded; its data is read from DM and the coefficient
from PM.
5. The MAC operation for Instruction 2 is performed and the result stored
in DM, while Instruction 3 is fetched from PM.
So a single MAC operation needs 3 cycles; because each fetch overlaps the
previous MAC, successive MACs complete every 2 cycles.
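The overlap in the steps above can be laid out as a cycle-by-cycle schedule. This is a sketch that follows the listed steps only (fetch of the next instruction overlaps the current MAC); the function name and the label strings are hypothetical.

```python
def harvard_schedule(num_macs):
    """Map cycle number -> activities, with each fetch overlapping the previous MAC."""
    schedule = {}
    for i in range(num_macs):
        fetch_cycle = 2 * i + 1  # instruction i+1 is fetched while instruction i executes
        schedule.setdefault(fetch_cycle, []).append(f"fetch I{i + 1}")
        schedule.setdefault(fetch_cycle + 1, []).append(f"decode I{i + 1}, read DM and PM")
        schedule.setdefault(fetch_cycle + 2, []).append(f"MAC I{i + 1}, store to DM")
    return schedule


schedule = harvard_schedule(2)
for cycle in sorted(schedule):
    print(cycle, schedule[cycle])

total_cycles_for_8 = max(harvard_schedule(8))
print("cycles for 8 MACs:", total_cycles_for_8)
```

In this model N MACs finish in 2N + 1 cycles, so the 8 MACs of one FIR output take 17 cycles instead of the 32 needed when every step is sequential.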
Modified Harvard architecture

•Three memory banks


•How many memory accesses are possible simultaneously?
•Allows three independent memory accesses per instruction cycle.
•Processors based on a three-bank modified Harvard architecture
include the Zilog Z893xx and the Motorola DSP5600x and DSP563xx.
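A rough way to compare the three architectures is to count only the memory accesses one MAC needs: one instruction, one data sample, and one coefficient. The sketch below assumes every access takes one cycle and ignores execution time; the names are hypothetical.

```python
import math

# Assumption: each MAC needs 3 memory accesses (instruction, data sample, coefficient).
ACCESSES_PER_MAC = 3


def memory_cycles_per_mac(accesses_per_cycle):
    """Cycles spent on memory access for one MAC, given the parallel accesses available."""
    return math.ceil(ACCESSES_PER_MAC / accesses_per_cycle)


for name, parallel in [("Von Neumann", 1), ("Harvard", 2), ("Modified Harvard", 3)]:
    print(name, memory_cycles_per_mac(parallel))
```

With three independent accesses per cycle, the instruction and both operands can be read together, which is why the three-bank arrangement suits MAC-heavy code.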
Multiple-Access Memories
Using fast memories that support multiple, sequential accesses per instruction cycle over a single set of buses
OR
Using multi-ported memories that allow multiple concurrent memory accesses over two or more independent sets of buses.

This arrangement provides one program memory access and two data memory accesses per instruction word.
Ex: Motorola DSP561xx processors.
Super Harvard Architecture (SHARC DSP)

Part of program memory is used as data memory.
An instruction cache is included in the CPU.
In a program, which part is executed repeatedly? The loops.
The first time through a loop, operation is slower because the instructions must be fetched from program memory.
Subsequent executions of the loop are faster because the instructions come from the cache.
This means that all of the memory-to-CPU information transfers can be accomplished in a single cycle.
Ex: ADSP-2106x and the newer ADSP-211xx.
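The effect of the instruction cache on a loop can be illustrated with a toy model. The cycle costs below are assumed values chosen only for illustration, not figures for any SHARC device, and the cache is assumed large enough to hold the whole loop.

```python
def loop_fetch_cycles(loop_length, iterations, miss_cost=2, hit_cost=1):
    """Total instruction-fetch cycles for a loop, with hypothetical miss and hit costs."""
    cached = set()
    total = 0
    for _ in range(iterations):
        for address in range(loop_length):
            if address in cached:
                total += hit_cost   # instruction already in the cache
            else:
                total += miss_cost  # first pass: fetch from program memory, then cache it
                cached.add(address)
    return total


first_pass = loop_fetch_cycles(loop_length=4, iterations=1)
three_passes = loop_fetch_cycles(loop_length=4, iterations=3)
print(first_pass, three_passes)
```

Only the first pass pays the miss cost; every later pass runs at the hit cost, so the longer a loop runs, the closer its average cost gets to one cycle per instruction.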
Enhanced DSP architectures:
Very Long Instruction Word (VLIW) architecture:

VLIW CPUs have four to eight execution units.
One VLIW instruction encodes multiple operations.
Ex: if a VLIW device has four execution units, then a VLIW instruction for that device would have four operation fields.

VLIW instructions are usually at least 64 bits in width.
VLIW CPUs use software (the compiler) to decide which operations can run in parallel.
The hardware complexity of instruction scheduling is reduced.
Ex: TMS320C6xx
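The idea of one instruction word holding several operation fields can be sketched as bit packing. The layout here (four 16-bit fields in a 64-bit word) is a hypothetical encoding for illustration and does not describe the TMS320 instruction format.

```python
FIELD_BITS = 16  # hypothetical width of one operation field
SLOTS = 4        # one field per execution unit, as in the four-unit example


def pack_vliw(ops):
    """Pack one operation code per execution unit into a single instruction word."""
    if len(ops) != SLOTS:
        raise ValueError("need exactly one operation per execution unit")
    word = 0
    for slot, op in enumerate(ops):
        if not 0 <= op < (1 << FIELD_BITS):
            raise ValueError("operation code does not fit in its field")
        word |= op << (slot * FIELD_BITS)
    return word


def unpack_vliw(word):
    """Split an instruction word back into its operation fields."""
    mask = (1 << FIELD_BITS) - 1
    return [(word >> (slot * FIELD_BITS)) & mask for slot in range(SLOTS)]


ops = [0x0001, 0x0002, 0x0003, 0x0004]  # hypothetical operation codes
word = pack_vliw(ops)
print(hex(word), unpack_vliw(word))
```

The compiler's job in a real VLIW toolchain is to choose which independent operations share a word; the hardware then simply issues every field to its execution unit in the same cycle.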
Endianness:
•Big endian (most significant byte in the first location)
•Little endian (least significant byte in the first location)
How will the 32-bit value 12345678 (hex) be stored in four locations starting from 4000 in each case?
TI DSP: Little endian
Motorola DSP: Big endian
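The storage question can be checked with Python's standard `struct` module. The sketch assumes 12345678 is a 32-bit hexadecimal value and that each location holds one byte; the helper name `memory_map` is hypothetical.

```python
import struct

value = 0x12345678  # assumption: the value is hexadecimal and 32 bits wide


def memory_map(data, base=4000):
    """Map each byte to its address, starting at the base location."""
    return {base + offset: f"{byte:02X}" for offset, byte in enumerate(data)}


big = memory_map(struct.pack(">I", value))     # big endian: most significant byte first
little = memory_map(struct.pack("<I", value))  # little endian: least significant byte first

print("big endian   :", big)
print("little endian:", little)
```

Big endian places 12, 34, 56, 78 at locations 4000 to 4003, while little endian places 78, 56, 34, 12 at the same locations.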
