Fundamentals
Ingrid Verbauwhede
iverbauw@esat.kuleuven.be
H5H4, Spring 2008, Ingrid Verbauwhede, les 7-2
Memory Transformations and Optimizations
loop merging, compaction
References
• The origins:
• E.A. Lee, “Programmable DSP Processors,” Part I, IEEE ASSP
magazine, October 1988, pg. 4-19.
• Part II, IEEE ASSP magazine, January 1989, pg. 4-14
• Good overview:
• P. Lapsley, J. Bier, A. Shoham, E.A.Lee, “DSP Processor Fundamentals:
Architectures and Features,” IEEE Press, 1998.
Processor Components:
• Instruction Processing Unit
• Memory Management Unit
Von Neumann machine
One memory space
[Figure: a processor core (mpy and ALU) connected to a single Memory through one Address Bus and one Data Bus]
FIR implementation
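$y(n) = \sum_{i=0}^{N-1} c(i)\, x(n-i)$   (50 taps)
[Figure: tapped delay line. x(n) passes through a chain of $z^{-1}$ delays down to x(n-(N-1)); each tap is multiplied by its coefficient c(0) ... c(N-1) and the products are added to form y(n)]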
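The sum above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not of any particular processor: the function name `fir_step` and the 3-tap coefficient set are assumptions chosen to keep the example short (the slide uses 50 taps).

```python
def fir_step(delay_line, coeffs, x_new):
    """Shift x_new into the delay line and return y(n) = sum of c(i) * x(n-i)."""
    # delay_line[i] holds x(n-i); the oldest sample falls off the end.
    delay_line.insert(0, x_new)
    delay_line.pop()
    return sum(c * x for c, x in zip(coeffs, delay_line))


coeffs = [1, 2, 3]                  # c(0), c(1), c(2): illustrative values only
delay_line = [0] * len(coeffs)      # all past samples start at zero
outputs = [fir_step(delay_line, coeffs, x) for x in [1, 0, 0, 0]]
```

Feeding a unit impulse makes the filter emit its own coefficients one by one, which is a quick sanity check for any FIR implementation.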
FIR on Von Neumann
[Figure: separate Program Memory and Data Memory, an Instruction Processing Unit, and a datapath with a 16 x 16 multiplier (mpy) and an accumulating ALU]
Example 1: TMS320C10 (1982)
ZAC        ACC = 0
LT   X1    T = X1
MPY  A     P = AX1
LTA  X2    ACC = AX1; T = X2
MPY  B     P = BX2
LTA  X3    ACC = AX1+BX2; T = X3
MPY  C     P = CX3
LTA  X4    ACC = AX1+BX2+CX3; T = X4
MPY  D     P = DX4
APAC       ACC = AX1+BX2+CX3+DX4
SACH Y1    STORE 32-BIT RESULT
[Figure: datapath with T register (16), Multiplier, P register (32), MUX and ALU (32)]
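The register comments in the listing can be traced with a small Python model. This is a sketch under assumptions: the class `C10Datapath` and the helper `sum_of_products` are hypothetical names, bit widths and the final SACH store are ignored, and only the T, P and ACC behaviour written in the listing is modelled.

```python
class C10Datapath:
    """Toy model of the T, P and ACC registers (hypothetical, no bit widths)."""

    def __init__(self):
        self.t = 0     # T register: multiplier input
        self.p = 0     # P register: latest product
        self.acc = 0   # accumulator

    def zac(self):             # ZAC: ACC = 0
        self.acc = 0

    def lt(self, x):           # LT: T = x
        self.t = x

    def mpy(self, c):          # MPY: P = T * c
        self.p = self.t * c

    def apac(self):            # APAC: ACC = ACC + P
        self.acc += self.p

    def lta(self, x):          # LTA: ACC = ACC + P, then T = x
        self.apac()
        self.lt(x)


def sum_of_products(coeffs, samples):
    """Follow the ZAC / LT / MPY / LTA ... / APAC sequence of the listing."""
    dp = C10Datapath()
    dp.zac()
    dp.lt(samples[0])
    dp.mpy(coeffs[0])
    for c, x in zip(coeffs[1:], samples[1:]):
        dp.lta(x)
        dp.mpy(c)
    dp.apac()                  # add the last product
    return dp.acc


result = sum_of_products([2, 3, 4, 5], [10, 20, 30, 40])   # A..D and X1..X4
```

Each product needs two instructions here (a load and a multiply), which is why the accumulation always lags one step behind the multiplication.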
TMS320C1x Memory and Buses
• Up to 8K words of on-chip Program ROM
• 4K words of EPROM and OTP available
• Up to 64K words of External Program Memory
[Figure: CPU with Program Control and Instruction Register, Data and Data Address buses, and the DEN, MEN and WE control signals]
Same FIR: 53 cycles, 3 prog words
$y(n) = \sum_{i=0}^{N-1} c(i)\, x(n-i)$   (50 taps)
[Figure: tapped delay line with $z^{-1}$ delays from x(n) to x(n-(N-1)), coefficients c(0) ... c(N-1), and an adder chain producing y(n)]

TMS320C10 (LTD = LT + DMOV + APAC):
    LTD
    MPY
    LTD
    MPY
    LTD
    ...
    MPY
  100 Cycles
  100 Words Prog Memory

TMS320C25 (MACD = LT + DMOV + APAC + MPY):
    RPTK 49
    MACD
  53 Cycles
  3 Words Prog Memory
Example: MACD
Executes (simplified):
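One possible reading of MACD, based on the LT + DMOV + APAC + MPY breakdown used for the TMS320C25 FIR, is sketched below in Python. Everything here is an assumption made for illustration: the function names, the dictionary used as machine state, the spare slot at the end of the data array, and the indexing of coefficients by tap number. Pipeline timing, bit widths and the real addressing modes are left out.

```python
def macd(state, data, coeffs, dma, pma):
    """One simplified MACD step on a toy machine state (keys 'acc', 'p', 't')."""
    state["acc"] += state["p"]               # APAC: add the previous product
    state["t"] = data[dma]                   # LT: load the sample into T
    data[dma + 1] = data[dma]                # DMOV: copy the sample one address up
    state["p"] = state["t"] * coeffs[pma]    # MPY: multiply by a coefficient


def fir_with_macd(data, coeffs):
    """data[i] holds x(n-i) plus one spare slot at the end; coeffs[i] holds c(i)."""
    n_taps = len(coeffs)
    state = {"acc": 0, "p": 0, "t": 0}
    # Stands in for RPTK n_taps-1, which repeats the next instruction n_taps times.
    for i in range(n_taps - 1, -1, -1):      # oldest sample first
        macd(state, data, coeffs, dma=i, pma=i)
    state["acc"] += state["p"]               # final APAC picks up the last product
    return state["acc"]


data = [1, 2, 3, 0]                          # x(n), x(n-1), x(n-2), spare slot
y = fir_with_macd(data, [10, 20, 30])
```

Walking from the oldest sample to the newest lets the data move age the delay line as a side effect, so after the loop only `data[0]` has to be overwritten with the next input sample.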
Single Cycle MAC
TMS320C2x Multiplier/ALU
• Single-cycle 16x16-bit multiply yielding a 32-bit product
• Supports simultaneous Program and two Data Operand acquisition
• Supports simultaneous ALU and Multiplier operations
• 0-16 bit Left Post-Shifter
[Figure: 16-bit Program Bus and Data Bus feeding a Left Shifter (0-16) and, through a MUX, the T Register (16); Multiplier (16x16); P Register (32); Left Shifter (0-16); MUX; Arithmetic Logic Unit (ALU, 32); carry bit C and Accumulator Register (32); Left Shifter (0-7)]
Courtesy: Texas Instruments
1986:
80/100ns instruction cycle time
Simultaneous single-cycle Multiply/ALU operations
Zero overhead repeat single instruction
64K words of off-chip Data RAM
Optimizing ANSI C-Compiler
544 words of on-chip Data/Program RAM
Multiplier Post Shifter and enhanced Accumulator Post Shifter
74 additional instructions
- Single-cycle MAC and zero overhead repeat
- Long immediate and carry bit support
- More logical and conditional branch operations
- Data block move support
Bit reversed addressing for FFTs
Eight auxiliary registers
Hardware wait states
DMA support
Idle and Powerdown Capability
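The bit-reversed addressing item in the list above refers to the sample ordering used by radix-2 FFTs. The Python sketch below only computes that ordering in software; the function name `bit_reverse` is an assumption, and the code says nothing about how the address generator produces these indices in hardware.

```python
def bit_reverse(index, num_bits):
    """Reverse the lowest num_bits bits of index."""
    reversed_index = 0
    for _ in range(num_bits):
        reversed_index = (reversed_index << 1) | (index & 1)   # move the LSB across
        index >>= 1
    return reversed_index


# Access order for an 8-point FFT (3 address bits).
order = [bit_reverse(i, 3) for i in range(8)]
```

Doing this reordering in the address generator removes a separate shuffling pass over the data, which would otherwise cost extra cycles per FFT.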
Other memory configurations
Instruction cache
• single instruction: RPTK (repeat in TMS320C2x)
• a few instructions (up to 15 in AT&T 16A)
• ALWAYS under the programmer's control!
• ALWAYS known at compile time!
Block Diagram (C54x)
• Memory Access
– 4 internal bus pairs
– C,D for data read
– E for data write
– P for program
• Others
– 2 40-bit accumulators
– 40-bit barrel shifter
– 40-bit ALU
– 17b x 17b multiplier and 40b dedicated adder perform a non-pipelined single-cycle MAC
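The slide only lists the widths. A common motivation for accumulators wider than the 32-bit product is headroom when many products are summed, and the Python sketch below illustrates that idea under this assumption. The helper names `wrap_signed` and `mac_sum` are hypothetical, and saturation logic is not modelled.

```python
def wrap_signed(value, bits):
    """Wrap value into a two's-complement integer of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def mac_sum(products, acc_bits):
    """Accumulate products in a register that is acc_bits wide."""
    acc = 0
    for p in products:
        acc = wrap_signed(acc + p, acc_bits)
    return acc


products = [0x7FFF * 0x7FFF] * 4     # four large 16 x 16 products
exact = sum(products)
acc32 = mac_sum(products, 32)        # overflows and wraps around
acc40 = mac_sum(products, 40)        # the extra bits absorb the growth
```

With these inputs the 32-bit sum wraps to a negative value, while the 40-bit sum matches the exact result.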