Microprocessor Architecture: Von Neumann vs. Harvard
Microprocessor Architecture
in a nutshell
Alternative approaches
Two opposite examples
SHARC
ARM7
Chenyang Lu
CSE 467S
von Neumann
[Figure: von Neumann architecture. A single memory is connected to the CPU by address and data lines. The PC holds 200; the instruction stored at address 200, ADD r5,r1,r3, is fetched into the IR.]
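The fetch shown in the von Neumann figure can be sketched in a few lines. This is a minimal illustration, not a model of any real CPU: the dictionary-based memory, the `fetch` helper, the word-addressed PC increment, and the data word at address 300 are all assumptions made for the example. Only the address 200 and the instruction text come from the slide.

```python
# Sketch of one instruction fetch on a von Neumann machine.
# The memory layout and helper name are illustrative assumptions.

def fetch(memory, cpu):
    """Copy the word addressed by PC into IR, then advance PC."""
    cpu["IR"] = memory[cpu["PC"]]  # PC goes out on the address lines, the word comes back as data
    cpu["PC"] += 1                 # assumed: word-addressed memory, one word per instruction
    return cpu

# One memory holds both the program (address 200) and data (address 300, hypothetical).
memory = {200: "ADD r5,r1,r3", 300: 42}
cpu = fetch(memory, {"PC": 200, "IR": None})
```

Because instructions and data share the same memory, the same `memory` object serves both the fetch and any later operand access.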
Harvard Architecture
[Figure: Harvard architecture. The CPU (PC, IR) is connected to a program memory and to a separate data memory, each through its own address and data lines.]
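One way to see why separate program and data memories matter is to count memory cycles. The sketch below is an idealized model under stated assumptions: no caches, no pipelining, one access per memory per cycle. The `memory_cycles` function and the trace format are hypothetical names introduced for illustration.

```python
# Idealized comparison of memory cycles on shared vs. separate memories.
# Assumptions: no cache, no pipelining, each memory serves one access per cycle.

def memory_cycles(trace, harvard):
    """trace is a list of (instruction_fetches, data_accesses), one pair per instruction."""
    cycles = 0
    for fetches, data_accesses in trace:
        if harvard:
            # Program and data memories are accessed in parallel.
            cycles += max(fetches, data_accesses)
        else:
            # A single memory serves fetches and data accesses one after another.
            cycles += fetches + data_accesses
    return cycles

# Three instructions: two touch data memory, one does not.
trace = [(1, 1), (1, 0), (1, 1)]
von_neumann_cycles = memory_cycles(trace, harvard=False)
harvard_cycles = memory_cycles(trace, harvard=True)
```

In this toy trace the shared memory needs 5 cycles and the split memories need 3, which is the throughput argument for using a Harvard organization with streaming data.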
Microprocessors
        von Neumann    Harvard
RISC    ARM7           ARM9
CISC    Pentium        SHARC (DSP)
DSP Optimizations
Streaming data
Need high data throughput
Harvard architecture
Signal processing
Memory footprint
Require high code density
Need CISC instead of RISC
Real-time requirements
SHARC Architecture
Registers
Register files
multiplier
shifter;
ALU.
Assembly Language
SHARC Assembly
Performance analysis
Manual optimization of critical code
Computation
Data Types
Parallel Operations
dual add-subtract;
multiplication and dual add/subtract
floating-point multiply and ALU operation
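The parallel operations listed above each produce more than one result from a single instruction. The sketch below only shows which results are produced together; Python evaluates them one after another, so it says nothing about timing. The function names are hypothetical and are not SHARC mnemonics.

```python
# Sketch of which results a single parallel instruction yields.
# Function names are illustrative; Python computes these sequentially.

def dual_add_subtract(x, y):
    """Sum and difference of the same two operands, produced together."""
    return x + y, x - y

def multiply_and_dual_add_subtract(a, b, x, y):
    """A product plus a sum/difference pair, produced together."""
    return a * b, x + y, x - y

total, difference = dual_add_subtract(5, 3)
product, s, d = multiply_and_dual_add_subtract(2.0, 4.0, 5.0, 3.0)
```

A sum/difference pair on the same operands is the pattern an FFT butterfly needs, which is one reason this combination is useful in signal processing code.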
Memory Access
Parallel load/store
Circular buffer
Load/Store
Load/store architecture
Can use direct addressing
Two sets of data address generators (DAGs):
  program memory;
  data memory.
Can perform two load/stores per cycle
Basic Addressing
Immediate value:
  R0 = DM(0x20000000);
Direct load:
Direct store:
  DM(_a) = R0; ! Stores R0 at _a
DAG1 Registers
I0    M0    L0    B0
I1    M1    L1    B1
I2    M2    L2    B2
I3    M3    L3    B3
I4    M4    L4    B4
I5    M5    L5    B5
I6    M6    L6    B6
I7    M7    L7    B7
Zero-Overhead Loop
Circular Buffer
L: buffer size
B: buffer base address
I, M in post-modify mode
I is automatically wrapped around the circular buffer when it reaches B+L
Example: FIR filter
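The wrap-around rule and the FIR example can be sketched together. This is a software illustration under stated assumptions, not SHARC code: memory is a Python list, `post_modify` and `fir_step` are hypothetical helper names, and the base address 4, the coefficients, and the input samples are made up for the example. The names I, M, B, and L follow the register roles listed above.

```python
# Sketch of post-modify addressing with circular wrap, used by an FIR filter.
# Helper names, addresses, and sample values are illustrative assumptions.

def post_modify(i, m, b, l):
    """Return I after adding M, wrapped back into the buffer [B, B+L)."""
    return b + (i + m - b) % l

def fir_step(memory, coeffs, i, b, sample):
    """Store one new sample over the oldest one; return (output, updated I)."""
    l = len(coeffs)
    memory[i] = sample                # the newest sample replaces the oldest
    i = post_modify(i, 1, b, l)       # I now points at the oldest remaining sample
    acc = 0
    for k in range(l - 1, -1, -1):    # oldest sample pairs with the last coefficient
        acc += coeffs[k] * memory[i]
        i = post_modify(i, 1, b, l)   # wraps to B when it reaches B+L
    return acc, i

memory = [0] * 16
B = 4                                 # hypothetical buffer base address
coeffs = [1, 2, 3, 4]                 # L = 4 taps
I = B
outputs = []
for x in [1, 0, 0, 0, 0]:             # unit impulse followed by zeros
    y, I = fir_step(memory, coeffs, I, B, x)
    outputs.append(y)
```

Feeding a unit impulse returns the coefficients in order followed by zero, which is a quick check that the pointer wraps correctly. No sample is ever copied or shifted; only the index moves.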
PC Stack
Nested Loop
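The idea behind zero-overhead and nested loops can be illustrated with a toy sequencer. This is a sketch under loud assumptions and does not reproduce the SHARC sequencer: the `("DO", end_address, count)` instruction format, the single stack of `[start, end, remaining]` entries, and the `run` function are all hypothetical. What it shows is that the end-of-loop test happens alongside the last body instruction, so no compare or branch instruction appears in the loop body.

```python
# Toy model of hardware-managed (zero-overhead) nested loops.
# The instruction format and the stack layout are illustrative assumptions.

def run(program):
    """Execute a toy program and return the names of the executed body instructions."""
    pc, loop_stack, trace = 0, [], []
    while pc < len(program):
        op = program[pc]
        if op[0] == "DO":
            # ("DO", end_address, count): the body spans pc+1 .. end_address inclusive
            _, end, count = op
            loop_stack.append([pc + 1, end, count])
        else:
            trace.append(op[0])
        next_pc = pc + 1
        # End-of-loop check done by the sequencer, not by an instruction.
        while loop_stack and pc == loop_stack[-1][1]:
            start, _, remaining = loop_stack[-1]
            if remaining > 1:
                loop_stack[-1][2] = remaining - 1
                next_pc = start       # jump back to the top of the loop
                break
            loop_stack.pop()          # loop finished; an enclosing loop may end here too
        pc = next_pc
    return trace

program = [
    ("DO", 4, 3),   # outer loop: addresses 1..4, 3 iterations
    ("A",),
    ("DO", 3, 2),   # inner loop: address 3 only, 2 iterations
    ("B",),
    ("C",),
    ("D",),
]
executed = run(program)
```

The inner loop entry is pushed and popped once per outer iteration, which is why a stack is the natural structure for nesting.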
SHARC
CISC + Harvard architecture
Computation
Memory Access
  Parallel load/store
  Circular buffer
Microprocessors
        von Neumann    Harvard
RISC    ARM7           ARM9
CISC    Pentium        DSP (SHARC)
ARM7
32 bit or 12 bit
Usually one instruction/cycle
Poor code density
No parallel operations
Memory access
  No parallel access
  No direct addressing
; loop initiation code
MOV r0, #0   ; use r0 for loop counter
MOV r8, #0   ; use separate index for arrays
LDR r1, =4   ; buffer size
MOV r2, #0   ; use r2 for f
ADR r3, c    ; load r3 with base of c[ ]
ADR r5, x    ; load r5 with base of x[ ]
Sample Prices
ARM7: $14.54
SHARC: $51.46 - $612.74
MIPS/FLOPS Metrics
Time consuming
Power consumption
Cost
Code density
Reading
Chapter 2 (only the sections related to slides)
Optional: J. Eyre and J. Bier, DSP Processors Hit the Mainstream, IEEE Computer, August 1998.
Optional: More about SHARC
  http://www.analog.com/processors/processors/sharc/
  Nested loops: Pages (3-37) (3-59)
  http://www.analog.com/UploadedFiles/Associated_Docs/476124543020432798236x_pgr_sequen.pdf