03 ARM Processor Architecture
Register Bank
– 2 read ports, 1 write port, access to any register
– 1 additional read port and 1 additional write port for r15 (PC)
Barrel Shifter
– Shift or rotate the operand by any number of bits
ALU
Address register and incrementer
Data Registers
– Hold data passing to and from memory
Instruction Decoder and Control
[Figure: core organization: address register, incrementer, PC, register bank, multiply register, barrel shifter, ALU, data out register, data in register, instruction decode & control; A bus, B bus, ALU bus; D[31:0]]
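The barrel shifter's "shift or rotate by any number of bits" can be illustrated with a minimal Python sketch. The function names (`lsl`, `lsr`, `asr`, `ror`) and the restriction to shift amounts of 0 to 31 on 32-bit values are assumptions made for this illustration; they are not taken from the slides.

```python
MASK32 = 0xFFFFFFFF  # keep every result within 32 bits


def lsl(value, amount):
    """Logical shift left: bits shifted past bit 31 are discarded."""
    return (value << amount) & MASK32


def lsr(value, amount):
    """Logical shift right: zeros are shifted in from the top."""
    return (value & MASK32) >> amount


def asr(value, amount):
    """Arithmetic shift right: the sign bit (bit 31) is replicated."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret as a negative Python int
    return (value >> amount) & MASK32


def ror(value, amount):
    """Rotate right: bits leaving bit 0 re-enter at bit 31."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32
```

For example, `ror(0x12345678, 8)` moves the low byte to the top of the word, while `asr(0x80000000, 4)` keeps the sign bit set.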
SOC Consortium Course Material 4
3-Stage Pipeline (1/2)
Fetch
– The instruction is fetched from memory and placed in the instruction pipeline
Decode
– The instruction is decoded and the datapath control signals prepared for the
next cycle
Execute
– The register bank is read, an operand is shifted, and the ALU result is generated
and written back into the destination register
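The overlap of the three stages can be sketched as a per-cycle occupancy table. This is a simplified model that assumes every instruction spends exactly one cycle in each stage and that there are no branches; `pipeline_schedule` and the mnemonics in the usage note are hypothetical names chosen for the illustration.

```python
STAGES = ("Fetch", "Decode", "Execute")


def pipeline_schedule(instructions):
    """Return one dict per cycle mapping stage name -> instruction (or None)."""
    if not instructions:
        return []
    n_cycles = len(instructions) + len(STAGES) - 1
    schedule = []
    for cycle in range(n_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            # The instruction in stage `depth` entered the pipeline `depth` cycles ago.
            index = cycle - depth
            in_range = 0 <= index < len(instructions)
            row[stage] = instructions[index] if in_range else None
        schedule.append(row)
    return schedule
```

Calling `pipeline_schedule(["ADD", "SUB", "CMP"])` gives five rows; in the third row all three stages are busy, with ADD executing while SUB is decoded and CMP is fetched.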
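[Figure: datapath activity, two panels: (left) operands Rn and Rm, result written to Rd; (right) operand Rn and immediate field [7:0], result written to Rd; shift and ALU operation as specified by the instruction; PC incremented in both]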
[Figure: datapath activity for a store using the offset field [11:0] (lsl #0): (a) 1st cycle - compute address (b) 2nd cycle - store data & auto-index]
[Figure: datapath activity for a branch using the offset field [23:0] (lsl #2): (a) 1st cycle - compute branch target (b) 2nd cycle - save return address in R14]
The third cycle, which is required to complete the pipeline refilling, is also
used to make the small correction to the value stored in the link register
so that it points directly at the instruction which follows the branch
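The branch arithmetic can be sketched in a few lines of Python. The sketch assumes ARM-state behaviour in which the PC read by the branch is the branch address plus 8, and it models only the final values, not the cycle-by-cycle datapath activity; the helper names are hypothetical.

```python
def sign_extend_24(field):
    """Interpret the 24-bit offset field [23:0] as a signed integer."""
    field &= 0xFFFFFF
    return field - (1 << 24) if field & 0x800000 else field


def branch_target(branch_address, offset_field):
    """Target = PC + (sign-extended offset shifted left by 2)."""
    pc = branch_address + 8  # assumption: PC reads two instructions ahead
    return (pc + (sign_extend_24(offset_field) << 2)) & 0xFFFFFFFF


def return_address(branch_address):
    """Corrected link register value: the instruction following the branch."""
    return (branch_address + 4) & 0xFFFFFFFF
```

With an offset field of `0xFFFFFE` (that is, -2 words), a branch at `0x8000` targets itself, and the corrected link value for a branch-and-link at `0x8000` is `0x8004`.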
Branch Pipeline Example
Decode
– The instruction is decoded and register operands are read from the register
file. There are 3 operand read ports in the register file, so most ARM
instructions can source all their operands in one cycle
Execute
– An operand is shifted and the ALU result generated. If the instruction is
a load or store, the memory address is computed in the ALU
Write back
– The results generated by the instruction are written back to the register
file, including any data loaded from memory
[Figure: pipeline organization showing the decode, execute, buffer/data and write-back stages: instruction decode, register read, immediate fields, mul, shift, ALU, forwarding paths, mux, D-cache, rot/sgn ex, register write]
Forwarding works as follows:
– The ALU result from the EX/MEM register is always fed back to the ALU input
latches.
– If the forwarding hardware detects that the previous ALU operation has
written the register corresponding to the source for the current ALU
operation, control logic selects the forwarded result as the ALU input rather
than the value read from the register file.
[Figure: pipeline organization (I-cache fetch, decode, execute, buffer/data, write-back) with the ALU forwarding paths marked]
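The selection rule described above can be sketched as a small multiplexer function. The names `select_alu_input`, `ex_mem_dest` and `ex_mem_result` are hypothetical, and the register file is modelled as a plain dictionary; this is an illustration of the rule, not a description of the actual control logic.

```python
def select_alu_input(source_reg, register_file, ex_mem_dest, ex_mem_result):
    """Pick the ALU input for `source_reg`.

    ex_mem_dest is the register written by the previous ALU operation
    (None if it wrote no register); ex_mem_result is the value held in
    the EX/MEM pipeline register.
    """
    if ex_mem_dest is not None and ex_mem_dest == source_reg:
        # The register file copy is stale: use the forwarded result.
        return ex_mem_result
    return register_file[source_reg]
```

For instance, if the previous operation wrote register 1 with the value 99, a read of register 1 returns 99 even though the register file still holds the old value, while a read of any other register comes from the register file.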
                1    2    3    4      5      6    7    8
LDR R1,@(R2)    IF   ID   EX   MEM    WB
SUB R4,R1,R5         IF   ID   EXsub  MEM    WB
AND R6,R1,R7              IF   ID     EXand  MEM  WB
OR  R8,R1,R9                   IF     ID     EX   MEM  WB

                1    2    3    4      5      6    7    8    9
LDR R1,@(R2)    IF   ID   EX   MEM    WB
SUB R4,R1,R5         IF   ID   stall  EXsub  MEM  WB
AND R6,R1,R7              IF   stall  ID     EX   MEM  WB
OR  R8,R1,R9                   stall  IF     ID   EX   MEM  WB
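The one-cycle stall in the second timing diagram can be reproduced with a simplified model. The sketch assumes in-order issue, one instruction entering EX per cycle, and that a loaded value can be forwarded to an EX stage that starts after the load's MEM stage has finished; `execute_cycles` and the tuple layout are hypothetical.

```python
def execute_cycles(program):
    """Return the cycle in which each instruction occupies EX.

    program: list of (dest, sources, is_load) tuples in program order.
    """
    ex = []
    last_writer = {}  # register -> index of the most recent writer
    for index, (dest, sources, _is_load) in enumerate(program):
        # Without hazards, EX follows the previous instruction's EX by one cycle.
        cycle = 3 if index == 0 else ex[-1] + 1
        for reg in sources:
            producer = last_writer.get(reg)
            if producer is not None and program[producer][2]:
                # Loaded data exists only after MEM, i.e. two cycles after the load's EX.
                cycle = max(cycle, ex[producer] + 2)
        ex.append(cycle)
        if dest is not None:
            last_writer[dest] = index
    return ex


# The sequence from the timing diagrams above.
program = [
    ("R1", ["R2"], True),         # LDR R1,@(R2)
    ("R4", ["R1", "R5"], False),  # SUB R4,R1,R5
    ("R6", ["R1", "R7"], False),  # AND R6,R1,R7
    ("R8", ["R1", "R9"], False),  # OR  R8,R1,R9
]
```

Under these assumptions `execute_cycles(program)` places SUB in EX in cycle 5 rather than cycle 4, and the last write-back (EX plus two cycles) lands in cycle 9, matching the stalled diagram.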
[Figure: processor core with Embedded ICE, bus splitter and JTAG TAP controller, linked by scan chain 0 and scan chain 2; signals: extern0, extern1, opc, r/w, mreq, trans, mas[1:0], A[31:0], Din[31:0], Dout[31:0], other signals]
ARM710T
– 8K unified write through cache
– Full memory management unit supporting virtual memory
– Write buffer
ARM720T
– As ARM710T but with WinCE support
ARM740T
– 8K unified write through cache
– Memory protection unit
– Write buffer
8 ARM10TDMI
Core Organization
– The prefetch unit is responsible for fetching instructions from memory and
buffering them (exploiting the double-bandwidth memory)
– It is also responsible for branch prediction and uses static prediction based
on the branch direction (backward: predicted ‘taken’; forward: predicted
‘not taken’)
[Figure: core organization: double-bandwidth memory, integer unit and coprocessor(s), connected by PC, instructions, read data, write data, CPinst. and CPdata]
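The static prediction rule can be written as a one-line test on the branch direction. The sketch assumes the direction is judged from the sign of the branch offset; `predict_taken` and `prediction_accuracy` are hypothetical helpers, and the loop data in the usage note is made up for illustration.

```python
def predict_taken(signed_offset):
    """Backward branches (negative offset) are predicted taken, forward ones not taken."""
    return signed_offset < 0


def prediction_accuracy(branches):
    """Fraction of correct predictions.

    branches: iterable of (signed_offset, actually_taken) pairs.
    """
    branches = list(branches)
    correct = sum(predict_taken(offset) == taken for offset, taken in branches)
    return correct / len(branches)
```

A loop-closing backward branch that is taken nine times and falls through once, `[(-16, True)] * 9 + [(-16, False)]`, is mispredicted only on the final iteration, which is why this simple rule suits loops.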
Pipeline Organization
[Figure: pipeline organization: inst. decode; register read (decode); multiplier and ALU/shifter (execute); memory access with address, write data, read data and rot/sgn ex (memory); register write (write); forwarding paths, write pipeline, +4 mux and coproc data]
– Coprocessor
– Write buffer
[Figure labels: copy-back tag, copy-back data, CP15, physical address, address buffer]
Harvard architecture
– Increases available memory bandwidth
• Instruction memory interface
• Data memory interface
– Simultaneous accesses to instruction and data memory
can be achieved
5-stage pipeline
Changes implemented to
– Improve CPI to ~1.5
– Improve maximum clock frequency
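Why both changes matter can be seen from the usual relation: execution time = instruction count × CPI / clock frequency. The sketch below uses that relation; the function name and all numeric values in the usage note are hypothetical and are not measurements from the slides.

```python
def execution_time_seconds(instruction_count, cpi, clock_hz):
    """Execution time = instructions x cycles per instruction / clock rate."""
    return instruction_count * cpi / clock_hz


def speedup(old_cpi, old_clock_hz, new_cpi, new_clock_hz):
    """Ratio of old to new execution time for the same instruction count."""
    return (old_cpi / old_clock_hz) / (new_cpi / new_clock_hz)
```

For example, one million instructions at a CPI of 1.5 on a hypothetical 100 MHz clock take about 15 ms; lowering the CPI and raising the clock rate multiply together in `speedup`.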
[Figure: 5-stage pipeline organization: decode (r15 = pc + 8, instruction decode, register read, immediate fields), execute (mul, shift, ALU, forwarding paths, mux), buffer/data (D-cache, load/store address, rot/sgn ex, byte repl.)]
ARM9TDMI: instruction fetch | r. read, decode | shift/ALU | data memory access | reg write
There is not sufficient slack time to translate Thumb instructions into ARM
instructions and then decode them; instead, the hardware decodes both ARM and
Thumb instructions directly
– … supporting virtual addressing and memory protection
– Write buffer
[Figure: ARM9TDMI core with instruction MMU, data MMU, CP15, EmbeddedICE & JTAG, write buffer and AMBA interface; signals: virtual IA, virtual DA, physical IA, physical DA, copy-back DA, physical address tag, AMBA address, AMBA data]
ARM 940T
– 2 × 4K caches
– Memory protection unit
– Write buffer
[Figure: ARM9TDMI core with instruction cache, data cache, protection unit, external coprocessor interface, EmbeddedICE & JTAG, write buffer and AMBA interface; signals: I address, instructions, data address, data, AMBA address, AMBA data]