Register window
Delayed branches
[Figure: register window with overlapping Input, Local and Output register groups]
Delayed Branching
br      br
sub     nop
add     nop
        sub
        add
Soft Core – IP Processors
32-bit RISC
Pipelined – 3 stage
• Fetch
• Decode
• Execute
Von Neumann Architecture
Data is aligned
• 32-bit ARM
• 16-bit THUMB – Instruction Compression
ARMv4 ARMv5TE ARMv6 ARMv7-A
ARMv7-M
Features           ARM7TDMI             ARM Cortex-M3
Architecture       ARMv4T (Princeton)   ARMv7-M (Harvard)
Interrupt latency  24-42                24
Operating modes: switching is done with BX
Exception handling: done in ARM state
Memory format
• Bi-endian
• Default: little-endian
Operating Modes
7 modes of operation
• Interrupt
• Supervisor mode: protected mode for the OS
• Abort mode
• Undefined mode
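ARM has 37 registers
31 general-purpose registers (GPRs), 6 status registers (SRs)
r0 - r15
CPSR: Current Program Status Register (condition code register, CCR)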
User      FIQ        Supervisor  Abort      IRQ        Undefined
r0        r0         r0          r0         r0         r0
r1        r1         r1          r1         r1         r1
r2        r2         r2          r2         r2         r2
r3        r3         r3          r3         r3         r3
r4        r4         r4          r4         r4         r4
r5        r5         r5          r5         r5         r5
r6        r6         r6          r6         r6         r6
r7        r7         r7          r7         r7         r7
r8        r8_fiq     r8          r8         r8         r8
r9        r9_fiq     r9          r9         r9         r9
r10       r10_fiq    r10         r10        r10        r10
r11       r11_fiq    r11         r11        r11        r11
r12       r12_fiq    r12         r12        r12        r12
r13       r13_fiq    r13_svc     r13_abt    r13_irq    r13_und
r14       r14_fiq    r14_svc     r14_abt    r14_irq    r14_und
r15(PC)   r15(PC)    r15(PC)     r15(PC)    r15(PC)    r15(PC)
CPSR      CPSR       CPSR        CPSR       CPSR       CPSR
          SPSR_fiq   SPSR_svc    SPSR_abt   SPSR_irq   SPSR_und
r8 - r15: high registers
CPSR
Bits 31-24:  N  Z  C  V  Q  -  -  J
Bits 23-16:  -  -  -  -  GE3 GE2 GE1 GE0
Bits 15-8:   -  -  -  -  -  -  E  A
Bits 7-0:    I  F  T  M4 M3 M2 M1 M0
M4-M0   Mode
10000   User
10001   FIQ
10010   IRQ
10011   Supervisor
10111   Abort
11011   Undefined
11111   System
All instructions in ARM state are executed conditionally
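To illustrate the mode encodings listed above, here is a minimal Python sketch that decodes the mode field and the condition flags from a 32-bit CPSR value. The function name `decode_cpsr` and the returned dictionary are assumptions made for this example; the sketch assumes N, Z, C, V sit in bits 31-28 and M4-M0 in bits 4-0.

```python
# Mode encodings taken from the M4-M0 table above.
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}

def decode_cpsr(cpsr):
    """Return the mode name and the N, Z, C, V flags of a 32-bit CPSR value."""
    mode_bits = cpsr & 0x1F  # M4-M0 are the five least significant bits
    return {
        "mode": MODES.get(mode_bits, "reserved"),
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
    }

# Hypothetical CPSR value chosen for illustration.
print(decode_cpsr(0x600000D3))
```

The example value has Z and C set and the mode field 10011, so the sketch reports Supervisor mode.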
ARM Instruction Set
Addressing Modes
Transfer between memory and register: LDR/LDRB/LDRH/LDRSH (loads), STR/STRB/STRH (stores)
ADR (pseudo-instruction that loads an address into a register)
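x = (a+b)-c;
ADR r4,a        ; r4 = address of a
LDR r0,[r4]     ; r0 = a
ADR r4,b        ; r4 = address of b
LDR r1,[r4]     ; r1 = b
ADD r3,r0,r1    ; r3 = a + b
ADR r4,c        ; r4 = address of c
LDR r2,[r4]     ; r2 = c
SUB r3,r3,r2    ; r3 = (a + b) - c
ADR r4,x        ; r4 = address of x
STR r3,[r4]     ; x = r3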
Bits:   31-28  27-25  24  23  22  21  20  19-16  15-0
Field:  cond   100    P   U   S   W   L   Rn     register list
Examples
LDMIA R0, {R5 - R8}
STMDA R1!, {R2, R5, R7 - R9, R11}
STMFD R13!, {R2-R9}
STMEA R13!, {R2-R9}
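The examples above differ only in how the transfer addresses are derived from the base register. The following Python sketch works out those addresses for the four addressing modes and maps the stack-style suffixes onto them. The function name `block_transfer_addresses` and the base address 0x220 used in the usage note are assumptions chosen for illustration.

```python
def block_transfer_addresses(base, num_regs, mode):
    """Return (addresses, written_back_base) for an LDM/STM in the given mode.

    mode is one of "IA", "IB", "DA", "DB". The lowest-numbered register
    always goes to the lowest address in the returned list.
    """
    size = 4 * num_regs
    if mode == "IA":      # increment after
        start, new_base = base, base + size
    elif mode == "IB":    # increment before
        start, new_base = base + 4, base + size
    elif mode == "DA":    # decrement after
        start, new_base = base - size + 4, base - size
    elif mode == "DB":    # decrement before
        start, new_base = base - size, base - size
    else:
        raise ValueError("unknown addressing mode: " + mode)
    return [start + 4 * i for i in range(num_regs)], new_base

# Stack-style suffixes map onto the modes differently for stores and loads.
STM_ALIASES = {"FD": "DB", "ED": "DA", "FA": "IB", "EA": "IA"}
LDM_ALIASES = {"FD": "IA", "ED": "IB", "FA": "DA", "EA": "DB"}

# STMFD R13!, {R2-R9} with a hypothetical stack pointer of 0x220.
addrs, sp = block_transfer_addresses(0x220, 8, STM_ALIASES["FD"])
print([hex(a) for a in addrs], hex(sp))
```

With R13 = 0x220, the eight registers of `STMFD R13!, {R2-R9}` land at 0x200 through 0x21C and R13 is written back as 0x200.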
[Figure: memory diagrams (addresses 0x1FC to 0x220) stepping through the full ascending (FA), empty ascending (EA), full descending (FD) and empty descending (ED) stack types]
Branch instructions
Conditional branch forwards or backwards by up to 32 MB
A branch or jump can also be generated by writing a value to R15 (PC)
Bits:   31-28  27-25  24  23-0
Field:  cond   101    L   signed word offset
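The 32 MB range follows from the 24-bit offset field: it holds a signed word offset, so it spans 2^23 words in each direction. The Python sketch below encodes and decodes that field. The function names are assumptions for this example, and it also assumes the standard ARM-state behaviour (not stated in the slides) that the PC reads as the branch address plus 8.

```python
def encode_branch_offset(branch_addr, target_addr):
    """Return the 24-bit offset field for a B/BL located at branch_addr."""
    # Assumption: the PC reads as branch address + 8 because of the pipeline.
    delta = target_addr - (branch_addr + 8)
    if delta % 4 != 0:
        raise ValueError("target must be word aligned")
    words = delta >> 2
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("target is outside the +/-32 MB range")
    return words & 0xFFFFFF

def decode_branch_target(branch_addr, imm24):
    """Recover the target address from the 24-bit offset field."""
    if imm24 & 0x800000:      # sign-extend the 24-bit field
        imm24 -= 1 << 24
    return branch_addr + 8 + (imm24 << 2)

# A branch to itself at a hypothetical address 0x8000.
field = encode_branch_offset(0x8000, 0x8000)
print(hex(field), hex(decode_branch_target(0x8000, field)))
```

A branch to its own address needs an offset of -2 words, which is why the field comes out as 0xFFFFFE.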
if (a < b) {
    x = 5;
    y = c + d;
}
else y = c - d;
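Implementation 1
        ADR r4,a            ; a, b, c, d, x, y occupy consecutive words
        LDR r0,[r4],#4      ; r0 = a, r4 -> b
        LDR r1,[r4],#4      ; r1 = b, r4 -> c
        CMP r0,r1
        BGE fbl             ; a >= b: take the else branch
        LDR r0,[r4],#4      ; r0 = c, r4 -> d
        LDR r1,[r4],#4      ; r1 = d, r4 -> x
        ADD r0,r1,r0        ; r0 = c + d
        MOV r2,#5
        STR r2,[r4],#4      ; x = 5, r4 -> y
        STR r0,[r4]         ; y = c + d
        B aft
fbl:    LDR r0,[r4],#4      ; r0 = c, r4 -> d
        LDR r1,[r4],#8      ; r1 = d, r4 -> y (skips x)
        SUB r0,r0,r1        ; r0 = c - d
        STR r0,[r4]         ; y = c - d
aft: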
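Implementation 2
        ADR r4,a            ; a, b, c, d, x, y occupy consecutive words
        LDR r0,[r4],#4      ; r0 = a, r4 -> b
        LDR r1,[r4],#4      ; r1 = b, r4 -> c
        CMP r0,r1
        LDRLT r0,[r4],#4    ; a < b: r0 = c, r4 -> d
        LDRLT r1,[r4],#4    ; r1 = d, r4 -> x
        ADDLT r0,r1,r0      ; r0 = c + d
        MOVLT r2,#5
        STRLT r2,[r4],#4    ; x = 5, r4 -> y
        STRLT r0,[r4]       ; y = c + d
fbl:    LDRGE r0,[r4],#4    ; a >= b: r0 = c, r4 -> d
        LDRGE r1,[r4],#8    ; r1 = d, r4 -> y (skips x)
        SUBGE r0,r0,r1      ; r0 = c - d
        STRGE r0,[r4]       ; y = c - d
aft: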
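The conditional version works because CMP sets the flags once and none of the following LT or GE instructions change them. As a rough illustration, this Python sketch computes the flags CMP would produce for 32-bit operands and evaluates the two conditions used above. The function names are assumptions for this example, and only LT and GE are modelled.

```python
def cmp_flags(a, b):
    """Return the (N, Z, C, V) flags that CMP a, b would set for 32-bit operands."""
    mask = 0xFFFFFFFF
    a &= mask
    b &= mask
    result = (a - b) & mask
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)  # carry set means no borrow occurred
    # Overflow: operand signs differ and the result sign differs from a.
    v = ((a ^ b) & (a ^ result)) >> 31
    return n, z, c, v

def condition_passes(cond, flags):
    """Evaluate the signed LT and GE conditions from the flags."""
    n, z, c, v = flags
    if cond == "LT":
        return n != v
    if cond == "GE":
        return n == v
    raise ValueError("condition not modelled: " + cond)

flags = cmp_flags(3, 7)
print(flags, condition_passes("LT", flags), condition_passes("GE", flags))
```

For a = 3 and b = 7 the LT instructions execute and the GE instructions are skipped, which matches the if branch of the C fragment.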
ARMv4 Pipeline
• 3-stage pipeline
• Fetch-Decode-Execute
Pipeline in ARM
• A normal instruction requires three clock cycles (instruction execution latency)
• 1 cycle per instruction does not hold for all instructions
• Load-multiple instructions spend several cycles in the execute phase
Instruction        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6
add r0,r1,r2       fetch    decode   execute
sub r2,r3,r6                fetch    decode   execute
cmp r5,r4                            fetch    decode   execute

Structural stall
[Figure: pipeline diagram showing a stall that leaves 2 holes]

bne nxt            fetch    decode   execute
nop                         fetch    decode   execute
nop                                  fetch    decode   execute
sub r2,r3,r6
-----
-----
nxt: add r0,r7,#3                             fetch    decode   execute
Delayed Branching
ARM7 Operating Modes
Exceptions
Exception Entry
• Saves the address of the next instruction in LR
• Copies the CPSR into the SPSR
• Sets the CPSR mode bits to the appropriate operating mode
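The three entry steps above can be sketched as a small state update. This Python model keeps the CPU state in a dictionary; the key names (`r14_irq`, `spsr_irq`, and so on), the function `enter_exception`, and the example addresses are all assumptions made for illustration, not part of any ARM tool or library.

```python
# Mode field values for the exception modes (M4-M0).
MODE_BITS = {"fiq": 0b10001, "irq": 0b10010, "svc": 0b10011,
             "abt": 0b10111, "und": 0b11011}

def enter_exception(cpu, mode, return_addr, vector):
    """Apply the exception entry steps to a dict-based CPU state, in place."""
    cpu["r14_" + mode] = return_addr     # 1. save the return address in the banked LR
    cpu["spsr_" + mode] = cpu["cpsr"]    # 2. copy the CPSR into the banked SPSR
    cpu["cpsr"] = (cpu["cpsr"] & ~0x1F) | MODE_BITS[mode]  # 3. set the mode bits
    cpu["pc"] = vector                   # execution continues at the handler
    return cpu

# Hypothetical state: User mode with Z and C set, interrupted near 0x8008.
cpu = {"cpsr": 0x60000010, "pc": 0x8008}
enter_exception(cpu, "irq", return_addr=0x800C, vector=0x18)
print({k: hex(v) for k, v in cpu.items()})
```

Because the old CPSR survives in the banked SPSR, the handler can restore both the flags and the previous mode when it returns.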
User       FIQ
r0 - r7    (shared)
r8         r8_fiq
r9         r9_fiq
r10        r10_fiq
r11        r11_fiq
r12        r12_fiq
r13        r13_fiq
r14        r14_fiq    (back-up of r15)
r15(PC)
CPSR       SPSR_fiq   (back-up of CPSR)
Main Program
ADR r4,a
LDR r0,[r4]
LDR r4,[r2]     <- FIQ
LDR r1,[r4]
ADD r3,r0,r1
Enters FIQ
r15 -> r14_fiq
CPSR -> SPSR_fiq
CPSR (Mode) <- FIQ
Returns from FIQ
r14_fiq - 4 -> r15
SPSR_fiq -> CPSR
FIQ
• nFIQ pin driven low
• ARM checks for a low level at the end of each instruction
• FIQ is disabled when F = 1 (F can be changed only in a privileged mode)
• Actions taken on FIQ
  • R14_fiq = address of next instruction to be executed + 4
  • SPSR_fiq = CPSR
  • CPSR[4:0] = FIQ mode
  • T bit = 0, F = 1, I = 1
  • PC = address of FIQ ISR
• Exit from FIQ
  • SUBS PC, R14, #4
IRQ
• nIRQ pin driven low
• Actions taken on IRQ
  • R14_irq = address of next instruction to be executed + 4
  • SPSR_irq = CPSR
  • CPSR[4:0] = IRQ mode
  • T bit = 0, I = 1
  • PC = address of IRQ ISR
• Exit from IRQ
  • SUBS PC, R14, #4
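One consequence of the lists above is that FIQ entry sets both F and I, while IRQ entry sets only I, so an FIQ can still interrupt an IRQ handler but not the other way round. The Python sketch below models that masking. It assumes the usual CPSR positions I = bit 7, F = bit 6, T = bit 5, and the function names are hypothetical.

```python
I_BIT = 1 << 7  # IRQ disable
F_BIT = 1 << 6  # FIQ disable
T_BIT = 1 << 5  # Thumb state

def can_take(interrupt, cpsr):
    """Return True if the pending interrupt ("FIQ" or "IRQ") is not masked."""
    mask = F_BIT if interrupt == "FIQ" else I_BIT
    return (cpsr & mask) == 0

def cpsr_after_entry(interrupt, cpsr):
    """Apply the mode, T, I and F changes made on interrupt entry."""
    mode = 0b10001 if interrupt == "FIQ" else 0b10010
    cpsr = (cpsr & ~0x1F & ~T_BIT) | mode | I_BIT
    if interrupt == "FIQ":
        cpsr |= F_BIT
    return cpsr

in_irq = cpsr_after_entry("IRQ", 0x10)   # start from User mode
in_fiq = cpsr_after_entry("FIQ", 0x10)
print(hex(in_irq), can_take("FIQ", in_irq), can_take("IRQ", in_irq))
print(hex(in_fiq), can_take("FIQ", in_fiq), can_take("IRQ", in_fiq))
```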
Abort
• The current instruction cannot be completed
• The ABORT input is checked at the end of every memory access
• Allows implementation of a demand-paged virtual memory system
• The processor is allowed to generate arbitrary addresses
• If the data at an address is not available, the MMU generates an abort
Page Table               Physical Memory
Page  Frame  Valid       Frame  Contents
0     2      v           2      A
1     -      i           4      C
2     4      v           7      F
3     -      i
4     -      i
5     7      v
Disk: A B C D E F
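A demand-paged lookup of this kind can be sketched in a few lines of Python. The table below is one reading of the figure above (pages 0, 2 and 5 resident in frames 2, 4 and 7; pages 1, 3 and 4 invalid), and that reading is itself an assumption, as are the 4 KB page size, the `Abort` exception class and the function name `translate`.

```python
class Abort(Exception):
    """Raised when the requested page is not resident in physical memory."""

# page -> (frame, valid), as read from the page-table figure (assumed).
PAGE_TABLE = {
    0: (2, True),
    1: (None, False),
    2: (4, True),
    3: (None, False),
    4: (None, False),
    5: (7, True),
}

def translate(page, offset, page_size=4096):
    """Translate (page, offset) to a physical address, or signal an abort."""
    frame, valid = PAGE_TABLE[page]
    if not valid:
        raise Abort("page %d is not in physical memory" % page)
    return frame * page_size + offset

print(hex(translate(0, 0x10)))
try:
    translate(3, 0)
except Abort as exc:
    print("abort:", exc)
```

In a real system the abort handler would bring the missing page in from disk, mark the entry valid, and retry the access.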
40  ADR R1,A             A:  ADR R1,A
44  ADR R2,B                 ADR R2,B
48  LDR R0,[R1]              LDR R0,[R1]
4C  LDR R1,[R2]              LDR R1,[R2]
50  ADD R2,R1,R0             ADR R7,D
54  ADR R1,C                 R8,E
58  STR R2,[R1]              LDR R11,[R7]
                         B:  ADD R2,R1,R0
                             ADR R1,C
                             STR R2,[R1]
                             back
Abort Handler
• Works out the cause of the abort
• Loads the instruction that caused the abort
• Checks whether the instruction modifies the base register
• Determines the offset from the instruction
• Restores the base register to its original value by applying the opposite offset
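The last step of the handler, undoing a base-register update, is simple arithmetic. A minimal Python sketch follows, assuming the handler has already decoded the offset and its direction from the aborted instruction; the function name `restore_base` and its parameters are hypothetical.

```python
def restore_base(base_after, offset, added):
    """Undo a base-register update by applying the opposite offset.

    base_after: value of the base register when the abort handler runs
    offset:     offset decoded from the aborted instruction
    added:      True if the instruction added the offset, False if it subtracted
    """
    original = base_after - offset if added else base_after + offset
    return original & 0xFFFFFFFF  # keep the result within 32 bits

# Example: LDR r0,[r4],#4 aborted after r4 had already been advanced to 0x1004.
print(hex(restore_base(0x1004, 4, added=True)))
```

With the base register back at its original value, the instruction can be retried as if it had never started.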
• Two types of abort
  • Pre-fetch
  • Data
Pre-fetch abort
• ARM marks the instruction as aborted
• Action is taken only when the instruction enters the execute stage of the pipeline
• If the instruction is not executed (it fails its condition code, or a branch skips it), the abort does not take place
• The abort handler takes over if the abort does occur
• The handler returns to the original state and the instruction is tried again
Pre-fetch abort
• Actions taken on pre-fetch abort
  • R14_abt = address of aborted instruction + 4
  • SPSR_abt = CPSR
  • CPSR[4:0] = Abort mode
  • T bit = 0, I = 1
  • PC = address of Abort ISR
[Figure: data abort example showing the instructions ADD R1,R1,R2 and ADD R3,R1,R3, the contents of R0-R2, memory bytes, and the MMU raising Abort]
Data Abort
• Actions taken on data abort
  • R14_abt = address of instruction that caused the abort + 8
  • SPSR_abt = CPSR
  • CPSR[4:0] = Abort mode
  • T bit = 0, I = 1
  • PC = address of Abort ISR
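The link-register offsets differ between the exception types, and so does the adjustment needed to get back. The Python sketch below tabulates them. The LR offsets come from the slides; the return adjustments for the two aborts (subtract 4 to retry a pre-fetch abort, subtract 8 to retry a data abort) are the standard ARM values and are an assumption here, since the slides give the exit instruction only for FIQ and IRQ. The table and function names are hypothetical.

```python
# name: (LR offset on entry, amount subtracted from LR on return)
EXCEPTIONS = {
    "FIQ":            (4, 4),  # offset from the next instruction to execute
    "IRQ":            (4, 4),  # offset from the next instruction to execute
    "prefetch_abort": (4, 4),  # offset from the aborted instruction
    "data_abort":     (8, 8),  # offset from the instruction that caused the abort
}

def link_register(kind, addr):
    """LR value saved on entry, given the reference instruction address."""
    return addr + EXCEPTIONS[kind][0]

def return_address(kind, lr):
    """Address the handler returns to after subtracting the adjustment."""
    return lr - EXCEPTIONS[kind][1]

# A load at a hypothetical address 0x48 causes a data abort.
lr = link_register("data_abort", 0x48)
print(hex(lr), hex(return_address("data_abort", lr)))
```

Returning to 0x48 re-executes the aborted load once the handler has fixed the cause.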
Exception Priorities
• Data Abort
• FIQ
• IRQ
Minimum Latency