
ARM Processor

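This is the ARM7TDMI, a processor from the ARM7 family. The suffix letters stand for: T = Thumb instruction set, D = Debug support, M = enhanced Multiplier (producing 64-bit results), I = EmbeddedICE (embedded in-circuit emulator) hardware.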


ARM9xx core


Enhanced Instruction set

ARM has three instruction set states:

32-bit ARM instruction set
16-bit Thumb instruction set
8-bit Jazelle instruction set



ARM: 32-bit load/store architecture in which every instruction can be executed conditionally.

Thumb: 16-bit instructions. Only branch instructions can be conditional, and most instructions can access only half of the registers (the low registers r0-r7).

Jazelle: allows Java bytecode to be executed directly on the ARM architecture, improving Java performance by 5x-10x.
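Conditional execution means each ARM-state instruction carries a 4-bit condition field that is checked against the N, Z, C and V flags in the cpsr before the instruction takes effect. The Python sketch below is an illustrative model of the standard ARM condition table, not processor or toolchain code; the name `condition_passed` is hypothetical.

```python
# Illustrative model of ARM condition-code evaluation.
def condition_passed(cond, n, z, c, v):
    """Return True if condition `cond` passes for the given N, Z, C, V flags."""
    table = {
        "EQ": z,                      # equal
        "NE": not z,                  # not equal
        "CS": c, "HS": c,             # carry set / unsigned higher or same
        "CC": not c, "LO": not c,     # carry clear / unsigned lower
        "MI": n,                      # negative
        "PL": not n,                  # positive or zero
        "VS": v,                      # overflow
        "VC": not v,                  # no overflow
        "HI": c and not z,            # unsigned higher
        "LS": (not c) or z,           # unsigned lower or same
        "GE": n == v,                 # signed greater than or equal
        "LT": n != v,                 # signed less than
        "GT": (not z) and (n == v),   # signed greater than
        "LE": z or (n != v),          # signed less than or equal
        "AL": True,                   # always (the default when no suffix is given)
    }
    return bool(table[cond.upper()])

# After CMP r0, r1 with r0 == r1: Z = 1, C = 1 (no borrow), N = 0, V = 0.
# An ADDEQ would therefore execute and an ADDNE would be skipped.
print(condition_passed("EQ", n=False, z=True, c=True, v=False))
print(condition_passed("NE", n=False, z=True, c=True, v=False))
```

In Thumb state only branches carry such a condition, which is why the same check is much less visible there.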

ARM- Processor Modes

Seven basic operating modes exist:

User: unprivileged mode under which most tasks run
FIQ: entered when a high priority (fast) interrupt is raised
IRQ: entered when a low priority (normal) interrupt is raised
Supervisor: entered on reset and when a Software Interrupt (SWI) instruction is executed
Abort: used to handle memory access violations
Undef: used to handle undefined instructions
System: privileged mode using the same registers as User mode
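The current mode is recorded in the mode field, bits [4:0], of the cpsr. The Python sketch below decodes that field using the standard ARM mode encodings; it ignores every other cpsr bit, and the function names are hypothetical.

```python
# Mode field M[4:0] of the CPSR for each of the seven modes.
MODE_BITS = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undef",
    0b11111: "System",
}

def cpsr_mode(cpsr):
    """Decode the processor mode from the low five bits of a CPSR value."""
    bits = cpsr & 0x1F
    if bits not in MODE_BITS:
        raise ValueError("reserved mode bits: " + format(bits, "05b"))
    return MODE_BITS[bits]

def is_privileged(cpsr):
    """Every mode except User is privileged."""
    return cpsr_mode(cpsr) != "User"

print(cpsr_mode(0xD3))      # Supervisor, with the I and F bits (7 and 6) also set
print(is_privileged(0x10))  # False: User mode
```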



Register Organization Summary

User/System: r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr
FIQ: User mode r0-r7, r15 and cpsr, plus banked r8_fiq-r12_fiq, r13_fiq (sp), r14_fiq (lr) and spsr_fiq
IRQ: User mode r0-r12, r15 and cpsr, plus banked r13_irq (sp), r14_irq (lr) and spsr_irq
Supervisor: User mode r0-r12, r15 and cpsr, plus banked r13_svc (sp), r14_svc (lr) and spsr_svc
Undef: User mode r0-r12, r15 and cpsr, plus banked r13_und (sp), r14_und (lr) and spsr_und
Abort: User mode r0-r12, r15 and cpsr, plus banked r13_abt (sp), r14_abt (lr) and spsr_abt

In Thumb state, r0-r7 are the low registers and r8-r15 are the high registers.

Note: System mode uses the User mode register set

ARM- The Registers

ARM has 37 registers, all of which are 32 bits long:

1 dedicated program counter
1 dedicated current program status register
5 dedicated saved program status registers
30 general purpose registers

The current processor mode governs which of several banks is accessible. Each mode can access:

a particular set of r0-r12 registers
a particular r13 (the stack pointer, sp) and r14 (the link register, lr)
the program counter, r15 (pc)
the current program status register, cpsr

Privileged modes (except System) can also access

a particular spsr (saved program status register)
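The banking rules above can be modelled as a lookup from (mode, register number) to a physical register, which also reproduces the total of 37 registers. This is a Python sketch, not hardware documentation; names such as r13_irq follow the common ARM naming convention, and the helper functions are hypothetical.

```python
# Which modes have their own copies of which registers.
FIQ_BANKED = range(8, 15)                         # r8-r14 are banked in FIQ mode
SP_LR_SUFFIX = {"FIQ": "fiq", "IRQ": "irq", "Supervisor": "svc",
                "Abort": "abt", "Undef": "und"}   # User and System share one bank
MODES = ["User", "System", "FIQ", "IRQ", "Supervisor", "Abort", "Undef"]

def physical_register(mode, reg):
    """Name of the physical register that r<reg> refers to in `mode`."""
    if not 0 <= reg <= 15:
        raise ValueError("register number must be 0-15")
    if mode == "FIQ" and reg in FIQ_BANKED:
        return "r%d_fiq" % reg
    if reg in (13, 14) and mode in SP_LR_SUFFIX:
        return "r%d_%s" % (reg, SP_LR_SUFFIX[mode])
    return "r%d" % reg                            # shared with User mode, including r15

def total_registers():
    """Distinct r0-r15 copies, plus the cpsr, plus one spsr per exception mode."""
    physical = {physical_register(m, r) for m in MODES for r in range(16)}
    return len(physical) + 1 + len(SP_LR_SUFFIX)

print(physical_register("IRQ", 13))     # r13_irq
print(physical_register("System", 14))  # r14
print(total_registers())                # 37
```

The 31 distinct r0-r15 copies are the 30 general purpose registers plus the single program counter.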


Memory is made of contiguous storage elements that hold data. The address bus on the ARM7TDMI is 32 bits wide, so it can address bytes in memory from address 0 to 4,294,967,295 (a 4 GB memory space). Like other RISC processors, ARM uses separate load and store instructions for accessing memory.

Load & Store Instructions

Single Register Transfer

These instructions move a single data item into or out of a register.

The supported data types are signed and unsigned words (32-bit), half-words (16-bit) and bytes.
LDR and STR can only load and store data on a boundary alignment that matches the size of the data type being transferred. For example, LDR can only load a 32-bit word from a memory address that is a multiple of four bytes: 0, 4, 8 and so on.
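The alignment rule reduces to a simple bit test, because every data size is a power of two. The Python sketch below is illustrative only; the table and function names are hypothetical.

```python
SIZE_IN_BYTES = {"word": 4, "half-word": 2, "byte": 1}

def is_aligned(address, data_type):
    """True if `address` is a multiple of the size of `data_type`."""
    size = SIZE_IN_BYTES[data_type]
    # For power-of-two sizes, "multiple of size" means the low bits are zero.
    return (address & (size - 1)) == 0

for addr in (0x8000, 0x8002, 0x8003):
    print(hex(addr), is_aligned(addr, "word"), is_aligned(addr, "half-word"))
```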

Single Register Load-Store Addressing Modes

The ARM instruction set provides several modes for addressing memory. Each mode uses one of the following indexing methods:

Preindex with writeback: calculates an address from a base register plus an offset and then updates the base register with that new address, e.g. LDR r0, [r1, #4]!
Preindex: same as preindex with writeback, but does not update the base register, e.g. LDR r0, [r1, #4]

Postindex: uses the base register as the address and only then updates the base register with the offset, e.g. LDR r0, [r1], #4

Preindex is useful for accessing an element in a data structure.

Postindex and preindex with writeback are useful for traversing an array.

The offset itself can be given in three ways:
Immediate: the address is calculated using the base address register and a 12-bit offset encoded in the instruction, e.g. [r1, #4].

Register: the address is calculated using the base address register and the contents of another register, e.g. [r1, r2].
Scaled: the address is calculated using the base address register and a barrel-shifted register, e.g. [r1, r2, LSL #2].
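The three indexing methods differ only in which address is used for the access and what ends up in the base register. The Python sketch below models just that bookkeeping for a 32-bit base; the method names and the function are hypothetical.

```python
MASK = 0xFFFFFFFF  # registers and addresses are 32 bits wide

def indexed_access(base, offset, method):
    """Return (address used for the load/store, new value of the base register)."""
    if method == "preindex":               # LDR r0, [r1, #offset]
        return (base + offset) & MASK, base
    if method == "preindex_writeback":     # LDR r0, [r1, #offset]!
        address = (base + offset) & MASK
        return address, address
    if method == "postindex":              # LDR r0, [r1], #offset
        return base, (base + offset) & MASK
    raise ValueError("unknown indexing method: " + method)

# Scaled register offset, e.g. [r1, r2, LSL #2]: the offset is r2 * 4.
r1, r2 = 0x8000, 3
print([hex(x) for x in indexed_access(r1, r2 << 2, "preindex")])
```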

Loading Bytes & Half-Words

Assume r0 = 0x8000.
LDRH r11, [r0] @ load unsigned half-word (zero-extended to 32 bits)
LDRSH r11, [r0] @ load signed half-word (sign-extended to 32 bits)

Load and store instructions that use 16-bit data (half-words) or signed byte data cannot use the barrel shifter on their offset. There are no STRSB or STRSH instructions, because STRH stores both signed and unsigned half-words; similarly, STRB stores both signed and unsigned bytes.
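The only difference between LDRH and LDRSH (or LDRB and LDRSB) is how the upper bits of the 32-bit register are filled. The Python sketch below models zero extension and sign extension; the half-word value 0x8001 is a hypothetical memory content chosen so that its sign bit is set.

```python
def zero_extend(value, bits):
    """LDRB/LDRH: the loaded byte or half-word is zero-extended to 32 bits."""
    return value & ((1 << bits) - 1)

def sign_extend(value, bits):
    """LDRSB/LDRSH: the top bit of the loaded value is copied into the upper bits."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):                 # sign bit set: negative value
        value |= 0xFFFFFFFF & ~((1 << bits) - 1)
    return value

halfword = 0x8001   # hypothetical half-word stored at the address in r0
print(hex(zero_extend(halfword, 16)))  # 0x8001     (LDRH)
print(hex(sign_extend(halfword, 16)))  # 0xffff8001 (LDRSH)
```

This is also why no signed store is needed: a store simply writes the low 8 or 16 bits, whichever way they were extended.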

Multiple-Register Transfer

Load-store multiple instructions can transfer multiple registers between memory and the processor in a single instruction. The transfer occurs from a base address register Rn pointing into memory. Multiple-register transfer instructions are more efficient than single-register transfers for moving blocks of data around memory and for saving and restoring context and stacks. However, load-store multiple instructions can increase interrupt latency.

On ARM7, a load multiple instruction takes 2 + Nt cycles, where N is the number of registers to load and t is the number of cycles required for each sequential access to memory. If an interrupt is raised, it has no effect until the load-store multiple instruction is complete. Any subset of the current bank of registers can be transferred to or from memory. The base register Rn determines the source or destination address for a load-store multiple instruction, and it can optionally be updated after the transfer; this happens when Rn is followed by the ! character. The registers to be transferred are listed within curly brackets {...}, either as a comma-separated list or as a range using '-'.


PRE mem32[0x80018] = 0x03
mem32[0x80014] = 0x02
mem32[0x80010] = 0x01
r0 = 0x00080010
r1 = 0x00000000
r2 = 0x00000000
r3 = 0x00000000

LDMIA r0!, {r1-r3}

POST r0 = 0x0008001c
r1 = 0x00000001
r2 = 0x00000002
r3 = 0x00000003

If LDMIB is used instead, the first word pointed to by register r0 is skipped and r1 is loaded from the next memory location (r0 + 4). After execution, register r0 points to the last loaded memory location.

Decrement Versions

The decrement versions, DA and DB, of the load-store multiple instructions decrement the start address and then transfer to ascending memory locations. This is equivalent to descending through memory while accessing the register list in reverse order.

If you use a store with base update, then the paired load instruction of the same number of registers will reload the data and restore the base address pointer. This is useful when you need to temporarily save a group of registers and restore them later.

For example, an STM increment before (STMIB) instruction can be followed by an LDM decrement after (LDMDA) instruction. The STMIB instruction stores the values 7, 8 and 9 held in r1 to r3 to memory. We then corrupt registers r1 to r3. The LDMDA reloads the original values and restores the base pointer r0.
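The four addressing modes (IA, IB, DA, DB) only change which block of words is touched and where the written-back base ends up; the lowest-numbered register always goes to the lowest address. The Python sketch below models that, reproduces the LDMIA/LDMIB addresses for r0 = 0x80010, and replays the STMIB/LDMDA pairing. The base address 0x9000 and all function names are hypothetical.

```python
def block_addresses(base, count, mode):
    """Word addresses touched by LDM/STM<mode> (ascending) and the written-back base."""
    if mode == "IA":                       # increment after
        first, new_base = base, base + 4 * count
    elif mode == "IB":                     # increment before
        first, new_base = base + 4, base + 4 * count
    elif mode == "DA":                     # decrement after
        first, new_base = base - 4 * count + 4, base - 4 * count
    elif mode == "DB":                     # decrement before
        first, new_base = base - 4 * count, base - 4 * count
    else:
        raise ValueError("unknown mode: " + mode)
    return [first + 4 * i for i in range(count)], new_base

def stm(memory, regs, names, base, mode):
    """STM<mode> base!, {names}; names must be in ascending register order."""
    addresses, new_base = block_addresses(base, len(names), mode)
    for address, name in zip(addresses, names):
        memory[address] = regs[name]
    return new_base

def ldm(memory, regs, names, base, mode):
    """LDM<mode> base!, {names}; names must be in ascending register order."""
    addresses, new_base = block_addresses(base, len(names), mode)
    for address, name in zip(addresses, names):
        regs[name] = memory[address]
    return new_base

# STMIB followed by LDMDA restores both the registers and the base pointer.
memory = {}
regs = {"r1": 7, "r2": 8, "r3": 9}
r0 = 0x9000                                   # hypothetical base address
r0 = stm(memory, regs, ["r1", "r2", "r3"], r0, "IB")
regs = {"r1": 0, "r2": 0, "r3": 0}            # corrupt r1-r3
r0 = ldm(memory, regs, ["r1", "r2", "r3"], r0, "DA")
print(regs, hex(r0))
```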

Block Memory Copy Example

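@ r9 points to start of source data
@ r10 points to start of destination data
@ r11 points to end of the source
@ (assumes the block size is a multiple of 32 bytes)
loop:
LDMIA r9!, {r0-r7} @ load 32 bytes from the source and advance r9
STMIA r10!, {r0-r7} @ store 32 bytes to the destination and advance r10
CMP r9, r11 @ reached the end of the source?
BNE loop @ if not, copy the next 32 bytes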

Stack Operation

A POP operation uses a load multiple instruction and a PUSH operation uses a store multiple instruction. A stack is either ascending or descending.

Ascending stack grows towards higher memory address while descending stack grows towards lower memory address.
Full Stack (F): stack pointer sp points to the last used or full location.

Empty Stack (E): sp points to the first unused empty location.

Stack Example
PRE r1 = 0x00000002
r4 = 0x00000003
sp = 0x00080014

STMFD sp!, {r1,r4} @ push onto a full descending stack

POST r1 = 0x00000002
r4 = 0x00000003
sp = 0x0008000c

PRE r1 = 0x00000002
r4 = 0x00000003
sp = 0x00080010

STMED sp!, {r1,r4} @ push onto an empty descending stack

POST r1 = 0x00000002
r4 = 0x00000003
sp = 0x00080008
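A push onto each stack type is just an STM with a particular addressing mode: full descending (FD) pushes use DB, empty descending (ED) use DA, full ascending (FA) use IB and empty ascending (EA) use IA. The Python sketch below models that mapping and reproduces both sp values from the example above; the function name is hypothetical.

```python
# Push = STM with the addressing mode that matches the stack type.
PUSH_MODE = {"FD": "DB", "ED": "DA", "FA": "IB", "EA": "IA"}

def push(memory, sp, values, stack_type):
    """STM<stack_type> sp!, {...}: values in ascending register order; returns new sp."""
    n = len(values)
    mode = PUSH_MODE[stack_type]
    if mode == "DB":
        first, new_sp = sp - 4 * n, sp - 4 * n
    elif mode == "DA":
        first, new_sp = sp - 4 * n + 4, sp - 4 * n
    elif mode == "IB":
        first, new_sp = sp + 4, sp + 4 * n
    else:  # "IA"
        first, new_sp = sp, sp + 4 * n
    for i, value in enumerate(values):
        memory[first + 4 * i] = value   # lowest register at the lowest address
    return new_sp

full, empty = {}, {}
print(hex(push(full, 0x80014, [0x2, 0x3], "FD")))   # STMFD: 0x8000c
print(hex(push(empty, 0x80010, [0x2, 0x3], "ED")))  # STMED: 0x80008
```

In both cases r1 lands at 0x8000c and r4 at 0x80010; only the final sp differs, pointing at the last full slot (FD) or the next empty one (ED).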


Little-Endian: the least significant byte of the register is stored at the lowest address and the most significant byte at the highest address. A 32-bit value 0x0A0B0C0D stored at memory location 0x400 looks like this:

0x400: 0x0D
0x401: 0x0C
0x402: 0x0B
0x403: 0x0A

If data is always stored and loaded as whole words, endianness does not matter, because each word is read back in the byte order in which it was written. For byte or half-word values, however, it is important to keep the endianness of the data in mind. The default format is little-endian.
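Python's standard `struct` module can show the two byte orders side by side: `"<I"` packs a 32-bit unsigned value little-endian and `">I"` packs it big-endian. The sketch below maps each address to the byte stored there; the function name `byte_layout` is hypothetical.

```python
import struct

def byte_layout(value, base, little_endian=True):
    """Map each byte address to the byte stored there when `value` is written as a word."""
    fmt = "<I" if little_endian else ">I"   # '<' little-endian, '>' big-endian
    return {base + i: b for i, b in enumerate(struct.pack(fmt, value))}

little = byte_layout(0x0A0B0C0D, 0x400)
big = byte_layout(0x0A0B0C0D, 0x400, little_endian=False)
for address in sorted(little):
    print(hex(address), hex(little[address]), hex(big[address]))

# A word written and read back with the same byte order is unchanged ...
print(struct.unpack("<I", struct.pack("<I", 0x0A0B0C0D))[0] == 0x0A0B0C0D)
# ... but a byte load from 0x400 returns 0x0d on one layout and 0x0a on the other.
```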

ARM Instruction Set

Data Processing Instructions
Branch Instructions (Flow Control)
Status Register Transfer Instructions (Logic/Bit bashing)
Load and Store Instructions (Memory Access)
Co-processor Instructions (System Control)
Exception Generating Instructions (Privileged)

Data Processing Instructions

These instructions manipulate data within registers. They include move, arithmetic, logical, comparison and multiply instructions. Most data processing instructions can process one of their operands using the barrel shifter. If you use the S suffix on a data processing instruction (for example ADDS), it updates the flags in the CPSR.

Move Instructions

MOV copies N into a destination register, where N is either a register or an immediate value, e.g. MOV r0, r1 or MOV r0, #5.

Barrel Shifter

Data processing instructions are processed within the ALU.

The 32-bit binary pattern in one of the source registers can be shifted left or right before it enters the ALU. Some data processing instructions do not use the barrel shifter, for example MUL, CLZ (count leading zeros) and QADD (signed saturated 32-bit add).
The shift is performed within the cycle time of the instruction.

MOV r7, r5, LSL #2 @ r7 = r5 << 2, i.e. r5 * 4
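The four basic barrel shifter operations can be modelled on 32-bit values as follows. This Python sketch only computes the shifted value; it does not model the shifter carry-out, register-specified shift amounts, or RRX (which is what an immediate ROR #0 actually encodes on ARM). The function name is hypothetical.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide

def barrel_shift(value, op, amount):
    """Result of shifting a 32-bit register value (carry-out not modelled)."""
    value &= MASK
    if not 0 <= amount <= 31:
        raise ValueError("this sketch handles shift amounts 0-31 only")
    if op == "LSL":   # logical shift left: zeros enter at the right
        return (value << amount) & MASK
    if op == "LSR":   # logical shift right: zeros enter at the left
        return value >> amount
    if op == "ASR":   # arithmetic shift right: copies of the sign bit enter at the left
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK
    if op == "ROR":   # rotate right: bits leaving the right re-enter on the left
        return ((value >> amount) | (value << (32 - amount))) & MASK
    raise ValueError("unknown shift: " + op)

r5 = 5
print(barrel_shift(r5, "LSL", 2))               # MOV r7, r5, LSL #2 gives 20
print(hex(barrel_shift(0x80000000, "ASR", 4)))  # 0xf8000000
```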

Arithmetic Instructions

These instructions implement addition and subtraction of 32-bit values, e.g.:

SUB r0, r1, r2 @ r0 = r1 - r2

A wide range of second-operand shifts is available for arithmetic and logical instructions:

ADD r0, r1, r1, LSL #1 @ r0 = r1 + (r1 << 1) = 3 * r1

Logical Instructions
PRE r1 = 0b1111
r2 = 0b0101
BIC r0, r1, r2 @ r0 = r1 AND NOT r2
POST r0 = 0b1010

PRE r0 = 0x00000000
r1 = 0x02040608
r2 = 0x10305070
ORR r0, r1, r2 @ r0 = r1 OR r2
POST r0 = 0x12345678
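The logical instructions map directly onto bitwise operators, which makes the two results above easy to check. The Python sketch below masks every result to 32 bits; the function names (with a trailing underscore on `and_` to avoid the keyword) are hypothetical.

```python
MASK = 0xFFFFFFFF

def and_(rn, op2):
    """AND: bits set in both operands."""
    return rn & op2 & MASK

def orr(rn, op2):
    """ORR: bits set in either operand."""
    return (rn | op2) & MASK

def eor(rn, op2):
    """EOR: bits set in exactly one operand."""
    return (rn ^ op2) & MASK

def bic(rn, op2):
    """BIC (bit clear): rn AND NOT op2."""
    return rn & ~op2 & MASK

print(bin(bic(0b1111, 0b0101)))          # 0b1010
print(hex(orr(0x02040608, 0x10305070)))  # 0x12345678
```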

Comparison Instructions

Comparison instructions (CMP, CMN, TST and TEQ) compare or test a register with a 32-bit value. They update the cpsr flag bits according to the result but do not affect any other registers. Once the flags have been set, they can be used to change program flow through conditional execution.
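CMP rn, operand2 performs rn - operand2, discards the result and keeps only the flags. The Python sketch below models how N, Z, C and V are derived from that subtraction; note that on ARM the C flag is set when the subtraction does not borrow. The function names are hypothetical.

```python
MASK = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x

def cmp_flags(rn, operand2):
    """Flags set by CMP rn, operand2 (computes rn - operand2, discards the result)."""
    a, b = rn & MASK, operand2 & MASK
    result = (a - b) & MASK
    signed_diff = to_signed(a) - to_signed(b)
    return {
        "N": bool(result & 0x80000000),                  # result is negative
        "Z": result == 0,                                # operands are equal
        "C": a >= b,                                     # no borrow (unsigned a >= b)
        "V": not -(1 << 31) <= signed_diff < (1 << 31),  # signed overflow
    }

print(cmp_flags(5, 5))  # Z set, so EQ conditions pass
print(cmp_flags(3, 5))  # N set and C clear, so LT and LO conditions pass
```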

Multiply Instructions

Long multiply instructions (SMLAL, SMULL, UMLAL and UMULL) produce a 64-bit result. The result is placed in two 32-bit registers labelled RdLo and RdHi.
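The split of a 64-bit product into RdLo and RdHi, and the difference between the unsigned and signed forms, can be sketched in Python as below. This is an illustrative model of UMULL and SMULL only (the accumulating forms UMLAL and SMLAL are not modelled); the function names are hypothetical.

```python
MASK = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x

def umull(rm, rs):
    """UMULL: unsigned 32 x 32 -> 64-bit product, returned as (RdLo, RdHi)."""
    product = (rm & MASK) * (rs & MASK)
    return product & MASK, product >> 32

def smull(rm, rs):
    """SMULL: signed 32 x 32 -> 64-bit product, returned as (RdLo, RdHi)."""
    product = to_signed(rm) * to_signed(rs)
    product &= (1 << 64) - 1           # 64-bit two's-complement bit pattern
    return product & MASK, product >> 32

lo, hi = umull(0xFFFFFFFF, 2)
print(hex(hi), hex(lo))    # 0x1 0xfffffffe
lo, hi = smull(0xFFFFFFFF, 2)   # -1 * 2 = -2
print(hex(hi), hex(lo))    # 0xffffffff 0xfffffffe
```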

Branch Instructions

A branch instruction changes the flow of execution or calls a subroutine. This type of instruction allows programs to have subroutines, if-then-else structures and loops. Branch labels are placed at the beginning of a line and mark an address that the assembler later uses to calculate the branch offset.


B looptest @ jump to the condition test first
loop:
@ ... loop body ...
looptest:
@ evaluate condition, e.g. with CMP
BNE loop @ repeat while the condition holds

FOR Loop
for (j=0; j<10; j++)

mov r1, #0 @ j = 0
loop:
cmp r1, #10 @ is j < 10 ?
bge done @ if j >= 10, finish
@ ... loop body ...
add r1, r1, #1 @ j++
b loop
done:

Constants and Literal Pools

Instruction Encoding

ARM instructions (excluding Thumb instructions) are 32 bits long. How do we fit a 32-bit constant into an instruction that is itself only 32 bits long? The 32 bits are divided into a number of fields. For instance, in a data processing instruction:

Bits [31:28]: condition
Bits [27:25]: class of instruction
Bits [24:21]: instruction opcode
Bits [11:0]: operand (register, immediate value etc.)

We will look at the case where the operand is an immediate value.

Binary Encoding of Instructions

The least significant byte of the operand field (bits [7:0]) can be any number from 0 to 255.
Bits [11:8] of the instruction specify a rotate value. This rotate value is multiplied by 2 and then used to rotate the 8-bit byte to the right, so the operand value is the byte rotated right by (2 * rotate) bits.

The bit pattern 0xe3a004ff translates to the mnemonic:

MOV r0, #0xFF, 8 @ r0 = 0xFF rotated right by 8 bits = 0xFF000000
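Decoding the 12-bit operand field is a single rotate. The Python sketch below models it and checks the encodings that appear in this section (0xe3a004ff here, and the 0x901 and 0xa11 operand fields from the disassembly listings); the function name is hypothetical.

```python
def decode_immediate(field12):
    """Value of a 12-bit ARM immediate: bits [7:0] rotated right by 2 * bits [11:8]."""
    imm8 = field12 & 0xFF
    rotate = 2 * ((field12 >> 8) & 0xF)
    return ((imm8 >> rotate) | (imm8 << (32 - rotate))) & 0xFFFFFFFF

instruction = 0xE3A004FF                              # MOV r0, #0xFF, 8
print(hex(decode_immediate(instruction & 0xFFF)))     # 0xff000000
```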

With 12 bits available in an instruction and dedicated hardware for performing shifts, the ARM processor can generate certain classes of numbers, rather than every number between 0 and 2^32 - 1.

An example

Calculate the rotation necessary to generate the constant 4080 using the byte rotation scheme.

Solution

4080 in decimal --> 111111110000 in binary

The byte 11111111 (0xFF) can be rotated to the left by 4 bits to obtain this number. This is the same as rotating the byte 0xFF by 28 bits to the right. Thus, the instruction should be:

MOV r0, #0xFF, 28 @ r0 = 4080

.section .text
_start:
@ MOV r0, #16384
MOV r0, #0x4000 @ should it work?
.end

Disassembly of section .text:
00000000 <_start>:
0: e3a00901 mov r0, #16384 ; 0x4000

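It is equivalent to: MOV r0, #0x01, 18

However, the following statement doesn't work with the GNU-ARM-ELF assembler, because the twelve set bits of 0xFFF cannot fit in a rotated 8-bit byte:

MOV r0, #0xFFF

Note that MOV r0, #0x14000 does assemble, since 0x14000 is 0x05 rotated right by 18 bits. This statement works as well:

MOV r0, #0x11000

with disassembly:

0: e3a00a11 mov r0, #69632 ; 0x11000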

What does it mean?

The GNU-ARM-ELF assembler internally converts the given immediate value (#<number>) into a suitable byte and rotation amount whenever possible; otherwise it reports an error. If you get an error with a particular immediate value, it means the byte-rotation scheme cannot produce that number, so it is not a valid immediate.
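The check the assembler has to perform can be sketched as a search over the sixteen possible rotate fields: rotate the constant left by each even amount and see whether what remains fits in 8 bits. This Python sketch returns the first match it finds, searching from rotation 0 upward; for the constants in this section that agrees with the disassembly listings, but other valid (byte, rotation) pairs may exist and this is not claimed to be the GNU assembler's actual algorithm. The function name is hypothetical.

```python
MASK = 0xFFFFFFFF

def encode_immediate(value):
    """Return (imm8, rotate_right) with imm8 ROR rotate_right == value, or None."""
    value &= MASK
    for rotate_field in range(16):           # the 4-bit rotate field
        shift = 2 * rotate_field             # rotations are always even amounts
        # Rotating left by `shift` undoes a rotate right by `shift`.
        imm8 = ((value << shift) | (value >> (32 - shift))) & MASK
        if imm8 <= 0xFF:
            return imm8, shift
    return None

for constant in (4080, 0x4000, 0x11000, 0xFFF):
    print(hex(constant), encode_immediate(constant))
```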

Loading Constants into Registers

Use the following pseudo-instruction to load constants into a register: LDR <Rd>, =<numeric constant>

When the assembler sees the LDR pseudo-instruction, it first tries to perform the load with a MOV or MVN instruction (using the byte rotation scheme). For numbers that cannot be created by the byte rotation scheme, a literal pool, or block of constants, is created to hold them in memory, usually very near the instructions that asked for the data. The load instruction then uses PC-relative addressing and therefore has a range of 4 KB (12 offset bits). Hence, if a large block of code or data separates an LDR from its literal pool, the LDR will fail.
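The MOV, then MVN, then literal pool decision described above can be sketched in Python. This is an illustrative model of the choice only; it does not compute the real PC-relative offset (shown as N, as in the listing that follows), and `encodable` and `expand_ldr` are hypothetical helpers.

```python
MASK = 0xFFFFFFFF

def encodable(value):
    """True if value is an 8-bit byte rotated right by an even amount (0-30)."""
    value &= MASK
    return any((((value << s) | (value >> (32 - s))) & MASK) <= 0xFF
               for s in range(0, 32, 2))

def expand_ldr(rd, constant):
    """Sketch of how LDR rd, =constant may be expanded."""
    constant &= MASK
    if encodable(constant):
        return "MOV %s, #%d" % (rd, constant)
    if encodable(~constant & MASK):          # MVN writes the bitwise NOT of its operand
        return "MVN %s, #%d" % (rd, ~constant & MASK)
    return "LDR %s, [PC, #N]  @ N = offset to literal 0x%08X" % (rd, constant)

print(expand_ldr("r0", 42))
print(expand_ldr("r1", 0x55555555))
print(expand_ldr("r2", 0xFFFFFFFF))
```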

(Figure: encoding of Load/Store word or unsigned byte, immediate offset)

.section .text
_start:
BL func1 @ call first subroutine
BL func2 @ call second subroutine
stop: B stop @ terminate the program

func1:
LDR r0, =42 @ => MOV r0, #42
LDR r1, =0x55555555 @ => LDR r1, [PC, #N], where N = offset to literal pool 1
LDR r2, =0xFFFFFFFF @ => MVN r2, #0
MOV PC, LR @ return from subroutine
.ltorg @ literal pool 1 has 0x55555555

func2:
LDR r3, =0x55555555 @ => LDR r3, [PC, #N], where N = offset back to literal pool 1
@LDR r4, =0x66666666 @ if this is uncommented, it fails as literal pool 2 is out of reach
MOV PC, LR @ return from subroutine