ARM7TDMI
This is an ARM7 family processor core. The suffix letters stand for: T = Thumb instruction set, D = Debug extensions, M = enhanced Multiplier (long multiply instructions), I = EmbeddedICE macrocell.
ARM946E
ARM9xx core
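1. User: Unprivileged mode under which most tasks run
2. FIQ: Entered when a high priority (fast) interrupt is raised
3. IRQ: Entered when a low priority (normal) interrupt is raised
4. Supervisor (SVC): Entered on reset and when a software interrupt instruction is executed
5. Abort: Used to handle memory access violations
6. Undef: Used to handle undefined instructions
7. System: Privileged mode using the same registers as User mode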
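[Figure: register banks by processor mode. The full register list shows r8, r9, r10, r11, r12, r13 (sp), r14 (lr), r15 (pc) and cpsr. FIQ mode has its own banked r8, r9, r10, r11, r12, r13 (sp) and r14 (lr). Each of the FIQ, IRQ, SVC, Undef and Abort modes has its own spsr.]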
The current processor mode governs which of several register banks is accessible. Each mode can access a particular set of r0-r12 registers, a particular r13 (the stack pointer, sp) and r14 (the link register, lr), the program counter r15 (pc), and the current program status register, cpsr.
Memory
Memory is a set of contiguous storage elements that hold data. The address bus on the ARM7TDMI is 32 bits wide, so bytes can be addressed from 0 to 4,294,967,295, giving a 4 GB address space. As on other RISC processors, memory is accessed only through separate load and store instructions.
The data types supported are signed and unsigned words (32-bit), half-words (16-bit) and bytes (8-bit).
LDR and STR instructions load and store data on a boundary aligned to the size of the data type being transferred. For example, LDR can only load a 32-bit word from a memory address that is a multiple of four bytes: 0, 4, 8 and so on.
The ARM instruction set provides different modes for addressing memory. These modes incorporate one of the following indexing methods:
Preindex with writeback: calculates an address from a base register plus an address offset, and then updates the base register with the new address.
Preindex: same as preindex with writeback, but does not update the base register.
Postindex: uses the base register as the address, and only updates the base register after the address has been used.
Postindex and preindex with writeback are useful for traversing an array.
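The three indexing methods differ only in which address is used for the access and whether the base register changes afterwards. The following is a minimal Python model of that behaviour, not of the real hardware; the function name `index_access` and the method strings are made up for illustration, and the assembly forms in the comments are the usual ARM syntax for each method.

```python
MASK32 = 0xFFFFFFFF  # ARM registers are 32 bits wide


def index_access(base, offset, method):
    """Return (address accessed, base register value after the instruction)."""
    target = (base + offset) & MASK32
    if method == "preindex":            # e.g. LDR r0, [r1, #4]
        return target, base
    if method == "preindex_writeback":  # e.g. LDR r0, [r1, #4]!
        return target, target
    if method == "postindex":           # e.g. LDR r0, [r1], #4
        return base, target
    raise ValueError(f"unknown index method: {method}")


if __name__ == "__main__":
    for method in ("preindex", "preindex_writeback", "postindex"):
        addr, new_base = index_access(0x00009000, 4, method)
        print(f"{method:20s} address=0x{addr:08x} base=0x{new_base:08x}")
```

With postindex or preindex with writeback, repeating the same instruction steps through consecutive array elements, because the base register advances on every access.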
Immediate means the address is calculated using the base address register and a 12-bit offset encoded in the instruction.
Register means the address is calculated using the base address register and the contents of a specific register.
Scaled means the address is calculated using the base address register and a register offset that has been shifted by the barrel shifter.
Load and store instructions using 16-bit data (half-words) or signed byte data cannot use the barrel shifter. There are no STRSB or STRSH instructions, since STRH stores both signed and unsigned half-words; similarly, STRB stores both signed and unsigned bytes.
Multiple-Register Transfer
Load-store multiple instructions can transfer multiple registers between memory and the processor in a single instruction. The transfer occurs from a base address register Rn pointing into memory. Multiple-register transfer instructions are more efficient than single-register transfers for moving blocks of data around memory and for saving and restoring context and stacks. However, load-store multiple instructions can increase interrupt latency.
On ARM7, a load multiple instruction takes 2 + Nt cycles, where N is the number of registers to load and t is the number of cycles required for each sequential access to memory. If an interrupt has been raised, it has no effect until the load-store multiple instruction is complete.
Any subset of the current bank of registers can be transferred to memory or fetched from memory. The base register Rn determines the source or destination address for a load-store multiple instruction. This register can optionally be updated following the transfer, which occurs when register Rn is followed by the ! character. The list of registers to be loaded is given within curly brackets {...} as a comma-separated list or as a range using '-'.
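PRE  mem32[0x80018] = 0x00000003
     mem32[0x80014] = 0x00000002
     mem32[0x80010] = 0x00000001
     r0 = 0x00080010
     r1 = 0x00000000
     r2 = 0x00000000
     r3 = 0x00000000

     LDMIA r0!, {r1-r3}

POST r0 = 0x0008001c
     r1 = 0x00000001
     r2 = 0x00000002
     r3 = 0x00000003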
When LDMIB is used instead, the first word pointed to by register r0 is skipped and register r1 is loaded from the next memory location. After execution, register r0 points to the last loaded memory location.
Decrement Versions
The decrement versions DA and DB of the load-store multiple instructions decrement the start address and then store to ascending memory locations. This is equivalent to descending through memory while accessing the register list in reverse order.
If you use a store with base update, then the paired load instruction of the same number of registers will reload the data and restore the base address pointer. This is useful when you need to temporarily save a group of registers and restore them later.
For example, consider an STM increment-before instruction followed by an LDM decrement-after instruction. The STMIB instruction stores the values 7, 8, 9 to memory. We then corrupt registers r1 to r3. The LDMDA instruction reloads the original values and restores the base pointer r0.
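The save-and-restore pairing described here can be traced with a small Python model. This is only a sketch: memory is a dictionary of word addresses, the helper names (`block_transfer`, `stm`, `ldm`, `demo`) are invented for illustration, base writeback (the ! form) is always applied, and the starting values r0 = 0x9000, r1 = 9, r2 = 8, r3 = 7 are assumptions chosen to match the values 7, 8, 9 mentioned in the text.

```python
def block_transfer(base, count, mode):
    """Return (ascending word addresses used, base value after writeback)."""
    size = 4 * count
    if mode == "IA":      # increment after
        first, new_base = base, base + size
    elif mode == "IB":    # increment before
        first, new_base = base + 4, base + size
    elif mode == "DA":    # decrement after
        first, new_base = base - size + 4, base - size
    elif mode == "DB":    # decrement before
        first, new_base = base - size, base - size
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    return [first + 4 * i for i in range(count)], new_base


def stm(mem, regs, base_reg, reg_list, mode):
    """Model of STM<mode> Rn!, {reg_list}: lowest register to lowest address."""
    addresses, new_base = block_transfer(regs[base_reg], len(reg_list), mode)
    for reg, addr in zip(sorted(reg_list), addresses):
        mem[addr] = regs[reg]
    regs[base_reg] = new_base


def ldm(mem, regs, base_reg, reg_list, mode):
    """Model of LDM<mode> Rn!, {reg_list}: lowest register from lowest address."""
    addresses, new_base = block_transfer(regs[base_reg], len(reg_list), mode)
    for reg, addr in zip(sorted(reg_list), addresses):
        regs[reg] = mem[addr]
    regs[base_reg] = new_base


def demo():
    mem = {}
    regs = {0: 0x00009000, 1: 9, 2: 8, 3: 7}   # assumed starting values
    stm(mem, regs, 0, [1, 2, 3], "IB")          # STMIB r0!, {r1-r3}
    base_after_store = regs[0]
    regs[1] = regs[2] = regs[3] = 0             # corrupt r1 to r3
    ldm(mem, regs, 0, [1, 2, 3], "DA")          # LDMDA r0!, {r1-r3}
    return mem, base_after_store, regs


if __name__ == "__main__":
    mem, base_after_store, regs = demo()
    print({hex(a): v for a, v in sorted(mem.items())})
    print(hex(base_after_store), {f"r{n}": hex(v) for n, v in regs.items()})
```

In this model STMIB pairs with LDMDA, and by the same address arithmetic STMIA pairs with LDMDB, STMDA with LDMIB, and STMDB with LDMIA.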
Stack Operation
The POP operation uses a load multiple instruction; the PUSH operation uses a store multiple instruction. A stack is either ascending or descending.
An ascending stack (A) grows towards higher memory addresses, while a descending stack (D) grows towards lower memory addresses.
Full Stack (F): the stack pointer sp points to the last used or full location.
Empty Stack (E): the stack pointer sp points to the first unused or empty location.
Stack Example
PRE  r1 = 0x00000002
     r4 = 0x00000003
     sp = 0x00080014

     STMFD sp!, {r1,r4}

POST r1 = 0x00000002
     r4 = 0x00000003
     sp = 0x0008000c

PRE  r1 = 0x00000002
     r4 = 0x00000003
     sp = 0x00080010

     STMED sp!, {r1,r4}    @ push operation on an empty stack

POST r1 = 0x00000002
     r4 = 0x00000003
     sp = 0x00080008
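The two pushes above can be checked with a short Python model of a descending stack. It is a sketch only: `push_descending` is a hypothetical helper, memory is a dictionary keyed by word address, and the values are passed lowest register first, as in the register list.

```python
def push_descending(mem, sp, values, full):
    """Push values (lowest register first) onto a descending stack.

    full=True models STMFD (sp points at the last used location);
    full=False models STMED (sp points at the first unused location).
    Returns the new stack pointer.
    """
    # Walk the list in reverse so the lowest register lands at the lowest address.
    for value in reversed(values):
        if full:
            sp -= 4          # move to a free slot first, then store
            mem[sp] = value
        else:
            mem[sp] = value  # current slot is free: store, then move on
            sp -= 4
    return sp


if __name__ == "__main__":
    full_mem, empty_mem = {}, {}
    print(hex(push_descending(full_mem, 0x00080014, [2, 3], full=True)))
    print(hex(push_descending(empty_mem, 0x00080010, [2, 3], full=False)))
```

Both pushes write r1 to 0x0008000c and r4 to 0x00080010; only the final position of sp differs between the full and empty conventions.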
ENDIANNESS
Little-Endian: The least significant byte in the register is stored at the lowest address and the most significant byte is stored at the highest address. A 32-bit value 0x0A0B0C0D stored at memory location 0x400 looks like this:
0x400: 0x0D
0x401: 0x0C
0x402: 0x0B
0x403: 0x0A
For word-length loads and stores, endianness does not really matter, provided the data is read back the same way it was written. For byte or half-word accesses, it is important to keep the endianness of the data in mind. The default format is little-endian.
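The byte layout is easy to check on a host machine. The sketch below uses Python's standard `struct` module to lay out the value 0x0A0B0C0D in both byte orders and then reads back a half-word and a byte from the little-endian image; the variable names are illustrative only.

```python
import struct

value = 0x0A0B0C0D

# "<I" packs an unsigned 32-bit word little-endian, ">I" big-endian.
little = struct.pack("<I", value)
big = struct.pack(">I", value)

# Index 0 corresponds to the lowest address (0x400 in the example above).
print([f"0x{b:02X}" for b in little])
print([f"0x{b:02X}" for b in big])

# Sub-word reads depend on the layout: the half-word and the byte at the
# lowest address of the little-endian image.
low_halfword = struct.unpack_from("<H", little, 0)[0]
low_byte = little[0]
print(f"0x{low_halfword:04X}", f"0x{low_byte:02X}")
```

Packing and unpacking a whole word with the same format string always returns the original value, which is why endianness only becomes visible for byte and half-word accesses.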
Data Processing Instructions
Branch Instructions (Flow Control)
Status Register transfer instructions (Logic/Bit bashing)
Load and Store instructions (Memory Access)
Data processing instructions manipulate data within registers. They comprise move instructions, arithmetic instructions, logical instructions, comparison instructions and multiply instructions. Most data processing instructions can process one of their operands using the barrel shifter. If you use the S suffix on a data processing instruction, it updates the flags in the cpsr.
Move Instructions
Barrel Shifter
The 32-bit binary pattern in one of the source registers can be shifted left or right before it enters the ALU. Some data processing instructions do not use the barrel shifter, for example the MUL, CLZ (count leading zeros) and QADD (signed saturated 32-bit add) instructions.
Pre-processing or shift occurs within the cycle time of the instruction.
Arithmetic Instructions
Logical Instructions
PRE  r1 = 0b1111
     r2 = 0b0101

     BIC r0, r1, r2

POST r0 = 0b1010

PRE  r0 = 0x00000000
     r1 = 0x02040608
     r2 = 0x10305070

     ORR r0, r1, r2

POST r0 = 0x12345678
Comparison Instructions
Comparison instructions are used to compare or test a register against a 32-bit value. They update the cpsr flag bits according to the result, but do not affect other registers. After the bits have been set, the information can be used to change program flow through conditional execution.
Multiply Instructions
The long multiply instructions (SMLAL, SMULL, UMLAL and UMULL) produce a 64-bit result. The result is placed in two 32-bit registers labelled RdLo and RdHi.
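How a 64-bit product is split across RdLo and RdHi can be illustrated with a few lines of Python. This is a sketch of the arithmetic only; the function names simply mirror the mnemonics, the operands are arbitrary example values, and the accumulate variants (SMLAL, UMLAL) are not modelled.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def umull(rm, rs):
    """Unsigned 32 x 32 -> 64-bit multiply; returns (RdLo, RdHi)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, (product >> 32) & MASK32


def smull(rm, rs):
    """Signed 32 x 32 -> 64-bit multiply; returns (RdLo, RdHi)."""
    product = to_signed32(rm) * to_signed32(rs)
    return product & MASK32, (product >> 32) & MASK32


if __name__ == "__main__":
    lo, hi = umull(0xF0000002, 0x00000002)
    print(f"UMULL: RdLo=0x{lo:08X} RdHi=0x{hi:08X}")
    lo, hi = smull(0xFFFFFFFF, 0x00000002)   # -1 * 2
    print(f"SMULL: RdLo=0x{lo:08X} RdHi=0x{hi:08X}")
```

The same bit pattern 0xFFFFFFFF gives different RdHi values under UMULL and SMULL, which is why both signed and unsigned forms exist.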
Branch Instructions
A branch instruction is used to change the flow of execution or is used to call a sub-routine. This type of instruction allows programs to have subroutines, if-then-else structures and loops. The branch labels are placed at the beginning of the line and are used to mark an address that can be used later by the assembler to calculate the branch offset.
Examples
WHILE Loop
        B    test
loop    ----------------        ; loop body
test    ----------------        ; evaluate condition
        BNE  loop

FOR Loop
for (j=0; j<10; j++)

        mov  r1, #0             ; j=0
loop    cmp  r1, #10            ; is j<10 ?
        bge  done               ; if j>=10, finish
        -------------           ; loop body
        add  r1, r1, #1         ; j++
        b    loop
done    -------
ARM instructions are 32 bits long (excluding Thumb instructions). How do we fit a 32-bit constant into an instruction that is itself only 32 bits long? The 32 bits are divided into a number of fields. For instance:
Bits [31:28]: condition
Bits [27:25]: class of instruction
Bits [24:21]: instruction opcode
Bits [11:0]: operand (register, immediate value etc.)
The least significant byte (8 bits) of the operand field can be any number from 0 to 255.
Bits [11:8] of the instruction specify a rotate value. This rotate value is multiplied by 2 and is then used to rotate the 8-bit byte to the right.
With 12 bits available in an instruction and dedicated hardware for performing shifts, the ARM processor can generate classes of numbers instead of every number between 0 and 2^32 - 1.
An example
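Calculate the rotation necessary to generate the constant 4080 using the byte rotation scheme.
Solution: 4080 = 0xFF0, which is the byte 0xFF shifted left by 4 bits, or equivalently rotated right by 28 bits. The 4-bit rotate field therefore holds 28 / 2 = 14 (0xE) and the 8-bit byte is 0xFF.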
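The byte rotation scheme can be explored with a brute-force search over the 16 possible rotate values. The Python sketch below is an illustration rather than the assembler's actual algorithm: the helper names `ror32`, `encode_immediate` and `mov_immediate_word` are invented, the search returns the smallest rotate value that works, and the base word 0xE3A00000 is taken from the always-executed MOV immediate encodings shown in the disassembly listings of this section.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def encode_immediate(constant):
    """Return (imm8, rot) such that imm8 rotated right by 2*rot == constant.

    Returns None when the byte rotation scheme cannot produce the constant.
    """
    constant &= MASK32
    for rot in range(16):
        # Undo the right rotation by rotating left by 2*rot bits.
        imm8 = ror32(constant, 32 - 2 * rot)
        if imm8 <= 0xFF:
            return imm8, rot
    return None


def mov_immediate_word(rd, constant):
    """Machine word for MOV rd, #constant (condition AL), or None."""
    encoded = encode_immediate(constant)
    if encoded is None:
        return None
    imm8, rot = encoded
    return 0xE3A00000 | (rd << 12) | (rot << 8) | imm8


if __name__ == "__main__":
    for constant in (4080, 0x4000, 0x11000, 0x55555555):
        print(hex(constant), encode_immediate(constant))
```

A result of None corresponds to the case where the assembler reports an error for a MOV with that immediate value.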
.section .text
_start:
    @ MOV r0, #16384
    MOV r0, #0x4000         @ should it work??
.end

Disassembly of section .text:
00000000 <_start>:
   0:   e3a00901    mov r0, #16384  ; 0x4000
It is equivalent to: MOV r0, #0x01, 18
However, the following commands don't work with GNU-ARM-ELF assembler:
   0:   e3a00a11    mov r0, #69632  ; 0x11000
The GNU-ARM-ELF assembler internally converts the given immediate value (#<number>) into a suitable byte and rotation number whenever that is possible; otherwise it gives an error. If you are getting an error with a particular immediate value, it means the byte rotation scheme cannot produce this number, and hence the value is invalid.
Use the following pseudo-instruction to load constants into a register: LDR <Rd>, =<numeric constant>
When the assembler sees the LDR pseudo-instruction, it first tries to use either a MOV or an MVN instruction to perform the load (using the byte rotation scheme). For numbers that cannot be created by the byte rotation scheme, a literal pool, or block of constants, is created to hold them in memory, usually very near the instructions that asked for the data. The load instruction then uses PC-relative addressing and hence has a range of 4 KB (12 offset bits). Consequently, a large block of code between the instruction and its literal pool will cause the LDR to fail.
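The 4 KB limit can be expressed as a simple range check. The Python sketch below assumes ARM (not Thumb) state, where the PC value seen by an instruction is its own address plus 8, and a 12-bit offset magnitude that can be added or subtracted; the function names and the example addresses are hypothetical.

```python
def literal_offset(instr_addr, literal_addr):
    """Offset encoded in LDR Rd, [PC, #offset] to reach literal_addr.

    In ARM state the PC reads as the instruction's address plus 8.
    """
    return literal_addr - (instr_addr + 8)


def literal_in_range(instr_addr, literal_addr):
    """True if a 12-bit offset magnitude (0 to 4095, up or down) can reach it."""
    return -4095 <= literal_offset(instr_addr, literal_addr) <= 4095


if __name__ == "__main__":
    # A literal pool placed shortly after the code is reachable.
    print(literal_offset(0x0000, 0x0020), literal_in_range(0x0000, 0x0020))
    # A pool pushed far away by a large block of code is not.
    print(literal_offset(0x0000, 0x2000), literal_in_range(0x0000, 0x2000))
```

When the check fails, the usual remedy is to place a literal pool closer to the instruction that needs it.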
.section .text
_start:
    BL func1                @ call first subroutine
    BL func2                @ call second subroutine
stop:
    B stop                  @ terminate the program

func1:
    LDR r0, =42             @ => MOV r0, #42
                            @ => LDR r1, [PC, #N], where N = offset to literal pool 1
                            @ => LDR r3, [PC, #N], where N = offset back to literal pool 1
                            @ if this is uncommented, it fails as literal pool 2 is out of reach
.end