
THE ARM7TDMI

Introduction And Architecture

12/08/21 Anirudh Radhakrishnan 1


INTRODUCTION TO ARM

History of ARM
• Acorn started the ARM project in 1983.
• By 1985 it had designed the first commercial RISC machine, called the Acorn RISC Machine (ARM).
• In 1990 ARM Ltd had 12 engineers and 1 CEO, with no customers and little money.
• In the 1990s TI incorporated ARM into mobile phones.
• By 1998 there were 13 millionaires in the company.

Origin Of the Name
ARM7TDMI
• ARM – Acorn RISC Machine (now Advanced RISC Machines)
• T – the Thumb 16-bit instruction set
• D – on-chip Debug support
• M – enhanced Multiplier
• I – EmbeddedICE hardware, giving breakpoint and watchpoint support

ARM Features
• RISC
• 32-bit general purpose processor
• High performance, low power consumption and small size
• Large, regular register file
• Load/store architecture
• Pipelining
• Uniform, fixed-length (32-bit) instructions (ARM state)
• 3-address instructions
• Simple addressing modes
• Conditional execution of instructions
• Control over both the ALU and the shifter in every data processing instruction
• Multiple-register load/store instructions
• Ability to perform a general shift and an ALU operation in one instruction, in a single clock cycle
• Coprocessor instruction interface
• THUMB architecture (dense 16-bit compressed instruction set)

THUMB Instruction Set (T variant)
• Re-encoded subset of the ARM instruction set
• Instructions are half the size of ARM instructions (16 bit)
• Greater code density
• On execution, 16-bit Thumb instructions are transparently decompressed to full 32-bit ARM instructions without loss of performance
• Has all the advantages of a 32-bit core
• Lower performance in time-critical code
• Doesn't include some instructions needed for exception handling
• Thumb code needs about 40% more instructions than ARM code
• Thumb code uses about 30% less external memory power than ARM code
• With 32-bit memory
  - ARM code is 40% faster than Thumb code
• With 16-bit memory
  - Thumb code is 45% faster than ARM code
• For best performance
  - use 32-bit memory and ARM code
• For best cost and power efficiency
  - use 16-bit memory and Thumb code
• In a typical embedded system
  - use ARM code in 32-bit on-chip memory for small speed-critical routines
  - use Thumb code in 16-bit off-chip memory for large non-critical routines
ARM state
• All instructions are 32 bits long
• All instructions must be word aligned
• PC value is stored in bits [31:2]; bits [1:0] are equal to zero

THUMB state
• Instructions are 16 bits long
• Instructions are half-word aligned
• PC value is stored in bits [31:1]; bit [0] is equal to zero

Introduction to RISC And CISC
 What is CISC?
 Complex Instruction Set Computers.
 Aimed at reducing the gap between instruction set
and high level language.
 These instructions perform complex sequence of
operations over many cycles.
 Large and powerful range of instruction

 Less flexible to implement

RISC
 RISC stands for Reduced Instruction Set
Computer.
 Optimizing the instruction set and improving the
speed of the processor.
 The memory access instructions are those which
make a computer slow. Arithmetic instructions
have less effect on speed of processor.
 Around 45% of CPU usage is for Data transfer.

RISC Architecture
 Fixed instruction size with few formats.
 Memory access instructions are separated from
instructions that process data.
 A large register bank of 32 registers each of
size 32 bits.

RISC Features
 Hard wired instruction decode logic.
 Pipelined instruction execution
 Large number of registers
 Register independence
 Smaller die size
 Low power
 Simpler to program
 Comparatively less expensive

Advantages Of RISC
 Reduction in the size of processor.
 High instruction throughput.
 Excellent response for interrupt.
 Efficient usage of CPU time.

Pipelines
• Usually instructions are executed in three stages:
  - Fetch
  - Decode
  - Execute
• Can we use the processor to perform several of these operations concurrently?
• Yes, this is what is known as PIPELINING.

Pipelines
• Clearly, the portion of the hardware that does the fetching would be idle during the decode and execute phases.
• This led to the idea that the next instruction can be started before the current one has finished.
• Fetch the III instruction during decoding of the II instruction, decode the II instruction during execution of the I instruction, and so on.

Pipelining
• Pipelining: in any one cycle the three stages hold three different instructions

FETCH     III INSTRUCTION FETCHED
DECODE    II INSTRUCTION DECODED
EXECUTE   I INSTRUCTION EXECUTED
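The overlap of the three stages can be sketched with a small model. This is an illustrative Python sketch of an idealised pipeline (no stalls, no branches), not a cycle-accurate simulator; the function name `pipeline_timeline` and the instruction labels are assumptions made for the example.

```python
def pipeline_timeline(instructions):
    """Return, per clock cycle, which instruction occupies each stage
    of an idealised 3-stage pipeline (no stalls, no branches)."""
    stages = ("fetch", "decode", "execute")
    timeline = []
    # The pipeline needs len(instructions) + 2 cycles to drain completely.
    for cycle in range(len(instructions) + len(stages) - 1):
        row = {}
        for depth, stage in enumerate(stages):
            index = cycle - depth  # instruction that was fetched `depth` cycles ago
            in_range = 0 <= index < len(instructions)
            row[stage] = instructions[index] if in_range else None
        timeline.append(row)
    return timeline


timeline = pipeline_timeline(["I", "II", "III"])
for cycle, row in enumerate(timeline, start=1):
    print(cycle, row)
```

In the third cycle the model shows instruction III being fetched, II being decoded and I being executed, matching the slide.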

Pipeline Stages
• Fetch
  o the instruction is fetched from memory and placed in the instruction pipeline
• Decode
  o the instruction is decoded
  o data-path control signals are prepared for the next cycle
  o in data transfer instructions, the ALU holds the address components to compute the auto-indexing modification if required
• Execute
  o the register bank is read
  o the ALU result is generated
  o the result is written back into the destination register
  o in control flow instructions, the pipeline is refilled

Processor Modes
• ARM has 7 operating modes
  - User (unprivileged mode under which most tasks run)
  - Fast Interrupt Request mode, FIQ (handles high-priority interrupts)
  - Interrupt mode, IRQ (entered when a low-priority interrupt is raised)
  - Supervisor mode, SVC (entered on reset or a software interrupt)
  - Abort mode, ABT (handles memory access violations)
  - Undefined mode, UND (handles undefined instructions)
  - System mode, SYS (uses the same registers as User mode; added in architecture version 4)
MODES
• Most application programs run in User mode
• A program in User mode is unable to access some protected system resources or to change mode, other than by causing an exception
• A mode change can be caused by
  - Software control
  - External interrupts
  - Exception processing

MODES
• Modes other than User mode are called privileged modes
• Privileged modes have full access to the system resources
• Five of them are called exception modes
  - FIQ
  - IRQ
  - SVC
  - ABT
  - UND

MODES
• The processor enters a privileged mode under specific exception conditions
• Each exception mode uses some additional registers, to avoid corrupting the user state when an exception occurs
• SYS uses the same registers as User mode

DATA TYPES
• Byte (8 bit): placed on any byte boundary
• Half-word (16 bit): aligned to 2-byte boundaries
• Word (32 bit): aligned to 4-byte boundaries

Data Types
• When any of these types is defined as unsigned, the N-bit value represents a non-negative integer in the range 0 to 2^N-1
• When defined as signed, the N-bit value represents an integer in the range -2^(N-1) to 2^(N-1)-1
• All data operations are performed on word quantities
• Load and store operations can transfer all the data types from and to memory, automatically zero extending or sign extending bytes or half-words as they are loaded
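The ranges and the two kinds of extension can be checked with a short sketch. The helper names below are assumptions chosen for illustration; the model only mirrors the arithmetic described above, with LDRB and LDRSB mentioned as examples of an unsigned and a signed byte load.

```python
def unsigned_range(n_bits):
    """Range of an unsigned N-bit value: 0 to 2^N - 1."""
    return 0, 2**n_bits - 1


def signed_range(n_bits):
    """Range of a signed N-bit value: -2^(N-1) to 2^(N-1) - 1."""
    return -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1


def zero_extend(value, n_bits):
    """Model of an unsigned load (e.g. LDRB): the upper bits become zero."""
    return value & ((1 << n_bits) - 1)


def sign_extend(value, n_bits):
    """Model of a signed load (e.g. LDRSB): bit n_bits-1 is copied upward.
    The result is returned as a Python integer, so it may be negative."""
    value &= (1 << n_bits) - 1
    sign_bit = 1 << (n_bits - 1)
    return (value ^ sign_bit) - sign_bit


print(unsigned_range(8), signed_range(8))
# The same byte 0xFF seen as a 32-bit register value after each kind of load:
print(hex(zero_extend(0xFF, 8)), hex(sign_extend(0xFF, 8) & 0xFFFFFFFF))
```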

REGISTERS
• ARM has 37 registers, each 32 bits long
  - 30 general purpose registers
  - 5 dedicated Saved Program Status Registers
  - 1 dedicated Current Program Status Register
  - 1 dedicated program counter

General Purpose Registers
• 30 registers, each 32 bits long
• 15 general purpose registers are visible at any one time, depending on the current processor mode, as r0, r1, r2 ... r13, r14
• r13 – conventionally used as the stack pointer
• r14 – conventionally used as the link register, to store the return address for an exception or subroutine call

Program Counter
 PC is accessed as r15
 Incremented by 4 bytes for ARM state and 2
bytes for THUMB state
 Branch instruction loads destination address
into the PC
 Can also be loaded using data operation
instruction

Program Counter
 Due to pipelining , address of currently
executing instruction is typically PC-8 for
ARM and PC-4 for THUMB
 For ARM state bits 1 & 0 are always zero or
ignored
 For THUMB state bit 0 is always zero or
ignored

CPSR - Current Program Status Register
• CPSR holds
  - Copies of the ALU status flags
  - The current processor mode
  - Interrupt disable flags
• The ALU status flags are used to determine whether conditional instructions are executed or not
• On THUMB-capable processors, the CPSR also holds the current processor state (ARM or THUMB)

FLAGS
• Condition code flags
  N (bit 31) – set to bit 31 of the result of the instruction
      N=0 if the result is positive or zero
      N=1 if the result is negative
  Z (bit 30) – Z=1 if the result is zero
      Z=0 if the result is not zero
  C (bit 29) – for addition, set to 1 if a carry occurs and 0 otherwise
      for subtraction, set to 0 if a borrow occurs and 1 otherwise
      for shift operations, C contains the last bit shifted out
  V (bit 28) – for addition and subtraction, set to 1 if signed overflow occurs
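The flag rules for addition and subtraction can be sketched as follows. This is a minimal model, assuming 32-bit operands passed as Python integers; the function names `add_with_flags` and `sub_with_flags` are hypothetical and chosen only for this example.

```python
MASK32 = 0xFFFFFFFF


def add_with_flags(a, b, carry_in=0):
    """Return (result, flags) for the 32-bit sum a + b + carry_in."""
    a &= MASK32
    b &= MASK32
    full = a + b + carry_in
    result = full & MASK32
    flags = {
        "N": result >> 31,            # copy of bit 31 of the result
        "Z": int(result == 0),
        "C": int(full > MASK32),      # carry out of bit 31
        # Signed overflow: both operands share a sign that the result lacks.
        "V": ((~(a ^ b) & (a ^ result)) >> 31) & 1,
    }
    return result, flags


def sub_with_flags(a, b):
    """a - b computed as a + NOT(b) + 1, so C = 1 means 'no borrow'."""
    return add_with_flags(a, ~b & MASK32, 1)


print(add_with_flags(0x7FFFFFFF, 1))  # signed overflow: N and V are set
print(sub_with_flags(5, 5))           # zero result: Z and C are set
print(sub_with_flags(3, 5))           # borrow: C is clear, N is set
```

Modelling subtraction as addition of the inverted operand is what makes the carry flag read as "no borrow" after a subtraction.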
FLAGS
• Control bits
  I (bit 7) – when set, disables IRQ interrupts
  F (bit 6) – when set, disables FIQ interrupts
  T (bit 5) – on T variants (architecture v4 and above)
        T=0 indicates ARM execution
        T=1 indicates THUMB execution
      on non-T variants of v5 and above
        T=0 indicates ARM execution
        T=1 causes the next instruction executed to take the Undefined instruction exception (UND)
FLAGS
• MODE BITS (4:0)

M(4:0)   Mode
10000    User
10001    FIQ
10010    IRQ
10011    Supervisor
10111    Abort
11011    UND
11111    SYS
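Putting the condition flags, control bits and mode bits together, a CPSR value can be decoded as in the sketch below. The function name `decode_cpsr` and the sample value are assumptions for illustration; the bit positions and mode encodings are the ones listed on the slides.

```python
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into its flag, control and mode fields."""
    return {
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
        "I": (cpsr >> 7) & 1,   # 1 = IRQ disabled
        "F": (cpsr >> 6) & 1,   # 1 = FIQ disabled
        "T": (cpsr >> 5) & 1,   # 1 = THUMB state
        "mode": MODES.get(cpsr & 0x1F, "invalid"),
    }


# Sample value: Z and C set, IRQ and FIQ disabled, ARM state, Supervisor mode.
print(decode_cpsr(0x600000D3))
```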
SPSR - Saved Program Status Register
• Used to store the CPSR when an exception is taken
• One SPSR is accessible in each of the exception handling modes
• User mode and System mode don't have an SPSR, as they don't handle exceptions
General Purpose Registers
• Can be divided into three groups
  - Un-banked: r0-r7
  - Banked: r8-r14
  - PC: r15

Un-banked Registers
• Registers r0 to r7
• Each of these registers refers to the same physical register in all modes
• Completely general purpose registers, with no uses implied by the architecture

Banked Registers
• Registers r8 to r14
• The physical register referred to by each of them depends on the mode of operation
• Banked register contents are preserved across operating mode changes

Banked Registers
• r8 to r12
  * two banked physical registers each
  * one for FIQ and the other for all other modes
  * referred to as r8_usr to r12_usr and r8_fiq to r12_fiq
• r13 and r14
  * six banked physical registers each
  * one shared by USER and SYS, and one for each of the five exception modes
  * referred to as r13_<mode> / r14_<mode> (for exception modes)

ARM REGISTERS

User/System  FIQ       Supervisor  Abort     IRQ       Undefined
r0           r0        r0          r0        r0        r0
r1           r1        r1          r1        r1        r1
r2           r2        r2          r2        r2        r2
r3           r3        r3          r3        r3        r3
r4           r4        r4          r4        r4        r4
r5           r5        r5          r5        r5        r5
r6           r6        r6          r6        r6        r6
r7           r7        r7          r7        r7        r7
r8           r8_fiq    r8          r8        r8        r8
r9           r9_fiq    r9          r9        r9        r9
r10          r10_fiq   r10         r10       r10       r10
r11          r11_fiq   r11         r11       r11       r11
r12          r12_fiq   r12         r12       r12       r12
r13          r13_fiq   r13_svc     r13_abt   r13_irq   r13_und
r14          r14_fiq   r14_svc     r14_abt   r14_irq   r14_und
r15(PC)      r15(PC)   r15(PC)     r15(PC)   r15(PC)   r15(PC)

CPSR         CPSR      CPSR        CPSR      CPSR      CPSR
-            SPSR_fiq  SPSR_svc    SPSR_abt  SPSR_irq  SPSR_und

Thumb State Register Set
• Is a subset of the ARM register set
• The programmer has access to
  - 8 general registers, r0 to r7
  - PC
  - SP
  - LR
  - CPSR
  - SPSR (in exception modes)

Mapping of Thumb State registers to
ARM State registers

Thumb state   ARM state
r0            r0
r1            r1
r2            r2
r3            r3
r4            r4
r5            r5
r6            r6
r7            r7
-             r8
-             r9
-             r10
-             r11
-             r12
SP            r13
LR            r14
PC            r15
CPSR          CPSR
SPSR          SPSR
Exceptions & Interrupts
• By default the system is in User Mode
• It enters the exception modes when unexpected events occur
• There are 3 different types of exceptions (some are called interrupts):
1> as a direct result of executing an instruction
   * software interrupt request (SWI)
   * undefined or illegal instruction
   * memory error while fetching an instruction
2> as a side-effect of an instruction
   * memory error during a read/write from memory
   * arithmetic error
3> as a result of external hardware signals
   * reset
   * fast interrupt
   * normal interrupt

Exceptions & Interrupt
• As the processor enters an exception mode, some new registers are automatically switched in, depending on the mode
• This ensures that the task state is not corrupted by the occurrence of an exception

What happens when exception occurs
• ARM completes the current instruction as best it can
• It then departs from the current instruction stream to handle the exception through the following steps:
1) saves the return address (derived from the current PC) in the r14 of the new mode
2) saves the CPSR in the SPSR of the new mode
3) changes the operating mode to the one corresponding to the exception
4) disables exceptions of lower priority
5) forces the PC to a new value corresponding to the exception, which effectively jumps the instruction stream to the exception handler or interrupt service routine
   * a unique address is predefined for each exception handler
   * the address to which the processor is forced to branch is called the exception/interrupt vector

Exception/Interrupt Vector
• Each vector (except FIQ) is 4 bytes long
• A branch instruction is put at this address

Exception type         Mode        Vector address  High vector address
Reset                  Supervisor  0x00000000      0xFFFF0000
Undefined Instruction  Undefined   0x00000004      0xFFFF0004
Software Interrupt     Supervisor  0x00000008      0xFFFF0008
Pre-fetch Abort        Abort       0x0000000C      0xFFFF000C
Data Abort             Abort       0x00000010      0xFFFF0010
IRQ (interrupt)        IRQ         0x00000018      0xFFFF0018
FIQ (fast interrupt)   FIQ         0x0000001C      0xFFFF001C
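The five entry steps and the vector table can be tied together in a simplified model. The sketch below is an assumption-laden illustration, not a description of the hardware: the function name `enter_exception`, the dictionary layout and the sample values are hypothetical, and banked registers are represented only by the two values saved on entry.

```python
# Mode and vector offset for each exception, as in the table above.
VECTORS = {
    "reset": ("Supervisor", 0x00),
    "undefined": ("Undefined", 0x04),
    "swi": ("Supervisor", 0x08),
    "prefetch_abort": ("Abort", 0x0C),
    "data_abort": ("Abort", 0x10),
    "irq": ("IRQ", 0x18),
    "fiq": ("FIQ", 0x1C),
}
MODE_BITS = {"Supervisor": 0b10011, "Undefined": 0b11011,
             "Abort": 0b10111, "IRQ": 0b10010, "FIQ": 0b10001}


def enter_exception(cpsr, return_address, exception, high_vectors=False):
    """Simplified exception entry; returns (mode, banked, new_cpsr, new_pc)."""
    mode, offset = VECTORS[exception]
    banked = {"r14": return_address, "spsr": cpsr}   # steps 1 and 2
    new_cpsr = (cpsr & ~0x3F) | MODE_BITS[mode]      # step 3 (also selects ARM state)
    new_cpsr |= 0x80                                 # step 4: set I, disabling IRQ
    if exception in ("reset", "fiq"):
        new_cpsr |= 0x40                             # these also set F, disabling FIQ
    base = 0xFFFF0000 if high_vectors else 0x00000000
    return mode, banked, new_cpsr, base + offset     # step 5: PC forced to the vector


mode, banked, cpsr, pc = enter_exception(0x60000010, 0x8008, "irq")
print(mode, banked, hex(cpsr), hex(pc))
```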

Exception Return
• Once the exception has been handled (by the exception handler), the user task is resumed.
• The handler program (or Interrupt Service Routine) must restore the user state exactly as it was before the exception occurred:
1. Any modified user registers must be restored from the handler stack
2. The CPSR must be restored from the appropriate SPSR
3. The PC must be changed back to the instruction address in the user instruction stream
• Steps 1 and 3 are done by the user's handler code, step 2 by the processor
• Restoring registers from the stack is the same as in the case of subroutines
• Restoring the PC value is more complicated. The exact way to do it depends on which exception you are returning from.

Exception Return
• We assume that the return address was saved in r14 on entry to the exception handler.
1) To return from a SWI or undefined instruction trap, use:
   MOVS pc, r14
2) To return from an IRQ, FIQ or pre-fetch abort, use:
   SUBS pc, r14, #4
3) To return from a data abort to retry the data access, use:
   SUBS pc, r14, #8
• Three methods are needed because the saved PC value can be 1 or 2 instructions ahead of the return point, due to pipelining
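The three return forms differ only in the amount subtracted from r14. A minimal sketch, assuming the exception names and the function `exception_return` below (both hypothetical):

```python
# Bytes subtracted from r14 by the return instruction for each exception.
RETURN_ADJUST = {
    "swi": 0,              # MOVS pc, r14
    "undefined": 0,        # MOVS pc, r14
    "irq": 4,              # SUBS pc, r14, #4
    "fiq": 4,              # SUBS pc, r14, #4
    "prefetch_abort": 4,   # SUBS pc, r14, #4
    "data_abort": 8,       # SUBS pc, r14, #8 (retries the aborted access)
}


def exception_return(exception, r14, spsr):
    """Model of MOVS/SUBS with pc as destination: the new PC is computed
    from r14 and the CPSR is restored from the SPSR in the same step."""
    return r14 - RETURN_ADJUST[exception], spsr


print([hex(v) for v in exception_return("data_abort", 0x8010, 0x10)])
```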

Interrupt Priority
 Since exceptions can arise at the same time, a priority
order has to be clearly defined. For the ARM
processor this is:
1) Reset (highest priority)
2) Data abort (i.e. Memory fault in read/write data)
3) Fast Interrupt Request (FIQ)
4) Normal Interrupt Request (IRQ)
5) Pre-fetch abort
6) Software Interrupt (SWI), undefined instruction
ARM7TDMI Core

Internal organization of ARM
• Two main blocks: data-path and decoder
• Register bank (r0 to r15)
  * Two read ports to the A-bus/B-bus
  * One write port from the ALU-bus
  * Additional read/write ports for the program counter r15
• Barrel shifter – shifts/rotates the 2nd operand by any number of bits
• ALU performs arithmetic/logic functions
• Address register/incrementer holds either the PC address (with increment) or an operand address

Internal organization of ARM
• Data register holds read/write data from/to memory
• Instruction decoder decodes machine code instructions to produce control signals for the data-path
• In single-cycle data processing instructions, data values are read onto the A-bus and B-bus, and the result from the ALU is written back into the register bank
• The PC value in the address register is incremented and copied back to r15 and the address register; this allows new instructions to be fetched ahead of time (instruction pre-fetch)
• On a branch, the next pre-fetch address is taken from the ALU rather than the address incrementer. The instruction pipeline is refilled before any further execution takes place.
Data-path activity during data processing
instruction
SUB r0, r1, #128 LSL #3 ; r0 := r1 - 128*8
• Subtract instruction; one operand is a constant
• The constant 128 encoded in the instruction passes through the barrel shifter to produce 128*8
• The ALU operates on the operands and writes the result back to register r0
• The PC value in the address register is incremented and copied back to r15 and the address register

Memory-Address Space
• ARM uses a single flat address space of 2^32 bytes
• Byte addresses are treated as unsigned, running from 0 to 2^32-1
• The address space is regarded as consisting of 2^30 32-bit words, each of whose addresses is word aligned
• The word whose word-aligned address is 'A' consists of the four bytes with addresses A, A+1, A+2, A+3
• From v4 onwards the address space is also regarded as 2^31 16-bit halfwords

Endianness
• The memory system uses one of 2 mapping schemes to map between words, half-words and bytes
1) little-endian system:
~ a byte or halfword at a word-aligned address is the least significant byte or halfword within the word at that address
~ a byte at a halfword-aligned address is the least significant byte within the halfword at that address

Word at address A
Halfword at address A+2     | Halfword at address A
Byte at A+3 | Byte at A+2   | Byte at A+1 | Byte at A

2) big-endian system:
~ a byte or halfword at a word-aligned address is the most significant byte or halfword within the word at that address
~ a byte at a halfword-aligned address is the most significant byte within the halfword at that address

Word at address A
Halfword at address A       | Halfword at address A+2
Byte at A | Byte at A+1     | Byte at A+2 | Byte at A+3

• The ARM instruction set doesn't contain any instruction that can directly select the endianness. Instead a hardware input is used to configure an ARM implementation to match the memory system
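The two mappings can be compared using Python's built-in `int.to_bytes` and `int.from_bytes`. The helper names `store_word` and `load` and the sample word are assumptions for the example; offsets are relative to the word-aligned address A.

```python
def store_word(word, byteorder):
    """Bytes of a 32-bit word as laid out at addresses A, A+1, A+2, A+3."""
    return word.to_bytes(4, byteorder)


def load(memory, offset, size, byteorder):
    """Read `size` bytes starting at address A + offset."""
    return int.from_bytes(memory[offset:offset + size], byteorder)


word = 0x11223344
little = store_word(word, "little")
big = store_word(word, "big")

# Byte at A: least significant byte (little-endian) vs most significant (big-endian).
print(hex(load(little, 0, 1, "little")), hex(load(big, 0, 1, "big")))
# Halfword at A and halfword at A+2 in each scheme.
print(hex(load(little, 0, 2, "little")), hex(load(little, 2, 2, "little")))
print(hex(load(big, 0, 2, "big")), hex(load(big, 2, 2, "big")))
```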

Memory mapped I/O
 Standard way to perform I/O functions on
ARM systems is by the use of memory
mapped I/O
 This uses special memory addresses which
supply I/O functions when they are loaded
from or stored to
 Loading from memory mapped I/O address is
used for input ,and storing to memory mapped
I/O address is for output

Instruction fetches from memory mapped I/O
• The behavior of memory mapped I/O usually differs from that expected of a normal memory location
• For example, two successive loads from the same location may not yield the same result, as they would from normal memory.
• As a result, it is recommended that memory mapped I/O not be used for instruction fetches

Data access to memory mapped I/O
• If the memory words, halfwords or bytes accessed by a code sequence are memory mapped I/O locations, one access can generate a side effect which changes the result of a subsequent access to a different location
• If this happens, the time order of the individual accesses makes a difference to the final result of the code sequence
• It is also important that the data size of each memory access be maintained when accessing memory mapped I/O
• For example, a code sequence that specifies four byte reads from four consecutive addresses must not be merged into a single word read
Data access to memory mapped I/O
• Typical requirements include
  - Constraints on the memory attributes of the memory mapped I/O. For example, in the standard memory system architecture, the memory locations must be uncacheable and unbufferable
  - Constraints on the sizes or alignments of accesses to the memory mapped I/O locations. For example, if an ARM implementation has a 16-bit external bus, it might prohibit the use of 32-bit accesses to the memory mapped I/O locations, since they can't be performed in a single bus cycle
  - A requirement for additional hardware. For example, an alternative for an ARM implementation with a 16-bit external bus is to allow 32-bit accesses to memory, but require external hardware to reassemble the two 16-bit accesses into a single 32-bit access to the I/O device.

ARM7
• Core has Von Neumann architecture, with a single 32-bit data bus carrying both instructions and data
• CPI = 1.9
• Uses a 3-stage pipeline
  - Fetch
  - Decode
  - Execute
• Implements the BASE UPDATED DATA ABORT MODEL
• Doesn't implement extension spaces as UNDEFINED

ARM9
• Core has Harvard architecture, with separate buses for data and instructions
• CPI = 1.5
• Uses a 5-stage pipeline
  - Instruction fetch
  - Instruction decode
  - Execute
  - Data memory access
  - Register write
• Implements the BASE RESTORED DATA ABORT MODEL
• Implements all the instruction set extension spaces as UNDEFINED

ARM ASSEMBLY LANGUAGE
PROGRAMMING

An Introduction to Instruction Set.

12/08/21 SHANKAR NARAYAN P.S 63


ARM Instruction Types
 32 bit ARM Instruction set.
 16 bit Thumb instruction set.
 Can be further divided in to following types.
 Data processing instructions.
 Data transfer instructions.

 Control flow instructions.

 Coprocessor instructions.

 Breakpoint instructions.

ARM Instruction format
 U – Up the stack.
 S – Set condition code bit. This says whether
the data processing instruction should affect
the flags or not.
 W – write back.
 L – Load/Store.
 N – Data size.

ARM Instruction format
• Rn, Rs, Rm – source registers.
• Rd – destination register.
• RdHi – register holding the most significant 32 bits of a 64-bit result.
• RdLo – register holding the least significant 32 bits of a 64-bit result.

About the condition field.
• Ordinary instruction sets allow only branches to be executed conditionally.
• ARM instructions each contain a condition field, which determines whether the CPU executes them or not.
• The time penalty of not executing several conditional instructions is usually less than the overhead of the branch that would otherwise be needed. A taken branch flushes the pipeline, which then takes 3 cycles to refill.
Condition Field
• The top 4 bits of the instruction (bits [31:28]) constitute the condition field. They represent the following.
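A sketch of how the condition field can be extracted and evaluated against the N, Z, C and V flags follows. The table lists the standard ARM condition mnemonics for codes 0000 to 1110; the function name `condition_passed` and the flag dictionary layout are assumptions for illustration.

```python
# Condition code -> (mnemonic, test on the CPSR flags).
CONDITIONS = {
    0b0000: ("EQ", lambda f: f["Z"] == 1),                       # equal
    0b0001: ("NE", lambda f: f["Z"] == 0),                       # not equal
    0b0010: ("CS", lambda f: f["C"] == 1),                       # carry set / unsigned >=
    0b0011: ("CC", lambda f: f["C"] == 0),                       # carry clear / unsigned <
    0b0100: ("MI", lambda f: f["N"] == 1),                       # negative
    0b0101: ("PL", lambda f: f["N"] == 0),                       # positive or zero
    0b0110: ("VS", lambda f: f["V"] == 1),                       # overflow
    0b0111: ("VC", lambda f: f["V"] == 0),                       # no overflow
    0b1000: ("HI", lambda f: f["C"] == 1 and f["Z"] == 0),       # unsigned >
    0b1001: ("LS", lambda f: f["C"] == 0 or f["Z"] == 1),        # unsigned <=
    0b1010: ("GE", lambda f: f["N"] == f["V"]),                  # signed >=
    0b1011: ("LT", lambda f: f["N"] != f["V"]),                  # signed <
    0b1100: ("GT", lambda f: f["Z"] == 0 and f["N"] == f["V"]),  # signed >
    0b1101: ("LE", lambda f: f["Z"] == 1 or f["N"] != f["V"]),   # signed <=
    0b1110: ("AL", lambda f: True),                              # always
}


def condition_passed(instruction, flags):
    """Return (mnemonic, passed) for a 32-bit ARM instruction word."""
    name, test = CONDITIONS[(instruction >> 28) & 0xF]
    return name, test(flags)


flags = {"N": 0, "Z": 1, "C": 0, "V": 0}
print(condition_passed(0x0A000000, flags))  # top nibble 0000: EQ
print(condition_passed(0xE1A00000, flags))  # top nibble 1110: AL
```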

Data Processing Instructions
• Contains
  - Arithmetic operations
  - Comparisons
  - Logical operations
  - Data movement between registers.
• Important: these instructions cannot work on memory. They work only on registers, since ARM is a load/store architecture.

Arithmetic Operations
 Syntax
 <operation>{<cond>}{<S>} Rd, Rn, operand2
 Operations are
 ADD – operand1 + operand2
 ADC – operand1 + operand2 + carry
 SUB – operand1 – operand2
 SBC – operand1 – operand2 + carry – 1
 RSB – operand2 – operand1
 RSC – operand2 – operand1 + carry – 1
 Reverse subtraction is required because
operand1 is always a register
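The six arithmetic operations can be modelled directly from the formulas above. This is a sketch on 32-bit values; `execute` and `add64` are hypothetical helper names, and `add64` illustrates one common use of the carry, a 64-bit addition built from an ADD on the low words followed by an ADC on the high words.

```python
MASK32 = 0xFFFFFFFF

# operand1 is always a register value; operand2 may be a register or immediate.
OPERATIONS = {
    "ADD": lambda op1, op2, c: op1 + op2,
    "ADC": lambda op1, op2, c: op1 + op2 + c,
    "SUB": lambda op1, op2, c: op1 - op2,
    "SBC": lambda op1, op2, c: op1 - op2 + c - 1,
    "RSB": lambda op1, op2, c: op2 - op1,
    "RSC": lambda op1, op2, c: op2 - op1 + c - 1,
}


def execute(op, operand1, operand2, carry=0):
    """Result of the operation, truncated to 32 bits."""
    return OPERATIONS[op](operand1, operand2, carry) & MASK32


def add64(a_hi, a_lo, b_hi, b_lo):
    """64-bit addition from two 32-bit steps: ADD low words, ADC high words."""
    lo = execute("ADD", a_lo, b_lo)
    carry = int(a_lo + b_lo > MASK32)  # the C flag the low-word add would set
    hi = execute("ADC", a_hi, b_hi, carry)
    return hi, lo


print(hex(execute("RSB", 5, 0)))   # 0 - 5, i.e. negate: 0xfffffffb
print(add64(1, 0xFFFFFFFF, 0, 1))  # carry propagates into the high word
```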
With Immediate Operands
• Syntax
  - <operation>{<cond>}{S} Rd, Rn, #immediate val
• Operations are
  - ADD – operand1 + immediate value
  - ADC – operand1 + immediate value + carry
  - SUB – operand1 – immediate value
  - SBC – operand1 – immediate value + carry – 1
• Note: only 12 bits are available to store the immediate operand.
Immediate Operands
• Then how do we specify a 32-bit immediate operand?
• The most important thing when writing a 32-bit immediate operand is that it must be a legitimate one.
• What are these legitimate immediate values?
  - Any 32-bit value that can be expressed as an 8-bit value and a 4-bit rotate count.
  - The rotate count is multiplied by 2 before the rotation is performed.
Immediate Operands
• Example: MOV r0, #4096
  - 4096 = 0x1000 can be formed from 0x40 as the 8-bit operand, rotated RIGHT by 26.
  - The rotation is stored halved: 26/2 = 13 = 0xD.
• With this choice the instruction MOV r0, #4096 is stored as

  20 bits for opcode and register | 0xD40 (rotate field 0xD, then 8-bit value 0x40)

• So the operand specified must be expressible as
  - <8 bit val> rotated right by an EVEN amount.

Immediate Operands
• Values that cannot be generated this way cause an assembler error
• Let us see this example
  - ADD r1, r2, #0xff0000 (note that the value is hex)
  - Uses 0xff ROR 16
• So the processor stores it as

  20 bits for opcode and register | 0x8ff (rotate field 16/2 = 8, then 8-bit value 0xff)
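Whether a constant is a legitimate immediate can be checked by trying every even rotation. The sketch below assumes the 12-bit field layout of a 4-bit rotate count in the upper bits followed by the 8-bit value; the names `ror32`, `encode_immediate` and `field12` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def encode_immediate(value):
    """All (imm8, rotate) pairs such that imm8 ROR rotate == value,
    with rotate even. An empty list means the value is not legitimate."""
    value &= MASK32
    encodings = []
    for rotate in range(0, 32, 2):
        imm8 = ror32(value, 32 - rotate)  # rotating left by `rotate` undoes the ROR
        if imm8 <= 0xFF:
            encodings.append((imm8, rotate))
    return encodings


def field12(imm8, rotate):
    """12-bit operand field: rotate/2 in bits [11:8], imm8 in bits [7:0]."""
    return ((rotate // 2) << 8) | imm8


print(encode_immediate(4096))        # several choices, including (0x40, 26)
print(hex(field12(0x40, 26)))        # 0xd40
print(hex(field12(0xFF, 16)))        # 0x8ff
print(encode_immediate(0x101))       # [] : cannot be encoded
```

A value such as 4096 has more than one valid encoding; the slide's choice of 0x40 rotated by 26 is one of them.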

Logical Operations
 Syntax
 <operation>{<cond>}{<S>} Rd, Rn, operand2
 Operations are
 AND – operand1 AND operand2
 EOR – operand1 EOR operand2

 ORR – operand1 OR operand2

 BIC – operand1 AND NOT operand2 [ can be Bit clear]

Comparisons
 Only effect is to update the condition flags thus
no need to set S bit. They don’t write the result.
 Syntax
 <operation>{<cond>}Rn, operand2
 Operations are
 CMP – operand1 – operand2
 CMN – operand1 + operand2

 TST – operand1 AND operand2

 TEQ – operand1 EOR operand2

Branch Instructions
• Type 1 – Branch to a label
  - Syntax – B{<cond>} Label
• The offset for the branch is calculated by the assembler as follows when a branch instruction is encountered:
  - Offset = target addr – (addr of branch inst + 8)
  - The 8 accounts for the pipeline: when the branch executes, the PC is already 8 bytes ahead of it.
• The offset can be up to 26 bits. Its bottom 2 bits are always 0, so the 26-bit offset is shifted right by 2 and stored as a 24-bit field in the instruction encoding.
• The range is ±32 MB.
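The offset calculation can be sketched as a pair of helpers that encode and decode the 24-bit field. The function names and sample addresses are assumptions for illustration; the arithmetic assumes the PC reads as the branch address plus 8.

```python
def encode_branch_offset(branch_address, target_address):
    """24-bit offset field for a B/BL at branch_address jumping to target_address."""
    offset = target_address - (branch_address + 8)  # PC reads as branch address + 8
    if offset % 4:
        raise ValueError("target must be word aligned")
    if not -(1 << 25) <= offset < (1 << 25):
        raise ValueError("target out of range (about +/-32 MB)")
    return (offset >> 2) & 0xFFFFFF                 # drop the two zero bits


def decode_branch_target(branch_address, field):
    """Inverse of encode_branch_offset: recover the target address."""
    offset = field << 2
    if offset & (1 << 25):                          # sign-extend the 26-bit offset
        offset -= 1 << 26
    return branch_address + 8 + offset


print(hex(encode_branch_offset(0x8000, 0x8010)))  # forward branch
print(hex(encode_branch_offset(0x8000, 0x8000)))  # branch to itself
```

A branch to its own address gives the field 0xFFFFFE, because the offset is -8 bytes, or -2 words.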
Branch Instructions
• Type 2 – Branch to a subroutine (called Branch with Link)
  - Syntax – BL{<cond>} Sub_routine_label
  - BL is identified by the link bit in the opcode.
• Implements a subroutine call by writing PC – 4 to the link register (lr) of the current bank.
  - This is the address of the instruction following the branch, since the PC is 2 instructions (8 bytes) ahead of the BL.
• To return from the subroutine we simply restore the PC from LR (MOV pc, lr).
The Barrel Shifter
 ARM does not support independent shift
instructions. Instead it supports a Barrel
Shifter which can provide shifts as a part of
other instructions.
 Barrel Shifter supports several actions like
 Left shift
 Right shifts

 Rotations

Barrel Shifter – Left Shift
• Shifts the second operand of the instruction left by the specified amount.
• Syntax
  - <data processing instruction>, LSL #<immediate>
  - <data processing instruction>, LSL <register>
• Example
  - ADD r0, r1, r1, LSL #2
  - Means r0 = r1 + r1*4

Barrel Shifter Right Shifts
• Logical shift right (LSR)
  - Divides an unsigned value by a power of two
  - Used with other instructions as in the case of LSL
• Example
  - LSR #5 divides by 32
  - Syntax is similar to left shift.

Barrel Shifter Right Shifts
• Arithmetic shift right (ASR).
  - Shifts right and preserves the sign bit for 2's complement operations,
  - i.e. it copies the sign bit (bit 31) into the vacated bits.
  - Used with other instructions as in the case of LSR

Barrel Shifter - Rotations
• Rotate Right (ROR).
  - Similar to LSR, but the bits that leave the LSB of the register reappear at the MSB.
  - The last bit to leave the LSB is also copied to the carry flag (CF).
  - Used with other instructions in the same way as the shifts.
• Rotate Right Extended (RRX).
  - Rotates right by one bit, using the carry flag as the 33rd bit.

Barrel Shifter
• LSL:  CF <- [ REGISTER ] <- 0
• LSR:  0 -> [ REGISTER ] -> CF
• ASR:  sign bit -> [ REGISTER ] -> CF
• ROR:  [ REGISTER ] -> CF, with the bit leaving the LSB re-entering at the MSB
• RRX:  CF -> [ REGISTER ] -> CF
  (Rotate Right through Carry)
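The five shifter operations can be modelled as below. This is a sketch restricted to shift amounts from 1 to 31 on 32-bit values; each helper returns the shifted value together with the carry-out bit, and the function names are assumptions for the example.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, n):
    """Logical shift left: zeros enter at the LSB, last bit out goes to CF."""
    return (value << n) & MASK32, (value >> (32 - n)) & 1


def lsr(value, n):
    """Logical shift right: zeros enter at the MSB."""
    return value >> n, (value >> (n - 1)) & 1


def asr(value, n):
    """Arithmetic shift right: copies of the sign bit enter at the MSB."""
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> n) & MASK32, (value >> (n - 1)) & 1


def ror(value, n):
    """Rotate right: bits leaving the LSB re-enter at the MSB."""
    result = ((value >> n) | (value << (32 - n))) & MASK32
    return result, result >> 31


def rrx(value, carry):
    """Rotate right by one bit through the carry flag (a 33-bit rotate)."""
    return (carry << 31) | (value >> 1), value & 1


r1 = 5
# ADD r0, r1, r1, LSL #2  ->  r0 = r1 + r1*4
r0 = (r1 + lsl(r1, 2)[0]) & MASK32
print(r0)
print([hex(v) for v in asr(0x80000000, 4)])
```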

Multiplication Instructions
 There are two multiplication instructions.
 Multiply
 Syntax : MUL {<cond>}{S} Rd, Rm, Rs
 Rd = Rm * Rs
 Multiply and accumulate
 Does addition along with multiplication with the third
register operand specified, and stores the end result in the
destination.
 Syntax : MLA{<cond>}{S} Rd, Rm, Rs, Rn
 Rd = (Rm * Rs) + Rn

Limitations of MUL
 Rd and Rm cannot be the same register.
 Cannot use program counter.
 Operands can be considered signed or
unsigned, the user should interpret correctly

Data Movement
• The MOV instruction
  - Syntax : MOV{<cond>}{S} Rd, operand2
  - Moves operand2 into the destination register.
  - There is no operand1; operand2 can be a register or immediate data.
• The MVN instruction
  - Syntax is the same as for the MOV instruction
  - Moves NOT operand2 into the destination register.

Load Store Instructions
 The ARM is a load store architecture
 Does not support memory to memory data processing
 Must move to registers before using them.
 Process becomes much faster due to register access to
process data.
 There are three sets of instructions which can
interact with main memory, they are
 Single register data transfer
 Block data transfer
 Single data swap

Single Register Data Transfer
• Load or store a word (LDR/STR)
  - Syntax : <LDR/STR>{<cond>} Rd/Rs, <address>
  - Rd for LDR and Rs for STR, as the case may be
  - The memory <address> can be expressed in several addressing modes, which will be discussed shortly.
• Load or store a byte
  - Syntax : <LDR/STR>{<cond>}{B} Rd/Rs, <address>
  - Note that B is attached after the condition, if any.

Single Register Data Transfer
• ARM architecture version 4 also adds support for half words.
  - Syntax : <LDR/STR>{<cond>}{H} Rd/Rs, <address>

Addressing Modes
• Register indirect addressing mode
  - The address of the source memory location (for a load) or the destination memory location (for a store) is given by the contents of a register.
• Examples:
  - STR r0, [r1]
  - LDR r2, [r1]

Addressing Modes
• STR r0, [r1] works this way.
  [Figure: r1 holds the address 0x100 and r0 holds 0x5; after the store, memory location 0x100 contains 0x5]

Indexed Addressing
 Instructions in ARM are capable of accessing a
location offset from the base address specified.
 This offset can be
 An unsigned 12 bit immediate value.
 A register, optionally shifted using barrel shift.

 Added or subtracted from the base register.

 Applied before transfer : Pre – indexed Addressing.

 Applied after transfer : Post – indexed Addressing.

Pre – Indexed Addressing
• Example : LDR r0, [r1, #12]
  - The offset addition r1 + 12 is done before the transfer
  - The transfer to r0 is made from the resulting address.
  - By default the base register r1 is not updated.
  - To update the base register, use LDR r0, [r1, #12]!
  - LDR r0, [r1, r3] can be used if r3 contains 12.
  - LDR r0, [r1, r3, LSL #2] can be used if r3 contains 3.

Pre – Indexed Addressing
• How does LDR r0, [r1, #12] work?
  [Figure: r1 holds 0x100; the offset 12 is added to form the address 0x10C; the value 0x5 at 0x10C is loaded into r0]
Post – Indexed Addressing
• Example : LDR r0, [r1], #12
  - The offset addition r1 + 12 is done after the transfer
  - The transfer to r0 is made from the current address in r1.
  - The base register r1 is always updated.
  - Post-indexing makes sense only with updating.
  - LDR r0, [r1], #-12 can be used to step r1 down to 0xf4.
  - LDR r0, [r1], r3, LSL #2 can be used if r3 contains 3.
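The difference between the offset, pre-indexed (with write-back) and post-indexed forms can be sketched with a small model of LDR. Memory and registers are plain dictionaries here, and the function name `ldr`, the mode labels and the sample contents are assumptions for illustration.

```python
def ldr(memory, regs, rd, rn, offset, mode):
    """Model of LDR with three addressing forms:
       'offset' : LDR rd, [rn, #offset]    base unchanged
       'pre'    : LDR rd, [rn, #offset]!   base updated before the transfer
       'post'   : LDR rd, [rn], #offset    base updated after the transfer
    Returns the address that was accessed."""
    base = regs[rn]
    address = base if mode == "post" else base + offset
    regs[rd] = memory[address]
    if mode in ("pre", "post"):
        regs[rn] = base + offset  # write-back to the base register
    return address


memory = {0x100: 0xAA, 0x10C: 0x5}

regs = {"r0": 0, "r1": 0x100}
ldr(memory, regs, "r0", "r1", 12, "offset")
print(regs)  # r0 loaded from 0x10C, r1 still 0x100

regs = {"r0": 0, "r1": 0x100}
ldr(memory, regs, "r0", "r1", 12, "post")
print(regs)  # r0 loaded from 0x100, r1 now 0x10C
```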
Post – Indexed Addressing
• How does LDR r0, [r1], #12 work?
  [Figure: r1 holds 0x100; the value 0x5 at 0x100 is loaded into r0; the offset 12 is then added, leaving r1 = 0x10C]
Block Data Transfer
• The Load or Store Multiple instructions transfer between 1 and 16 registers from or to memory.
• The transferred registers can be either
  - Any subset of the current bank
  - Any subset of the user mode registers when in a privileged mode.
• They are very efficient for
  - Saving and restoring context.
  - Moving large blocks of data around memory

Block Data Transfer
• These are a few of the instructions used.
• LDMIA/STMIA : Load/Store Multiple, Increment After.
• Example:
  - LDMIA r0, {r2-r9}
• Means
  - Load registers r2 to r9 with the data in 8 successive word locations, the first of which (the base address) is in r0. The address is incremented after each load; r0 itself is not updated.
• ARM supports several variants of this kind, which will be listed shortly.
Stacks
 Stack is an area of the memory which works
on the LIFO algorithm.
 Two pointers define the current limits of the
stack.
 The base pointer which points to the bottom of the
stack
 The stack pointer which points to current “TOP”
of the stack.

Stacks
• PUSH = first decrement, then store.
• POP = first load, then increment.
  [Figure: stack before (INITIAL) and after PUSH {1,2,3}; after the push, SP points at 3, the last item pushed, with 2 and 1 below it]


12/08/21 105
Stacks
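 The POP operation

[Figure: the stack before and after a POP. Initially the stack holds 1, 2, 3 with SP pointing to 3. After the POP returns 3, the stack holds 1, 2 and SP points to 2.]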
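The PUSH and POP rules can be sketched as a small Python model. The word size of 4 bytes, the initial stack pointer 0x2000 and the helper names are assumptions for illustration.

```python
# Minimal model of PUSH/POP on a descending stack; addresses are illustrative.
WORD = 4

def push(mem, sp, value):
    sp -= WORD                # decrement first ...
    mem[sp] = value           # ... then store at the new top
    return sp

def pop(mem, sp):
    value = mem[sp]           # read the top item first ...
    return value, sp + WORD   # ... then increment

mem = {}
initial_sp = 0x2000           # assumed bottom of the stack
sp = initial_sp
for item in (1, 2, 3):        # PUSH 1, then 2, then 3
    sp = push(mem, sp, item)
sp_after_push = sp

top, sp = pop(mem, sp)        # POP returns the last item pushed
```

After the three pushes SP has moved down by 12 bytes; the pop returns 3 and leaves SP pointing at 2.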

Stacks
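 Usually the stack grows as already seen: SP is
decremented before each item is stored. ARM readily supports this.
 In all, ARM supports the following
types of stack.
 Full Descending stack
 Full Ascending stack

 Empty Descending stack

 Empty Ascending stack
 In a full stack SP points to the last item stored; in an empty stack it points to the next free location. Ascending and descending give the direction of growth in memory.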

Stack Examples
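[Figure: memory from 0x100 at the top to 0x106 at the bottom, after pushing 1, 2, 3. FULL DESCENDING: the initial SP is at 0x106, 1 is stored at 0x105 with 2 and 3 above it, and SP points to 3. EMPTY DESCENDING: 1 is stored at the initial SP (0x106), 2 at 0x105 and 3 above it, and SP points to the free location above 3.]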

Stack Examples
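[Figure: memory from 0x100 at the top to 0x106 at the bottom, after pushing 1, 2, 3 onto ascending stacks. EMPTY ASCENDING: 1 is stored at the initial SP, followed at higher addresses by 2 and 3 (at 0x105), and SP points to the free location after 3 (0x106). FULL ASCENDING: 1 is stored just after the initial SP, followed by 2 (at 0x105) and 3 (at 0x106), and SP points to 3.]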

Stacks
 The multiple Load/Store instructions can also
be used to transfer the data from or to the stack.
 Depending on the type of stack we have the
following forms.
 STMFD/LDMFD
 STMFA/LDMFA

 STMED/LDMED

 STMEA/LDMEA

Block Data Transfer
 Putting it all together we have.
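The correspondence between the stack-oriented names (FD, FA, ED, EA) and the increment/decrement addressing modes (IA, IB, DA, DB) can be sketched as a Python model. The two alias tables reflect the standard ARM correspondence; the helper names, the initial stack pointer 0x2000 and the data are assumptions for illustration.

```python
# Sketch of STM/LDM addressing modes and their stack-oriented aliases.
WORD = 4

# Stack name -> equivalent increment/decrement mode.
STM_ALIAS = {"FD": "DB", "FA": "IB", "ED": "DA", "EA": "IA"}
LDM_ALIAS = {"FD": "IA", "FA": "DA", "ED": "IB", "EA": "DB"}

def addresses(base, count, mode):
    """Return (word addresses lowest-first, written-back base)."""
    step = WORD if mode[0] == "I" else -WORD
    first = base + step if mode[1] == "B" else base   # Before: move, then access
    used = [first + step * i for i in range(count)]
    return sorted(used), base + step * count

def stm(mem, sp, values, stack):
    """Model STM<stack> sp!, {...}; values are in ascending register order."""
    slots, new_sp = addresses(sp, len(values), STM_ALIAS[stack])
    mem.update(zip(slots, values))       # lowest register -> lowest address
    return new_sp

def ldm(mem, sp, count, stack):
    """Model LDM<stack> sp!, {...}; returns (values, new sp)."""
    slots, new_sp = addresses(sp, count, LDM_ALIAS[stack])
    return [mem[a] for a in slots], new_sp

results = {}
for stack in ("FD", "FA", "ED", "EA"):
    mem, start = {}, 0x2000              # assumed initial stack pointer
    sp = stm(mem, start, [1, 2, 3], stack)
    values, sp_back = ldm(mem, sp, 3, stack)
    results[stack] = (sp, values, sp_back)
```

For each stack type the matching LDM undoes the STM: the same values come back and SP returns to its starting value.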

Block Data Transfer
 A few tips on using block data transfer.
 The base register in the instruction is updated
after the transfer if the ! symbol is appended to it.
 Example : STMFD sp!, {r0 - r12}
 The register list need not be
contiguous. Separate registers or ranges are specified using “,”
 Example : LDMIA r0, {r1,r4,r6}
 Example : LDMIA r0, {r1, r3 - r5}
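How a register list with commas and ranges expands can be sketched in Python. `expand_register_list` is a hypothetical helper, not part of any assembler, and it handles only names of the form rN (not sp, lr or pc).

```python
# Sketch: expand an assembler-style register list into register numbers.
def expand_register_list(text):
    """Expand e.g. '{r1, r3 - r5}' into [1, 3, 4, 5]."""
    numbers = set()
    for part in text.strip("{} ").split(","):
        first, _, last = part.partition("-")      # 'r3 - r5' -> 'r3 ', ' r5'
        low = int(first.strip().lstrip("rR"))
        high = int(last.strip().lstrip("rR")) if last else low
        numbers.update(range(low, high + 1))
    return sorted(numbers)    # the lowest register always uses the lowest address
```

The result is sorted because the order in which registers are written in the list does not change the order in which they are transferred.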

Control Flow Instructions
 The following are to be discussed.
 Branch Instructions
 Conditional branch instructions.

 Branch and link instructions

 Subroutines.

 Supervisor calls

 Jump tables.

Control Flow Instructions
 The unconditional branch
 B Label : Branch unconditionally to the specified
label.
 The conditional branch instruction.
 B<condition> Label.
 Branches to specified label depending on the
condition specified.
 The conditions are the same as those listed in the first table.
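How a conditional branch decides whether to branch can be sketched in Python. Only a subset of the condition codes is modelled; `flags_after_cmp` and `branch_taken` are hypothetical helper names.

```python
# Sketch of how B<condition> tests the CPSR flags (subset of conditions).
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,                  # equal: Z set
    "NE": lambda n, z, c, v: not z,              # not equal: Z clear
    "GE": lambda n, z, c, v: n == v,             # signed >=
    "LT": lambda n, z, c, v: n != v,             # signed <
    "GT": lambda n, z, c, v: not z and n == v,   # signed >
    "LE": lambda n, z, c, v: z or n != v,        # signed <=
    "AL": lambda n, z, c, v: True,               # always (plain B)
}

def flags_after_cmp(a, b):
    """Return (N, Z, C, V) as set by CMP a, b on 32-bit values."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    result = (a - b) & 0xFFFFFFFF
    n = bool(result >> 31)                       # sign bit of the result
    z = result == 0
    c = a >= b                                   # carry set means no borrow
    v = bool(((a ^ b) & (a ^ result)) >> 31)     # signed overflow
    return n, z, c, v

def branch_taken(condition, flags):
    """True if B<condition> would branch with the given flags."""
    return bool(CONDITIONS[condition](*flags))
```

For example, `branch_taken("EQ", flags_after_cmp(5, 5))` models CMP followed by BEQ on equal operands.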

Control Flow Instructions
 The BL instruction.
 Stores the return address in r14 (the link
register) and then transfers control to the subroutine,
as already seen.
 If a subroutine calls another subroutine, the
original return address must first be saved on
the stack, because the nested BL overwrites r14 with the new return address.
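The handling of r14 across nested calls can be sketched in Python. The addresses, the `state` dictionary and the helper names are assumptions for illustration; the return is modelled as MOV pc, lr.

```python
# Sketch of BL and nested subroutine calls; addresses are illustrative.
def bl(state, bl_address, target):
    """Model BL: r14 <- address of the next instruction, then branch."""
    state["lr"] = bl_address + 4     # ARM instructions are 4 bytes long
    state["pc"] = target

def ret(state):
    """Model MOV pc, lr (return to the caller)."""
    state["pc"] = state["lr"]

state = {"pc": 0, "lr": 0, "stack": []}

bl(state, 0x8000, 0x9000)            # main calls outer; lr = 0x8004
state["stack"].append(state["lr"])   # outer saves lr before calling again
bl(state, 0x9010, 0xA000)            # outer calls inner; lr = 0x9014
lr_in_inner = state["lr"]
ret(state)                           # inner returns into outer
pc_after_inner = state["pc"]
state["lr"] = state["stack"].pop()   # outer restores its own return address
ret(state)                           # outer returns into main
pc_after_outer = state["pc"]
```

Without the save and restore around the inner call, the outer routine would return to 0x9014 instead of 0x8004.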

Supervisor calls
 Input and output are done through supervisor
calls, which invoke special system subroutines
by means of a special instruction called “SWI”, which stands
for “Software Interrupt”.
 Some useful SWIs:
 SWI SWI_writeC
 SWI SWI_Exit

Jump Tables
 The idea of a jump table is that a programmer
sometimes wants to call one of a set of
subroutines depending on a computed value.
 Example
 BL jumptable
…………..
jumptable
CMP r0,#0
BEQ fun1
CMP r0,#1
BEQ fun2
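The example above selects the routine with a chain of compares. The same idea can be sketched as an indexed table in Python; this swaps in table lookup for the compare chain, and `fun1`, `fun2` and `dispatch` are placeholder names.

```python
# Sketch of table-driven dispatch; fun1 and fun2 stand in for the slide's labels.
def fun1():
    return "fun1"

def fun2():
    return "fun2"

JUMP_TABLE = [fun1, fun2]            # entry i handles the value i

def dispatch(value):
    """Call the routine selected by value, or return None if out of range."""
    if 0 <= value < len(JUMP_TABLE):     # bounds check before indexing
        return JUMP_TABLE[value]()
    return None
```

With a table, adding a routine means adding one entry, and the selection takes the same time for every value.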
Swap
 The ARM instruction set has two swap
instructions.
 Swap (SWP)
 Swap Byte (SWPB)

 Syntax : SWP{<cond>}{B} Rd, Rm, [Rn]


 Example :
 SWP r12, r10, [r9] means
 Load r12 from the address held in r9 and store r10 to that same address

Swap
 If the same register is used as both source and
destination, as shown, a true swap is achieved
 SWP r1, r1, [r2]
 Exchanges the value in r1 with the memory word whose address
is in r2
 Byte exchange (SWPB) works along similar lines.
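The two SWP examples can be sketched in Python. Only the word form is modelled; the register values, addresses and the `swp` helper are assumptions for illustration.

```python
# Minimal model of SWP; register and memory contents are illustrative.
def swp(regs, mem, rd, rm, rn):
    """Model SWP rd, rm, [rn]: load rd from [rn] and store rm to [rn]."""
    address = regs[rn]
    old = mem[address]           # read memory first ...
    mem[address] = regs[rm]      # ... then write the source register
    regs[rd] = old               # the read and write form one indivisible access

regs = {"r1": 0x11, "r2": 0x100, "r9": 0x200, "r10": 0xAA, "r12": 0}
mem = {0x100: 0x22, 0x200: 0x33}

swp(regs, mem, "r12", "r10", "r9")   # SWP r12, r10, [r9]
swp(regs, mem, "r1", "r1", "r2")     # SWP r1, r1, [r2]: a true exchange
```

In the second call r1 and the word at the address in r2 trade values.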

The Thumb Instruction Set

The 16 bit instruction set

Thumb Instruction Set
 It is a re-encoded subset of the ARM instruction set.
 It is designed to increase the performance of
ARM implementations that use a 16 bit or
narrower memory data bus, and to give better
code density than ARM.
 Thumb execution is flagged by the T bit (bit[5]) in
the CPSR.
 T==0 ARM state.
 T==1 Thumb state.

A Glance at CPSR
 The CPSR holds
 Copies of ALU status flags.
 Current processor state.

 Interrupt disable flags.

Field   N    Z    C    V    Unused   I   F   T   Mode
Bits    31   30   29   28   27-8     7   6   5   4-0
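The CPSR layout can be turned into a small decoder. This Python sketch uses the bit positions given above; the `decode_cpsr` helper and the example value 0x600000F3 are assumptions for illustration.

```python
# Sketch: decode the CPSR fields from a 32-bit value.
def decode_cpsr(cpsr):
    def bit(position):
        return (cpsr >> position) & 1

    return {
        "N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28),
        "I": bit(7),             # 1 = IRQ disabled
        "F": bit(6),             # 1 = FIQ disabled
        "T": bit(5),             # 0 = ARM state, 1 = Thumb state
        "mode": cpsr & 0x1F,     # bits 4 to 0
    }

fields = decode_cpsr(0x600000F3)     # assumed example value
```

For this value Z and C are set, both interrupts are disabled, the T bit is set, and the mode bits are 0b10011.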

Entering Thumb State
 Thumb execution is entered by executing an
ARM BX instruction (Branch and Exchange).
 This instruction branches to the address held in
a general purpose register; if bit[0] of
that register is 1, Thumb execution begins at
the branch target address.
 If bit[0] is 0, ARM execution continues from
the branch target address.
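The role of bit[0] in BX can be sketched in Python. The `state` dictionary, the `bx` helper and the addresses are assumptions for illustration.

```python
# Sketch of BX: bit 0 of the register selects the instruction set.
def bx(state, register_value):
    """Model BX Rm: update the T bit and branch to the target address."""
    state["thumb"] = bool(register_value & 1)          # bit[0] -> T bit
    state["pc"] = register_value & ~1 & 0xFFFFFFFF     # bit[0] is not part of the address

state = {"thumb": False, "pc": 0}

bx(state, 0x8001)                    # bit[0] = 1: continue in Thumb state
entered_thumb = (state["thumb"], state["pc"])

bx(state, 0x9000)                    # bit[0] = 0: continue in ARM state
back_in_arm = (state["thumb"], state["pc"])
```

Instruction addresses are at least halfword aligned, so bit[0] is free to carry the state selection.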

Thumb Model
 Thumb instruction set gives full access to the
eight ‘Lo’ general purpose registers r0 to r7
and makes use of the rest as follows.
 r13 is used as stack pointer.
 r14 is the link register.

 r15 is used as PC.

Thumb ARM Differences
 Most Thumb instructions are executed
unconditionally, whereas a condition can be attached
to every ARM instruction.
 Many data processing instructions use the
two address format, i.e. one of the source
registers also acts as the destination register.
 Better code density than ARM.

Data Processing Instructions
 Data processing instructions on the Lo
registers.
 i.e. registers r0 to r7.

The Data Processing Instruction

The Compares

Logical Instructions

Load Instructions

Store Instructions.

Store Instructions
