
ARM: An Advanced Microcontroller

Santanu Chattopadhyay
Dept. of Electronics & Elec. Comm. Engg.
Indian Institute of Technology, Kharagpur
India – 721 302
Email: santanu@ece.iitkgp.ernet.in
Microcontrollers
 Single-chip computers
 Relatively simple CPU with timers, serial /
parallel, digital / analog input / output lines
 On-chip program memory
 A small on-chip read-write memory (scratch pad)
 Extended memory interfacing
 Targeted towards small applications
 Operating frequency as low as 32 kHz, though
many high-speed microcontrollers are available
Features sought after
 Highest available speed should be sufficient for
the application
 Size of the chip – 40 pin DIP, QFP (quad flat
package)
 Amount of on-chip ROM/RAM
 Cost of single chip
 Availability of development platform
 On-chip debugging
 Availability in the market
Example microcontrollers

 68HC11, 8051, ARM


 Atmel – AVR8, AVR32
 Freescale – CF, S08
 Hitachi – H8, SuperH
 MIPS, NEC, PIC, PowerPC
 TI MSP 430
 Toshiba TLCS-870
 Zilog – eZ8, eZ80
Introduction to ARM

 32-bit RISC architecture


 Developed by ARM Ltd.; ARM originally stood for
Acorn RISC Machine
 Licensed to companies that want to
manufacture ARM based CPUs or SOC
products
 Helps licensees develop their own processors
compliant with the ARM instruction set
architecture
Features that make ARM the most
popular embedded architecture
 ARM cores are very simple and require relatively few
transistors, leaving enough space on the die to realize other
functionality on the silicon
 Instruction set architecture and the pipeline design aimed at
minimizing energy consumption
 Also capable of running 16-bit THUMB instruction set – greater
code density and enhanced power saving
 Higher performance
 Highly modular architecture – the only mandatory part is the
integer pipeline, all other components are optional
 Built-in JTAG debug port and on-chip embedded in-circuit
emulator (ICE) that allows programs to be downloaded and fully
debugged in-system
ARM Architecture History
Version  Year  Features                                   Implementation
V1       1985  The first commercial RISC (26-bit)         ARM1
V2       1987  Coprocessor support                        ARM2, ARM3
V3       1992  32-bit, MMU, 64-bit MAC                    ARM6, ARM7
V4       1996  THUMB                                      ARM7TDMI, ARM8, ARM9TDMI, StrongARM
V5       1999  DSP and Jazelle extensions                 ARM10, XScale
V6       2001  SIMD, THUMB-2, TrustZone, Multiprocessing  ARM11, ARM11 MPCore
V7       2005  Three profiles, NEON, VFP                  ?
ARM7 block diagram
ARM Pipeline

 A very important feature of ARM processors;
it has different versions
 3-stage pipeline – ARM7TDMI and earlier
 5-stage pipeline – ARM8, ARM9TDMI
 6-stage pipeline – ARM10TDMI
 8-stage pipeline – ARM11
3-stage Pipeline

 Classical fetch-decode-execute pipeline


 First stage reads an instruction from memory
and increments the value in the instruction
address register
 Next stage decodes instruction and prepares
control signals to execute it
 Third stage does the actual work: reading
operands from the register file, performing ALU
operations, and writing back the modified register
values
5-stage Pipeline
 In the 3-stage pipeline, every data transfer instruction
causes a pipeline stall – the next instruction cannot be
fetched while memory is being read/written
 Instruction and data memory separated
 Register read step moved to decode stage
 Execute stage split into three – performing arithmetic
computations, memory access, write result back to register
file
 Balances pipeline, reducing CPI (average number of clocks
per instruction)
 However, need to forward data between pipeline stages to
resolve data dependencies between the stages without
stalling the pipeline
6-stage Pipeline

 In ARM10 core, instruction decode is split into
two pipeline stages – decode, register
 Decode stage performs decode operation
 Register stage reads the register to be used
 A separate adder introduced in execution unit
to take care of multiply-accumulate
instructions
 Both instruction and data buses are 64-bit
8-stage Pipeline

 Two new features introduced in ARM11 core


 Shift operation has been separated into a
separate pipeline stage
 Both instruction and data accesses are
distributed across two pipeline stages
 Execution unit is split into three different
pipelines that can operate concurrently and
can also commit instructions out of order
Instruction Set Architecture

 Typical RISC architecture with several enhancements
to improve performance further
 The RISC features are as follows
 Large uniform register file with 16 general purpose registers
 Load/store architecture. The instructions that process data
operate only on registers and are separate from instructions
that access memory
 Simple addressing modes
 Uniform and fixed-length instruction fields. All ARM
instructions are 32 bits long and most of them have a regular
three-operand encoding
Improved Features
 Each instruction controls the ALU and shifter, making the
instructions more powerful
 Auto-increment and auto-decrement addressing modes
supported
 Multiple load/store instructions that can load or store
up to 16 registers at once
 Conditional execution of instructions introduced.
Instruction opcode is preceded by a 4-bit condition code.
For the instruction to execute, the condition must be met.
Eliminates small branches and thus pipeline stalls
 Arithmetic operations may or may not affect the status bits
Registers
 16 general purpose registers R0-R15 in user mode
 R15 is the program counter, but can also be manipulated as a
general purpose register
 R13 is conventionally used as the stack pointer. ARM instruction
set does not have PUSH/POP instructions
 R14 is called the link register. When a procedure call is made, the
return address is automatically placed into this register (rather than on
the stack). A return from the procedure can be implemented by
copying R14 to R15
 Current Program Status Register (CPSR) contains four 1-bit
condition flags – negative, zero, carry, overflow
 Saved Program Status Register (SPSR) stores a copy of CPSR in
some modes of operation
Modes of Operation

 ARM processor operates in one of the six
operating modes
 User mode
 used to run application code
 CPSR cannot be written
 mode can only be changed via exception generation
 Fast interrupt processing mode (FIQ)
 Supports high speed interrupt handling
 Generally used for a single critical interrupt source
 Normal interrupt processing mode (IRQ)
 supports all other interrupt sources in a system
Modes of Operation (contd.)

 Supervisor mode (SVC)


 entered when the processor encounters a software
interrupt instruction
 used for OS services
 on reset, ARM enters this mode
 Undefined instruction mode (UNDEF)
 entered when the fetched instruction is neither an ARM
instruction nor a coprocessor instruction
 Abort mode
 entered in response to memory fault
ARM Registers in Different Modes
CPSR Register

Bit:    31  30  29  28  27 ... 8   7   6   5   4 ... 0
Field:  N   Z   C   V   Unused     I   F   T   Mode

N: Negative    I: IRQ disable
Z: Zero        F: FIQ disable
C: Carry       T: THUMB instruction set
V: Overflow    Mode: 6 operating modes
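The field layout above can be illustrated with a small Python sketch that splits a 32-bit CPSR value into its named fields. The function name `decode_cpsr` and the sample value are assumptions made for illustration; they do not come from the slides.

```python
# Bit positions follow the CPSR layout given on the slide.
N_BIT, Z_BIT, C_BIT, V_BIT = 31, 30, 29, 28
I_BIT, F_BIT, T_BIT = 7, 6, 5
MODE_MASK = 0x1F  # bits 4..0 hold the mode field


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into its flag, control and mode fields."""
    def bit(position):
        return (cpsr >> position) & 1

    return {
        "N": bit(N_BIT), "Z": bit(Z_BIT), "C": bit(C_BIT), "V": bit(V_BIT),
        "I": bit(I_BIT), "F": bit(F_BIT), "T": bit(T_BIT),
        "mode": cpsr & MODE_MASK,
    }


# Hypothetical sample value: Z and C set, I and F set, mode field 0x13.
fields = decode_cpsr(0x600000D3)
print(fields)
```

In the sample value, the mode field 0x13 is the encoding ARM uses for supervisor mode.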
Data Types

 Six different data types


 8-bit signed and unsigned
 16-bit signed and unsigned
 32-bit signed and unsigned
 Supports both little-endian and big-endian
format
 Most implementations support only
little-endian
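As a quick illustration of the two byte orders, the following Python sketch lays out one 32-bit word in memory both ways. The sample value 0x11223344 is an arbitrary choice for illustration.

```python
value = 0x11223344  # arbitrary 32-bit sample word

# Little-endian: least significant byte at the lowest address.
little = value.to_bytes(4, "little")
# Big-endian: most significant byte at the lowest address.
big = value.to_bytes(4, "big")

for address, (lo, hi) in enumerate(zip(little, big)):
    print(f"offset {address}: little-endian 0x{lo:02X}, big-endian 0x{hi:02X}")
```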
Instruction Sets

 Two different instruction sets


 ARM : Standard 32-bit instruction set
 Data processing
 Data transfer
 Block transfer
 Branching
 Multiply
 Conditional
 Software interrupts
 THUMB : 16-bit compressed form
 Code density better than most CISC
 Dynamic decompression in pipeline
Data Processing Instructions

 Supports a range of arithmetic operations – addition,
subtraction, multiplication
 Bit-wise logical operations
 All operations take two 32-bit operands and return a
32-bit value
 First operand and result must be register
 Second operand can be a register or an immediate
value
 If second operand is a register, it can be shifted or
rotated before sending to the ALU
Immediate Second Operand

 Immediate operand is a 32-bit value
 Not all 32-bit constants can be specified
 All binary ones must fall within a group of 8 adjacent
bit positions on a 2-bit boundary
 More formally, a valid immediate operand n must
satisfy,
n = i ROR (2 * r)
where i is a number between 0 and 255, r is between 0
and 15
 Example: 255 (i = 255, r = 0), 256 (i = 1, r = 12)
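The rule n = i ROR (2 * r) can be checked mechanically. The Python sketch below searches for a valid (i, r) pair for a given constant; the helper names `ror32` and `encode_immediate` are assumptions made for this illustration.

```python
MASK32 = 0xFFFFFFFF


def ror32(x, n):
    """Rotate the 32-bit value x right by n bit positions."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


def encode_immediate(n):
    """Return (i, r) such that n == i ROR (2 * r), or None if n cannot be encoded."""
    n &= MASK32
    for r in range(16):
        # Rotating right by 32 - 2r is a rotate left by 2r, which undoes ROR (2 * r).
        i = ror32(n, 32 - 2 * r)
        if i <= 0xFF:
            return i, r
    return None


print(encode_immediate(255))    # the slide's first example
print(encode_immediate(256))    # the slide's second example
print(encode_immediate(0x101))  # ones span 9 bit positions, so no encoding exists
```

The loop returns the smallest r that works, so a constant with several valid encodings gets the one with the least rotation.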
Data Processing Instructions (contd.)

 Modification of condition flags by arithmetic
instructions is optional
 Flags need not be checked right after the
instruction that sets them
 Examples:
 ADD R1, R2, R3; R1 = R2 + R3
 ADD R1, R2, R3, LSL #2; R1 = R2 + R3*4
 ADDS R1, R2, R3, LSL #2; R1 = R2 + R3*4 and set
condition code flags
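To make the effect of the shifted second operand and the optional S suffix concrete, here is a minimal Python model of a 32-bit addition that also computes the N, Z, C and V flags. The function names `lsl` and `adds` are assumptions for illustration, and the model covers only addition.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, amount):
    """Logical shift left within 32 bits, as applied to the second operand."""
    return (value << amount) & MASK32


def adds(a, b):
    """Add two 32-bit values and return (result, flags) as ADDS would set them."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "N": result >> 31,                 # sign bit of the result
        "Z": int(result == 0),             # result is zero
        "C": int(full > MASK32),           # unsigned carry out of bit 31
        # Signed overflow: both operands differ in sign from the result.
        "V": ((a ^ result) & (b ^ result)) >> 31,
    }
    return result, flags


# ADDS R1, R2, R3, LSL #2 with R2 = 10 and R3 = 3
print(adds(10, lsl(3, 2)))
```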
Single Register Data Transfer
Instructions
 Can be used to transfer 1, 2, or 4-bytes of data
between a register and a memory location
 Base plus offset mode can be used
 Both pre-indexed and post-indexed modes are
available
 Offset can either be a 12-bit unsigned immediate value
or a register optionally shifted by an immediate value
 Offset may be added or subtracted from the base
register
Examples

LDR R0, [R8]       ; R0 = Memory[R8]
LDR R0, [R1, -R2]  ; R0 = Memory[R1 - R2]
LDR R0, [R1, #4]   ; R0 = Memory[R1 + 4]
LDR R0, [R1, #4]!  ; R0 = Memory[R1 + 4], then R1 = R1 + 4
LDR R0, [R1], #16  ; R0 = Memory[R1], then R1 = R1 + 16
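The difference between plain offset, pre-indexed and post-indexed addressing can be sketched in Python, modelling the register file and memory as dictionaries. All function names and the sample memory contents are assumptions for illustration.

```python
def ldr_offset(regs, mem, rd, rn, offset):
    """LDR Rd, [Rn, #offset]: the base register is left unchanged."""
    regs[rd] = mem[regs[rn] + offset]


def ldr_pre_indexed(regs, mem, rd, rn, offset):
    """LDR Rd, [Rn, #offset]!: update the base first, then load from it."""
    regs[rn] += offset
    regs[rd] = mem[regs[rn]]


def ldr_post_indexed(regs, mem, rd, rn, offset):
    """LDR Rd, [Rn], #offset: load from the base, then update it."""
    regs[rd] = mem[regs[rn]]
    regs[rn] += offset


# Hypothetical memory image: word address -> value
mem = {0x100: 11, 0x104: 22, 0x110: 33}
regs = {"R0": 0, "R1": 0x100}
ldr_pre_indexed(regs, mem, "R0", "R1", 4)
print(regs)
```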
Various Load/Store Instructions

Load                              Store
LDR    Load word                  STR   Store word
LDRH   Load half word             STRH  Store half word
LDRSH  Load signed half word      (none; STRH stores the low 16 bits)
LDRB   Load byte                  STRB  Store byte
LDRSB  Load signed byte           (none; STRB stores the low 8 bits)
Little-endian vs. Big-endian
Block Data Transfer
 Load and Store multiple instructions (LDM/STM) allow
between 1 and 16 registers to be transferred to or from
memory
 Transferred registers can be
 Any subset of current bank of registers (default)

 Any subset of user mode bank of registers when in


privileged mode
 Can use base register, auto-increment and decrement
 Lowest register number is always transferred to/from
lowest memory location accessed
 Can be utilized in
 Implementing stack

 Moving large blocks of data around memory


Stack Implementation

 Descending or ascending stack


 Full (stack pointer points to the last occupied
address) or empty (stack pointer points to the
next available address)
 Various instructions:
 STMFD/LDMFD: Full descending stack
 STMFA/LDMFA: Full ascending stack
 STMED/LDMED: Empty descending stack
 STMEA/LDMEA: Empty ascending stack
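A full descending stack, the variant behind STMFD/LDMFD, can be modelled in a few lines of Python. This is a sketch with hypothetical helper names; registers and memory are plain dictionaries, and R13 is assumed to be the stack pointer.

```python
def reg_number(name):
    """Turn a register name such as 'R11' into its number."""
    return int(name[1:])


def stmfd(regs, mem, reg_list, sp="R13"):
    """Push: decrement SP before each store, so SP points at the last occupied word."""
    # Highest register goes first so the lowest register ends at the lowest address.
    for name in sorted(reg_list, key=reg_number, reverse=True):
        regs[sp] -= 4
        mem[regs[sp]] = regs[name]


def ldmfd(regs, mem, reg_list, sp="R13"):
    """Pop: load from SP, then increment it, lowest register first."""
    for name in sorted(reg_list, key=reg_number):
        regs[name] = mem[regs[sp]]
        regs[sp] += 4


regs = {"R0": 1, "R1": 2, "R2": 3, "R13": 0x1000}
mem = {}
stmfd(regs, mem, ["R0", "R1", "R2"])
print(hex(regs["R13"]), mem)
```

An empty or ascending stack would differ only in whether the pointer moves before or after the access and in which direction.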
Stack Example
Moving Large Data Block

 Instructions –
 STMIA/LDMIA: Increment after
 STMIB/LDMIB: Increment before
 STMDA/LDMDA: Decrement after
 STMDB/LDMDB: Decrement before
 Example –
        ; R12 points to the start of the source data
        ; R14 points to the end of the source data
        ; R13 points to the start of the destination data
Loop    LDMIA R12!, {R0-R11}   ; load 48 bytes
        STMIA R13!, {R0-R11}   ; and store them
        CMP   R12, R14         ; check for the end
        BNE   Loop             ; and loop until done
Multiplication Instruction
 Several versions
 Integer multiplication (32-bit result)
 Long integer multiplication (64-bit result)

 Multiply accumulate instruction

 Instructions
 MUL – 32-bit multiply
 MLA – 32-bit multiply accumulate
 UMULL – 64-bit unsigned multiply
 UMLAL – 64-bit unsigned multiply accumulate
 SMULL – 64-bit signed multiply
 SMLAL – 64-bit signed multiply accumulate
 Example
 MUL R0, R1, R2; R0 = R1 * R2
 MLA R0, R1, R2, R3; R0 = R1 * R2 + R3
Multiplication Instruction (Contd.)

 Destination and source cannot be the same register


 PC (R15) cannot be used for multiplication
 Uses Booth’s algorithm
 For each pair of bits, it takes 1 cycle
 One more cycle needed to start the instruction
 Multiplication continues as long as the source register has
1’s left; otherwise it terminates early
 To multiply 18 by -1, if 18 is in the source, it takes 4
cycles, whereas if -1 is the source, it needs 17 cycles
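The cycle counts quoted above follow from a simple rule: one start-up cycle plus one cycle per pair of bits up to the most significant 1 in the source register. The Python sketch below implements only this simplified model from the slide, not the exact timing of any particular core; the function name `multiply_cycles` is an assumption.

```python
def multiply_cycles(source):
    """Cycle count under the slide's model: 1 start-up cycle + 1 per bit pair."""
    # Treat the source as a 32-bit register, so -1 becomes 32 one-bits.
    significant_bits = (source & 0xFFFFFFFF).bit_length()
    bit_pairs = (significant_bits + 1) // 2
    return 1 + bit_pairs


print(multiply_cycles(18))   # 18 = 10010b, 3 bit pairs
print(multiply_cycles(-1))   # all 32 bits set, 16 bit pairs
```

Under this model it pays to place the operand with the smaller magnitude in the source register.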
Software Interrupt

 Forces CPU to supervisor mode


 Instruction format: SWI #n
 Causes an exception trap to the SWI hardware vector,
exception handler is called
 Exception handler analyzes the value of n to
determine the action
 Processor completely ignores n
 Used to implement system calls
 Value of n is 24-bit, allowing 2^24 different system calls
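Because the processor ignores n, the exception handler has to read the SWI instruction word itself and mask out the low 24 bits. The Python sketch below shows that step; the handler table and its service names are hypothetical and only illustrate the dispatch idea.

```python
SWI_OPCODE = 0xF          # bits 27..24 of a SWI instruction are 1111
SWI_NUMBER_MASK = 0x00FFFFFF


def swi_number(instruction):
    """Extract the 24-bit value n from a 32-bit SWI instruction word."""
    if (instruction >> 24) & 0xF != SWI_OPCODE:
        raise ValueError("not a SWI instruction")
    return instruction & SWI_NUMBER_MASK


# Hypothetical system-call table, for illustration only.
HANDLERS = {0: "write_char", 1: "read_char"}


def dispatch(instruction):
    """Pick the service that the handler would run for this SWI."""
    return HANDLERS.get(swi_number(instruction), "unknown")


print(dispatch(0xEF000001))  # SWI #1 with the 'always' condition
```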
Conditional Execution

 ARM allows all instructions to be executed
conditionally
 Most significant 4-bits of each instruction are
reserved to hold 16 condition codes
 Instruction is executed only if the condition set
is met by the flags in CPSR
 Example:
 ADDEQ R0, R1, R2; R0 = R1+R2 only if zero
flag is set
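The check the processor performs on the top four bits can be written out as a lookup table. The Python sketch below uses the standard ARM condition encodings; the function name `condition_passed` is an assumption, and the reserved code 0xF is deliberately left out of the table.

```python
# Condition field value -> (mnemonic suffix, test on the N, Z, C, V flags)
CONDITIONS = {
    0x0: ("EQ", lambda n, z, c, v: z),
    0x1: ("NE", lambda n, z, c, v: not z),
    0x2: ("CS", lambda n, z, c, v: c),
    0x3: ("CC", lambda n, z, c, v: not c),
    0x4: ("MI", lambda n, z, c, v: n),
    0x5: ("PL", lambda n, z, c, v: not n),
    0x6: ("VS", lambda n, z, c, v: v),
    0x7: ("VC", lambda n, z, c, v: not v),
    0x8: ("HI", lambda n, z, c, v: c and not z),
    0x9: ("LS", lambda n, z, c, v: (not c) or z),
    0xA: ("GE", lambda n, z, c, v: n == v),
    0xB: ("LT", lambda n, z, c, v: n != v),
    0xC: ("GT", lambda n, z, c, v: (not z) and n == v),
    0xD: ("LE", lambda n, z, c, v: z or n != v),
    0xE: ("AL", lambda n, z, c, v: True),
}


def condition_passed(instruction, n, z, c, v):
    """Return True if the instruction's condition field is met by the flags."""
    cond = (instruction >> 28) & 0xF
    if cond not in CONDITIONS:
        raise ValueError("reserved condition code")
    _, test = CONDITIONS[cond]
    return bool(test(n, z, c, v))


# 0x00810002 is ADDEQ R0, R1, R2; it executes only when Z is set.
print(condition_passed(0x00810002, n=0, z=1, c=0, v=0))
```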
Branch Instruction

 All branches are relative to the program counter


 Jump is always within a limit of ±32MB
 Conditional branches are formed by using condition
codes
 Subroutine call is also modeled as a variant of branch
instruction
 Two opcodes
 B – standard branch
 BL – branch with link, PC+4 saved in link register R14
 BL can be used for subroutine call
 Return from subroutine can be effected by copying R14 into
R15
Branch Instruction Format

Bits:   31 ... 28   27 26 25   24   23 ... 0
Field:  Condition   1  0  1    L    Offset

Link bit L: 0 = branch, 1 = branch with link

Offset calculation:
•Compute the 26-bit difference between the branch instruction and the target
•Right shift by 2 bits; as instructions are always word aligned, the least
significant two bits are always zero
•The 24-bit value is stored with the instruction
•During execution, the offset is left shifted by 2 bits, sign extended to 32 bits,
and added to the PC to get the branch target
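The offset calculation can be followed step by step in Python. This sketch assumes that the PC read during execution is the address of the branch plus 8, the usual consequence of the pipeline on classic ARM cores; the function names are hypothetical.

```python
PIPELINE_OFFSET = 8  # assumption: PC reads as the branch address + 8


def encode_branch(branch_addr, target_addr, link=False, cond=0xE):
    """Build the 32-bit B/BL instruction word for a given branch and target."""
    diff = target_addr - (branch_addr + PIPELINE_OFFSET)
    if diff % 4:
        raise ValueError("target must be word aligned")
    if not -(1 << 25) <= diff < (1 << 25):
        raise ValueError("target outside the +/-32MB range")
    offset24 = (diff >> 2) & 0xFFFFFF  # drop the two zero bits, keep 24 bits
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | offset24


def branch_target(branch_addr, instruction):
    """Recover the target: sign extend the offset, shift left by 2, add to PC."""
    offset = instruction & 0xFFFFFF
    if offset & 0x800000:      # bit 23 set means a negative offset
        offset -= 1 << 24
    return (branch_addr + PIPELINE_OFFSET + (offset << 2)) & 0xFFFFFFFF


word = encode_branch(0x8000, 0x8010, link=True)
print(hex(word), hex(branch_target(0x8000, word)))
```

A branch to itself encodes an offset of -2 words, which is why the idle loop `B .` assembles to 0xEAFFFFFE.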
Swap Instruction
 Atomic operation in which a memory read is followed by a memory
write
 Moves byte or word between registers and memory
 Format:
 SWP Rd, Rm, [Rn]

 SWPB Rd, Rm, [Rn]

 Two cycle operation


 Content of memory location pointed to by register Rn is copied into
a temporary space
 Content of register Rm is copied into the memory location
 Content of the temporary space is copied into the register Rd
 To exchange the content of a register with that of the memory location,
Rd and Rm should be made the same register
Swap Instruction Execution
Modifying Status Registers

 Status registers can only be modified indirectly
 MRS copies the content of CPSR/SPSR to a
selected general purpose register
 MSR copies the content of a selected general
purpose register to CPSR/SPSR
 Can only be executed in privileged mode
THUMB Instructions
 THUMB instructions are 16 bits long
 Stored in a compressed form
 Executed unconditionally, except for the branch instructions
 Have unlimited access to registers R0-R7 and R13-R15
 A reduced number of instructions can access the full register set
 No MSR/MRS instructions
 Maximum number of SWI calls limited to 256
 Instructions look more like a conventional processor’s instructions.
For example, PUSH and POP instructions
 On reset or exception, processor enters into ARM instruction set
mode
 To switch between instruction sets BX / BLX has to be used
ARM vs. THUMB
Similarities:
 Load-Store architecture
 Support for 8-, 16-, and 32-bit data types
 32-bit unsegmented memory
Differences:
 Conditional execution
 ARM 3-address format, THUMB 2-address format
 THUMB has less regular instruction format
 THUMB has explicit shift opcodes, in ARM only operand modifiers
THUMB Instruction Processing

[Figure: 16-bit THUMB code passes through the instruction pipeline and the THUMB decompressor to the ARM instruction decoder]
Advantages of THUMB

 Higher code density


 THUMB code, on an average, requires 30%
less space
 If memory is organized as 32-bit words, ARM
code is 40% faster than THUMB
 If memory is organized as 16-bit words,
THUMB code is 45% faster than ARM
 THUMB code uses up to 30% less power than
ARM code
 For best performance, use 32-bit memory and
ARM instruction set
 For best cost and power efficiency, use 16-bit
memory with THUMB code
 In a typical embedded system,
 ARM code should be used in 32-bit on-chip memory for small
speed-critical routines
 THUMB code should be used in 16-bit off-chip
memory for large non-critical control routines
Exceptions in ARM
 Can occur from various situations
 Through execution of instructions – software interrupts, undefined
instructions, memory abort
 Side-effect of instruction – data fetch failure

 Other sources like RESET, FIQ, IRQ interrupts

 Processor switches to privileged mode


 Current value of PC+4 saved in the link register
 Current CPSR copied into SPSR of the privileged mode
 IRQ interrupts are disabled
 If exception has been caused by FIQ, FIQ interrupts are also
disabled
 Program counter loaded with exception vector address
 Execution of the exception handler starts
ARM7 Vector Table
Exception type                            Mode        Vector address
Reset                                     Supervisor  0x00000000
Undefined instruction                     Undefined   0x00000004
Software interrupt (SWI)                  Supervisor  0x00000008
Prefetch abort (instruction fetch abort)  Abort       0x0000000C
Data abort (data access memory abort)     Abort       0x00000010
IRQ (interrupt)                           IRQ         0x00000018
FIQ (fast interrupt)                      FIQ         0x0000001C
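The exception entry steps and the vector table can be combined into one small Python model. This is a simplified sketch: the CPU state is a dictionary, the mode is tracked as a separate string rather than in the CPSR mode bits, and the return address is passed in by the caller. All names are hypothetical.

```python
# Exception type -> (mode entered, vector address), from the ARM7 vector table
VECTORS = {
    "reset": ("svc", 0x00000000),
    "undefined": ("und", 0x00000004),
    "swi": ("svc", 0x00000008),
    "prefetch_abort": ("abt", 0x0000000C),
    "data_abort": ("abt", 0x00000010),
    "irq": ("irq", 0x00000018),
    "fiq": ("fiq", 0x0000001C),
}
I_BIT = 1 << 7  # IRQ disable bit in CPSR
F_BIT = 1 << 6  # FIQ disable bit in CPSR


def enter_exception(cpu, kind, return_address):
    """Apply the entry sequence from the slide to a dictionary-based CPU state."""
    mode, vector = VECTORS[kind]
    cpu["lr_" + mode] = return_address   # save return address in the banked link register
    cpu["spsr_" + mode] = cpu["cpsr"]    # copy CPSR into the mode's SPSR
    cpu["mode"] = mode                   # switch to the privileged mode
    cpu["cpsr"] |= I_BIT                 # IRQ interrupts are disabled
    if kind == "fiq":
        cpu["cpsr"] |= F_BIT             # FIQ also disables further FIQs
    cpu["pc"] = vector                   # continue at the exception vector


cpu = {"cpsr": 0x10, "pc": 0x8000, "mode": "usr"}
enter_exception(cpu, "irq", return_address=0x8004)
print(cpu)
```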


Issues Related to Exceptions

 Vector at location 0x00000014 is missing to
ensure backward compatibility
 FIQ has been given the highest address
 Interrupt service routine can start from this address
directly
 Eliminates the necessity of another jump instruction
from the vector address to the ISR, thus saving time
needed to start the routine after the fast interrupt
has occurred
Finding the maximum of a set of
numbers
        ;R0 points to the data block, R2 holds the number of elements
        EOR  R1, R1, R1     ;clear R1 to store the largest
        CMP  R2, #0
        BEQ  Over           ;if block is empty, done
Loop
        LDR  R3, [R0]       ;get the data
        CMP  R3, R1         ;do comparison (unsigned)
        BCC  Looptest       ;skip if R1 is bigger
        MOV  R1, R3         ;else get the larger in R1
Looptest
        ADD  R0, R0, #4     ;increment pointer R0
        SUBS R2, R2, #1     ;decrement number of elements left
        BNE  Loop           ;if not done, loop
Over
        ;R1 holds the largest
Comparing two null terminated
strings
        ;R0 and R1 point to the two strings
Loop
        LDRB R3, [R0]       ;get next character of string 1
        LDRB R4, [R1]       ;get next character of string 2
        CMP  R3, R4         ;compare
        BNE  Notsame        ;if not same, strings do not match
        CMP  R3, #0         ;check if end of string reached
        BEQ  Same           ;if equal, same
        ADD  R0, R0, #1     ;increment pointer to string 1
        ADD  R1, R1, #1     ;increment pointer to string 2
        BAL  Loop           ;branch always to check next character
Notsame
        MOV  R2, #-1        ;mark not matched
        BAL  Over
Same
        MOV  R2, #0         ;mark matched
Over
        ;R2 holds the match
Thank you
