www.jntuworld.com

ARM
Advanced RISC Machines

Introduction and Architecture

ARM Applications

Why ARM here?

ARM is one of the most licensed, and thus most widespread, processor cores in the world
Used especially in portable devices due to low power consumption and reasonable performance
Several interesting extensions are available, like the Thumb instruction set and the Jazelle Java machine

RISC vs. CISC Architecture

RISC                            CISC
Fixed-width instructions        Variable-length instructions
Few instruction formats         Several instruction formats
Load/store architecture         Memory values can be used as operands in instructions
Large register bank             Small register bank
Instructions are pipelinable    Pipelining is complex

RISC vs. CISC Organization

RISC                                         CISC
Hardwired instruction decode                 Microcode ROMs in the instruction decoder
Single-cycle execution of all instructions   Multi-cycle execution of instructions

RISC Advantages
A smaller die size
A shorter development time
Higher performance (a bit tricky): smaller things have higher natural frequencies

RISC Disadvantages
Generally poor code density (fixed-length instructions)

ARM History
ARM: Acorn RISC Machine (1983-1985)
  Acorn Computers Limited, Cambridge, England
ARM: Advanced RISC Machine (1990)
  ARM Limited, 1990
  ARM has been licensed to many semiconductor manufacturers

Semiconductor Partners

Features used from RISC


A load/store architecture
Fixed-length 32-bit instructions
3-address instruction formats

Load Store Architecture


Memory can be accessed only through two dedicated instructions:

LDR ; move word from memory to register
STR ; move word from register to memory

All other instructions have to work on registers only.

3 Address Instruction Format


Field layout:
| f bits: function | n bits: op 1 addr. | n bits: op 2 addr. | n bits: dest. addr. |

Example: ADD d, s1, s2 ; d := s1 + s2

Features Rejected from Berkeley RISC

Single-cycle execution of ALL instructions
  With a single memory for instructions and data, even a simple load/store requires at least two cycles
  Separate data and instruction memories were the solution, but were too costly at the time
  The ARM designers used the extra cycle for something useful, such as supporting auto-indexing

ARM Design Policy


ARM core uses RISC architecture
  Reduced instruction set
  Load/store architecture
  Large number of general-purpose registers
  Parallel execution with pipelines
But some differences from RISC
  Enhanced instructions for:
    Thumb state
    DSP instructions
    Conditional execution of instructions
    32-bit barrel shifter

Registers
ARM has a load/store architecture
General-purpose registers can hold data or addresses
Total of 37 registers, each 32 bits wide
There are 17 or 18 active registers at any time:
  16 data registers
  1 or 2 status registers (CPSR, plus SPSR in the exception modes)

Registers (2)
Registers R0-R12 are general-purpose registers
R13 is used as the Stack Pointer (sp)
R14 is used as the Link Register (lr)
R15 is used as the Program Counter (pc)
CPSR is the Current Program Status Register
SPSR is the Saved Program Status Register

Registers (3)
CPSR
  bits 31-28: N Z C V
  bits 7-5:   I F T
  bits 4-0:   mode

The CPSR holds information about the most recently performed ALU operation and sets the processor operating mode.

Condition code flags
  N = Negative result from ALU
  Z = Zero result from ALU
  C = ALU operation Carried out
  V = ALU operation oVerflowed

Interrupt disable bits
  I = 1: disables the IRQ
  F = 1: disables the FIQ

T bit (architecture xT only)
  T = 0: processor in ARM state
  T = 1: processor in Thumb state

J bit (architecture 5TEJ only)
  J = 1: processor in Jazelle state

Mode bits
  Specify the processor mode
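The CPSR field layout can be illustrated with a short Python sketch. This is only a model for checking one's understanding: `decode_cpsr` and `MODES` are hypothetical names, the bit positions follow the CPSR slide, and the mode encodings are the CPSR[4:0] values listed on the Operation Modes slide.

```python
# Hypothetical helper: split a 32-bit CPSR value into the fields named on the slide.
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}

def decode_cpsr(cpsr):
    """Return the condition flags, interrupt masks, T bit and mode name."""
    return {
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
        "I": (cpsr >> 7) & 1,
        "F": (cpsr >> 6) & 1,
        "T": (cpsr >> 5) & 1,
        "mode": MODES.get(cpsr & 0x1F, "unknown"),
    }

# Example value: Z and C set, IRQ and FIQ disabled, ARM state, Supervisor mode.
fields = decode_cpsr(0x600000D3)
print(fields)
```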

Operation Modes
Mode                    Registers   CPSR[4:0]
User                    User        10000
FIQ                     _fiq        10001
IRQ                     _irq        10010
Supervisor              _svc        10011
Abort                   _abt        10111
Undefined Instruction   _und        11011
System                  User        11111

Processor Modes

Banked Registers
General registers and program counter

User32 / System  FIQ32            Supervisor32     Abort32          IRQ32            Undefined32
r0-r7            r0-r7            r0-r7            r0-r7            r0-r7            r0-r7
r8-r12           r8_fiq-r12_fiq   r8-r12           r8-r12           r8-r12           r8-r12
r13 (sp)         r13_fiq          r13_svc          r13_abt          r13_irq          r13_undef
r14 (lr)         r14_fiq          r14_svc          r14_abt          r14_irq          r14_undef
r15 (pc)         r15 (pc)         r15 (pc)         r15 (pc)         r15 (pc)         r15 (pc)

Program status registers

cpsr             cpsr             cpsr             cpsr             cpsr             cpsr
                 spsr_fiq         spsr_svc         spsr_abt         spsr_irq         spsr_undef
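The banking scheme accounts for the total of 37 registers quoted earlier. The tally below is a small arithmetic check in Python; the variable names are illustrative only.

```python
# Count the physical registers implied by the banked-register layout.
shared = 16            # r0-r15 as seen in User/System mode
fiq_banked = 7         # r8_fiq to r14_fiq
other_banked = 2 * 4   # r13 and r14 for Supervisor, Abort, IRQ and Undefined
status = 1 + 5         # cpsr plus one spsr per exception mode

total = shared + fiq_banked + other_banked + status
print(total)  # 37
```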

The Memory System


Memory may be viewed as a linear array of bytes numbered from 0 to 2^32 - 1
Data items may be 8-bit bytes (B), 16-bit half-words (HW), or 32-bit words (W)
Words are always aligned at 4-byte boundaries, i.e. the two least significant address bits are zero
Half-words are aligned on even byte boundaries
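The alignment rules above reduce to simple address arithmetic. The following Python sketch uses two hypothetical helpers, `is_aligned` and `word_align`, purely to illustrate the rule.

```python
def is_aligned(address, size):
    """Return True if a size-byte access at address is naturally aligned."""
    return address % size == 0

def word_align(address):
    """Clear the two least significant bits to get the enclosing word address."""
    return address & 0xFFFFFFFC

print(is_aligned(0x1004, 4))    # True: word address, low two bits are zero
print(is_aligned(0x1006, 4))    # False: not a multiple of 4
print(is_aligned(0x1006, 2))    # True: even address, valid for a half-word
print(hex(word_align(0x1007)))  # 0x1004
```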

ARM Memory Organization


[Figure: memory drawn as rows of four bytes, bit 31 on the left and bit 0 on the right, with byte addresses 0 to 23. Items shown: word16, half-word14, half-word12, word8, byte6, half-word4, byte3, byte2, byte1, byte0.]

Memory Formats
Little-endian (default)
  bit 31                          bit 0
  byte 3 | byte 2 | byte 1 | byte 0

Big-endian
  bit 31                          bit 0
  byte 0 | byte 1 | byte 2 | byte 3

ARM supports both
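The two byte orders can be seen by packing the same 32-bit word both ways. This sketch uses Python's standard `struct` module on a host machine; it only illustrates the layouts and is not ARM code.

```python
import struct

value = 0x11223344
little = struct.pack("<I", value)  # least significant byte at the lowest address
big = struct.pack(">I", value)     # most significant byte at the lowest address

print(little.hex())  # 44332211
print(big.hex())     # 11223344
```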

ARM Exceptions
ARM supports a range of interrupts, traps, and supervisor calls, all grouped under the general heading of exceptions

Exceptions
Generated by internal and external events
Supports 7 types of exceptions, each handled in a specific mode:

Reset: Supervisor mode
Software Interrupt: Supervisor mode
IRQ: IRQ mode
FIQ: FIQ mode
Data Abort: Abort mode
Undefined Instruction: Undefined mode
Prefetch Abort: Abort mode

Vector Addresses
0x00000000   Reset
0x00000004   Undefined Instruction
0x00000008   Software Interrupt
0x0000000C   Prefetch Abort
0x00000010   Data Abort
0x00000014   Reserved
0x00000018   IRQ
0x0000001C   FIQ

Exception Priorities
1. Reset (highest priority)
2. Data Abort
3. FIQ
4. IRQ
5. Prefetch Abort
6. SWI, Undefined Instruction

ARM RoadMap

ARM Processor Families

ARM7TDMI
Architecture version 4
Supports:
  T - Thumb: 16-bit compressed instruction set
  D - Debug: on-chip debug support
  M - Enhanced Multiplier: higher performance, long multiply
  I - EmbeddedICE hardware
32-bit data bus
Data size can be byte, half-word, or word
  Words: 4-byte aligned
  Half-words: 2-byte aligned
Von Neumann architecture

ARM core dataflow model


[Figure: ARM core dataflow. Blocks shown: data in, instruction decoder, sign extend, register file (r0-r15) with read ports Rn (bus A) and Rm (bus B), barrel shifter (N), ALU, MAC, result bus to Rd, address register, incrementer, address out.]

ARM7TDMI core

Operating States
Supports 2 instruction sets:

ARM: 32-bit instruction set
Thumb: 16-bit instruction set

Thumb State
Subset of the ARM instructions
Higher code density (35% reduction)
Better performance than 16-bit processors
Suitable for use with 16-bit memory devices (160% better performance)
Transparently decompressed to 32-bit instructions

ARM State
Able to access large memories more efficiently
32-bit integer arithmetic in a single cycle
Larger number of instructions
Better performance

Switching States
ARM to Thumb
  Execute the BX instruction with the state bit (bit 0 of the operand register) = 1

Thumb to ARM
  Execute the BX instruction with the state bit = 0
  An interrupt or exception occurs
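The state switch performed by BX can be modelled in a few lines. This is a simplified sketch with a hypothetical function `bx`; it only shows how bit 0 of the operand register selects the state and is masked off the branch target.

```python
def bx(target):
    """Model BX Rm: return (new_pc, thumb_state) for a 32-bit register value."""
    thumb = bool(target & 1)       # state bit = bit 0 of Rm
    new_pc = target & 0xFFFFFFFE   # bit 0 is not part of the branch address
    return new_pc, thumb

print(bx(0x8001))  # (32768, True): branch to 0x8000 in Thumb state
print(bx(0x8000))  # (32768, False): branch to 0x8000 in ARM state
```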

Which State to use


Low-memory system: use Thumb
16-bit memory: use Thumb
Performance is critical: use ARM
Performance is critical AND memory is low: use both ARM and Thumb
  Example: in execution of interrupt routines

ARM
Advanced RISC Machines

ARM Instruction Set

ARM Features
Load-store architecture
3-address data processing instructions
Conditional execution of instructions
Powerful load/store multiple register instructions
General shift operation (single cycle)
Extension of the instruction set via co-processors
16-bit compressed instruction set

Types of Instructions
Data processing instructions
Data transfer instructions
Control flow instructions

Conditional Execution
Mnemonic   Name                                  Condition flags
EQ         equal                                 Z
NE         not equal                             z
CS/HS      carry set / unsigned higher or same   C
CC/LO      carry clear / unsigned lower          c
MI         minus / negative                      N
PL         plus / positive or zero               n
VS         overflow                              V
VC         no overflow                           v
HI         unsigned higher                       zC
LS         unsigned lower or same                Z or c
GE         signed greater than or equal          NV or nv
LT         signed less than                      Nv or nV
GT         signed greater than                   NzV or nzv
LE         signed less than or equal             Z or Nv or nV
AL         always (unconditional)                ignored

(Upper case = flag set, lower case = flag clear.)

C-DAC, Hyderabad, 8/17/2011
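The flag combinations in the condition table can be written as boolean expressions. The Python sketch below is an illustrative model, not ARM code; `condition_passed` is a hypothetical name, and the flags are passed in as booleans.

```python
def condition_passed(cond, n, z, c, v):
    """Evaluate an ARM condition mnemonic against the N, Z, C, V flags."""
    table = {
        "EQ": z, "NE": not z,
        "CS": c, "HS": c, "CC": not c, "LO": not c,
        "MI": n, "PL": not n,
        "VS": v, "VC": not v,
        "HI": c and not z, "LS": (not c) or z,
        "GE": n == v, "LT": n != v,
        "GT": (not z) and n == v, "LE": z or n != v,
        "AL": True,
    }
    return bool(table[cond])

# Flags after comparing 3 with 5 (3 - 5): negative result, borrow, no overflow.
n, z, c, v = True, False, False, False
print(condition_passed("LT", n, z, c, v))  # True
print(condition_passed("GE", n, z, c, v))  # False
print(condition_passed("LO", n, z, c, v))  # True
```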

Data Processing Operations


Arithmetic operations
Bit-wise operations
Register movement operations
Comparison operations

Arithmetic Operations
ADD r0, r1, r2   ; r0 := r1 + r2
ADC r0, r1, r2   ; r0 := r1 + r2 + C
SUB r0, r1, r2   ; r0 := r1 - r2
SBC r0, r1, r2   ; r0 := r1 - r2 + C - 1
RSB r0, r1, r2   ; r0 := r2 - r1
RSC r0, r1, r2   ; r0 := r2 - r1 + C - 1
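The carry-dependent forms are easiest to check with concrete numbers. The following Python sketch models the register-transfer equations above with 32-bit wrap-around; the function names mirror the mnemonics but are only illustrative helpers.

```python
MASK = 0xFFFFFFFF  # results wrap to 32 bits

def sub(r1, r2):
    return (r1 - r2) & MASK

def sbc(r1, r2, c):
    return (r1 - r2 + c - 1) & MASK

def rsb(r1, r2):
    return (r2 - r1) & MASK

def rsc(r1, r2, c):
    return (r2 - r1 + c - 1) & MASK

print(sub(5, 3))        # 2
print(sbc(5, 3, 1))     # 2: C = 1 means no borrow is pending
print(sbc(5, 3, 0))     # 1: C = 0 subtracts one extra
print(rsb(1, 10))       # 9: operands reversed
print(rsc(1, 10, 0))    # 8
print(hex(sub(0, 1)))   # 0xffffffff: wraps around
```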

Bit-wise Logical Operations

AND r0, r1, r2   ; r0 := r1 and r2
ORR r0, r1, r2   ; r0 := r1 or r2
EOR r0, r1, r2   ; r0 := r1 xor r2
BIC r0, r1, r2   ; r0 := r1 and (not) r2

Register Movement Operations


MOV r0, r2   ; r0 := r2
MVN r0, r2   ; r0 := not r2

Comparison Operations
CMP r1, r2   ; set cc on r1 - r2
CMN r1, r2   ; set cc on r1 + r2
TST r1, r2   ; set cc on r1 and r2
TEQ r1, r2   ; set cc on r1 xor r2

Immediate Operands
If we need to add a constant:

ADD r3, r3, #1     ; r3 := r3 + 1
AND r8, r7, #&ff   ; r8 := r7[7:0]

Shift Register Operands


The second register operand is subjected to a shift before it is combined with the first operand:

ADD r3, r2, r1, LSL #3   ; r3 := r2 + (r1 * 8)

ARM Shift Operations


LSL - Logical Shift Left
LSR - Logical Shift Right
ASL - Arithmetic Shift Left
ASR - Arithmetic Shift Right
ROR - Rotate Right
RRX - Rotate Right Extended

LSL, LSR, ASL, ASR, ROR, RRX


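[Figure: effect of each shift on a 32-bit operand (bits 31 to 0). Panels: LSL #5 (five zeros enter at bit 0); LSR #5 (five zeros enter at bit 31); ASR #5, positive operand (five 0s enter at bit 31); ASR #5, negative operand (five 1s enter at bit 31); ROR #5; RRX (rotate through the carry flag C).]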
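The shift types named above can be modelled on 32-bit values as follows. This is a sketch for shift amounts between 1 and 31; the function names follow the mnemonics but are illustrative Python helpers, and ASL is omitted because it produces the same result as LSL.

```python
MASK = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left: zeros enter at bit 0."""
    return (x << n) & MASK

def lsr(x, n):
    """Logical shift right: zeros enter at bit 31."""
    return (x & MASK) >> n

def asr(x, n):
    """Arithmetic shift right: copies of the sign bit enter at bit 31."""
    x &= MASK
    if x & 0x80000000:
        return ((x >> n) | (MASK << (32 - n))) & MASK
    return x >> n

def ror(x, n):
    """Rotate right: bits leaving bit 0 re-enter at bit 31."""
    x &= MASK
    return ((x >> n) | (x << (32 - n))) & MASK

def rrx(x, carry):
    """Rotate right by one through carry: returns (result, new carry)."""
    x &= MASK
    return (carry << 31) | (x >> 1), x & 1

print(hex(lsl(0x1, 5)))         # 0x20
print(hex(lsr(0x80000000, 5)))  # 0x4000000
print(hex(asr(0x80000000, 5)))  # 0xfc000000
print(hex(ror(0x1F, 5)))        # 0xf8000000
print(rrx(0x3, 1))              # (2147483649, 1), i.e. 0x80000001 with carry out 1
```

With these helpers, `ADD r3, r2, r1, LSL #3` corresponds to `(r2 + lsl(r1, 3)) & MASK`.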

Shift Value in Register


It is also possible to use a register value to specify the number of bits the second operand should be shifted by:

ADD r5, r5, r3, LSL r2   ; r5 := r5 + r3 * 2^r2

Setting the Condition Codes


All data processing instructions (DPI) can affect the condition codes
For all DPI except comparisons, a special request needs to be made
At assembly level, the request is made by adding an S to the opcode

E.g.:
ADDS r0, r0, r1   ; add and set the condition codes, including the carry C
ADC  r3, r3, r2   ; add, including the carry C left by ADDS
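An ADDS followed by an ADC is the usual way to add two 64-bit numbers held in register pairs. The Python model below is a sketch of that idea; `adds` and `adc` are hypothetical helpers, and the register pairing (r3:r0 plus r2:r1) is an assumption chosen to match the two instructions in the example.

```python
MASK = 0xFFFFFFFF

def adds(a, b):
    """ADDS: 32-bit add that also returns the carry-out flag."""
    total = a + b
    return total & MASK, total >> 32

def adc(a, b, carry):
    """ADC: 32-bit add that folds in the carry flag."""
    return (a + b + carry) & MASK

# Add 0x00000001_FFFFFFFF (r3:r0) and 0x00000002_00000001 (r2:r1).
r0, r3 = 0xFFFFFFFF, 0x00000001
r1, r2 = 0x00000001, 0x00000002

r0, c = adds(r0, r1)   # ADDS r0, r0, r1
r3 = adc(r3, r2, c)    # ADC  r3, r3, r2

print(hex(r3), hex(r0), c)  # 0x4 0x0 1
```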

Multiplies
MUL r4, r3, r2   ; r4 := (r3 * r2)[31:0]

Some rules:
  An immediate second operand is not supported
  The result register must not be the same as the first source register

The basic ARM provides two multiplication instructions.

Multiply
  MUL{<cond>}{S} Rd, Rm, Rs       ; Rd = Rm * Rs

Multiply Accumulate - does addition for free
  MLA{<cond>}{S} Rd, Rm, Rs, Rn   ; Rd = (Rm * Rs) + Rn

Multiply-Long and Multiply-Accumulate Long


Instructions are:

MULL, which gives RdHi,RdLo := Rm * Rs
MLAL, which gives RdHi,RdLo := (Rm * Rs) + RdHi,RdLo

However, the full 64 bits of the result now matter (the lower-precision multiply instructions simply throw the top 32 bits away)

Need to specify whether operands are signed or unsigned (UMULL / UMLAL or SMULL / SMLAL)
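The reason signedness matters only for the long forms can be seen numerically. In the Python sketch below, `mul`, `umull` and `smull` are illustrative models of the 32-bit and 64-bit multiplies, returning register values as unsigned 32-bit integers.

```python
MASK = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit register value as a two's complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x

def mul(rm, rs):
    """32-bit multiply: the top 32 bits of the product are discarded."""
    return (rm * rs) & MASK

def umull(rm, rs):
    """Unsigned long multiply: returns (RdHi, RdLo)."""
    product = (rm & MASK) * (rs & MASK)
    return (product >> 32) & MASK, product & MASK

def smull(rm, rs):
    """Signed long multiply: returns (RdHi, RdLo)."""
    product = (to_signed(rm) * to_signed(rs)) & 0xFFFFFFFFFFFFFFFF
    return (product >> 32) & MASK, product & MASK

rm, rs = 0xFFFFFFFF, 2   # 0xFFFFFFFF is 4294967295 unsigned, -1 signed
print(hex(mul(rm, rs)))                  # 0xfffffffe
print([hex(v) for v in umull(rm, rs)])   # ['0x1', '0xfffffffe']
print([hex(v) for v in smull(rm, rs)])   # ['0xffffffff', '0xfffffffe']
```

The low word is identical in all three cases; only the high word depends on the interpretation of the operands.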

Loading full 32 bit constants


Although the MOV/MVN mechanism will load a large range of constants into a register, sometimes this mechanism will not generate the required constant.
Therefore, the assembler also provides a method which will load ANY 32-bit constant:

LDR rd, =numeric constant

If the constant can be constructed using either a MOV or MVN, then this will be the instruction actually generated.
Otherwise, the assembler will produce an LDR instruction with a PC-relative address to read the constant from a literal pool.

LDR r0, =0x42         ; generates MOV r0, #0x42
LDR r0, =0x55555555   ; generates LDR r0, [pc, offset to lit pool]

As this mechanism will always generate the best instruction for a given case, it is the recommended way of loading constants.
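The choice the assembler makes can be sketched in Python. This relies on a detail the slides do not state: in ARM state, a data-processing immediate is an 8-bit value rotated right by an even amount. The names `encodable` and `choose_load` are hypothetical, and real assemblers may apply further rules.

```python
MASK = 0xFFFFFFFF

def ror(x, n):
    """Rotate a 32-bit value right by n bits."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK

def encodable(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= MASK
    # Rotating right by (32 - r) undoes a rotate right by r.
    return any(ror(value, 32 - r) <= 0xFF for r in range(0, 32, 2))

def choose_load(value):
    """Pick the instruction an 'LDR rd, =value' could plausibly become."""
    if encodable(value):
        return "MOV"
    if encodable(~value & MASK):
        return "MVN"
    return "LDR (literal pool)"

print(choose_load(0x42))        # MOV
print(choose_load(0xFF000000))  # MOV
print(choose_load(0xFFFFFF00))  # MVN
print(choose_load(0x55555555))  # LDR (literal pool)
```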

Branch Instructions
Branch                      B{<cond>} label
Branch with Link            BL{<cond>} label
Branch Exchange             BX{<cond>} Rm
Branch Exchange with Link   BLX{<cond>} Rm

Load / Store Instructions


Single Register Load & Store
  Transfer of a data item (byte, half-word, word) between ARM registers and memory

Multiple Register Load & Store
  Enable transfer of large quantities of data
  Used for procedure entry and exit, to save/restore workspace registers, and to copy blocks of data around memory

Single register data transfer


The basic load and store instructions are:

Load and Store Word or Byte
  LDR / STR / LDRB / STRB

ARM Architecture Version 4 also adds support for halfwords and signed data.

Load and Store Halfword
  LDRH / STRH

Load Signed Byte or Halfword - load value and sign extend it to 32 bits
  LDRSB / LDRSH

All of these instructions can be conditionally executed by inserting the appropriate condition code after STR / LDR, e.g. LDREQB.

Syntax:
  <LDR|STR>{<cond>}{<size>} Rd, <address>

Register-Indirect Addressing

LDR r0, [r1]   ; r0 := mem32[r1]
STR r0, [r1]   ; mem32[r1] := r0

Note: r1 keeps a word address (2 LSBs are 0)

Pre Indexed Addressing


LDR r0, [r1, #4]   ; r0 := mem32[r1 + 4]

Post Indexed Addressing


LDR r0, [r1], #4   ; r0 := mem32[r1]
                   ; r1 := r1 + 4

Auto Indexing Addressing


LDR r0, [r1, #4]!   ; r0 := mem32[r1 + 4]
                    ; r1 := r1 + 4
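The difference between the three indexed forms is which address is used and whether the base register changes. The Python sketch below models memory as a dictionary of word addresses; the function names and the sample memory contents are assumptions made for illustration.

```python
def ldr_pre_indexed(mem, regs, rd, rn, offset):
    """LDR rd, [rn, #offset]: use rn + offset, leave rn unchanged."""
    regs[rd] = mem[regs[rn] + offset]

def ldr_auto_indexed(mem, regs, rd, rn, offset):
    """LDR rd, [rn, #offset]!: use rn + offset and write it back to rn."""
    regs[rn] += offset
    regs[rd] = mem[regs[rn]]

def ldr_post_indexed(mem, regs, rd, rn, offset):
    """LDR rd, [rn], #offset: use rn, then add offset to rn."""
    regs[rd] = mem[regs[rn]]
    regs[rn] += offset

mem = {0x100: 10, 0x104: 20}

pre = {"r0": 0, "r1": 0x100}
ldr_pre_indexed(mem, pre, "r0", "r1", 4)
print(pre)    # {'r0': 20, 'r1': 256}

auto = {"r0": 0, "r1": 0x100}
ldr_auto_indexed(mem, auto, "r0", "r1", 4)
print(auto)   # {'r0': 20, 'r1': 260}

post = {"r0": 0, "r1": 0x100}
ldr_post_indexed(mem, post, "r0", "r1", 4)
print(post)   # {'r0': 10, 'r1': 260}
```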

Direct functionality of Block Data Transfer


When LDM / STM are not being used to implement stacks, it is clearer to specify exactly what the functionality of the instruction is:
  i.e. specify whether to increment / decrement the base pointer, before or after the memory access.

In order to do this, LDM / STM support a further syntax in addition to the stack one:

STMIA / LDMIA : Increment After
STMIB / LDMIB : Increment Before
STMDA / LDMDA : Decrement After
STMDB / LDMDB : Decrement Before
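The four suffixes differ only in which addresses are touched and where the base pointer ends up. The following Python sketch computes both for a transfer with base write-back; `block_addresses` is a hypothetical helper, and it assumes the usual ARM rule that the lowest-numbered register is transferred at the lowest address.

```python
def block_addresses(base, count, mode):
    """Return (addresses, lowest first, and final base) for an LDM/STM of count words."""
    size = 4 * count
    if mode == "IA":      # increment after
        start, final = base, base + size
    elif mode == "IB":    # increment before
        start, final = base + 4, base + size
    elif mode == "DA":    # decrement after
        start, final = base - size + 4, base - size
    elif mode == "DB":    # decrement before
        start, final = base - size, base - size
    else:
        raise ValueError("unknown mode: " + mode)
    return [start + 4 * i for i in range(count)], final

for mode in ("IA", "IB", "DA", "DB"):
    addresses, final = block_addresses(0x400, 5, mode)
    print(mode, [hex(a) for a in addresses], hex(final))
```

For a base of 0x400 and five registers, IA uses 0x400 to 0x410 and IB uses 0x404 to 0x414, both ending with the base at 0x414; DA uses 0x3f0 to 0x400 and DB uses 0x3ec to 0x3fc, both ending with the base at 0x3ec.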

Stack Operation
Traditionally, a stack grows down in memory, with the last pushed value at the lowest address. The ARM also supports ascending stacks, where the stack structure grows up through memory.
The value of the stack pointer can either:

Point to the last occupied address (Full stack)
  and so needs pre-decrementing (i.e. before the push)

Point to the next unoccupied address (Empty stack)
  and so needs post-decrementing (i.e. after the push)

The stack type to be used is given by the postfix to the instruction:

STMFD / LDMFD : Full Descending stack
STMFA / LDMFA : Full Ascending stack
STMED / LDMED : Empty Descending stack
STMEA / LDMEA : Empty Ascending stack

Note: the ARM compiler will always use a Full Descending stack.

Stack Examples
STMFD sp!, {r0,r1,r3-r5}
STMED sp!, {r0,r1,r3-r5}
STMFA sp!, {r0,r1,r3-r5}
STMEA sp!, {r0,r1,r3-r5}

[Figure: four memory diagrams, one per instruction, on an address axis labelled 0x418, 0x400 and 0x3e8. Each shows where r0, r1, r3, r4 and r5 are stored relative to the old SP, and the resulting SP.]

PSR Transfer Instructions


MRS and MSR allow the contents of CPSR/SPSR to be transferred between the appropriate status register and a general-purpose register.

All of the status register, or just the flags, can be transferred.

Syntax:
MRS{<cond>} Rd, <psr>     ; Rd = <psr>
MSR{<cond>} <psr>, Rm     ; <psr> = Rm
MSR{<cond>} <psrf>, Rm    ; <psrf> = Rm

where
<psr> = CPSR, CPSR_all, SPSR or SPSR_all
<psrf> = CPSR_flg or SPSR_flg

Also an immediate form:

MSR{<cond>} <psrf>, #Immediate
This immediate must be a 32-bit immediate, of which the 4 most significant bits are written to the flag bits.

Quiz
Write a short code segment that performs a mode change by modifying the contents of the CPSR.

The mode you should change to is user mode, which has the value 0x10.
This assumes that the current mode is a privileged mode such as supervisor mode.
This would happen, for instance, when the processor is reset: reset code would be run in supervisor mode, which would then need to switch to user mode before calling the main routine in your application.
You will need to use MSR and MRS, plus 2 logical operations.

CPSR layout:
  bits 31-28: N Z C V
  bits 7-5:   I F T
  bits 4-0:   Mode

Solution
Set up useful constants:

mmask EQU 0x1f          ; mask to clear mode bits
userm EQU 0x10          ; user mode value

Start off here in supervisor mode.

MRS r0, cpsr            ; take a copy of the CPSR
BIC r0, r0, #mmask      ; clear the mode bits
ORR r0, r0, #userm      ; select new mode
MSR cpsr, r0            ; write back the modified CPSR

End up here in user mode.
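The effect of the four instructions on the CPSR value can be traced with a small Python model. `switch_to_user` is a hypothetical function, and the starting value 0xD3 (IRQ and FIQ disabled, ARM state, supervisor mode) is an assumed example rather than something stated in the quiz.

```python
MMASK = 0x1F   # mask to clear mode bits
USERM = 0x10   # user mode value

def switch_to_user(cpsr):
    """Mirror the MRS / BIC / ORR / MSR sequence on a 32-bit CPSR value."""
    r0 = cpsr                    # MRS r0, cpsr
    r0 &= ~MMASK & 0xFFFFFFFF    # BIC r0, r0, #mmask
    r0 |= USERM                  # ORR r0, r0, #userm
    return r0                    # MSR cpsr, r0

new_cpsr = switch_to_user(0x000000D3)
print(hex(new_cpsr))  # 0xd0: mode bits now 10000, all other bits unchanged
```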

Main features of the ARM Instruction Set


All instructions are 32 bits long.
Most instructions execute in a single cycle.
Every instruction can be conditionally executed.
A load/store architecture
  Data processing instructions act only on registers
    Three-operand format
    Combined ALU and shifter for high-speed bit manipulation
  Specific memory access instructions with powerful auto-indexing addressing modes
    32-bit and 8-bit data types, and also 16-bit data types on ARM Architecture v4
    Flexible multiple register load and store instructions
Instruction set extension via coprocessors

In today's systems, the key is not raw processor speed but total effective system performance and power consumption.

Done with Day 1