
EE382N-4 Embedded Systems Architecture

The ARM Instruction Set Architecture
Mark McDermott
With help from our good friends at ARM
Fall 2008
8/22/2008

Main features of the ARM Instruction Set
All instructions are 32 bits long.
Most instructions execute in a single cycle.
Most instructions can be conditionally executed.
A load/store architecture
  Data processing instructions act only on registers
    Three operand format
    Combined ALU and shifter for high speed bit manipulation
  Specific memory access instructions with powerful auto-indexing addressing modes.
    32 bit and 8 bit data types
      and also 16 bit data types on ARM Architecture v4.
    Flexible multiple register load and store instructions
Instruction set extension via coprocessors
Very dense 16-bit compressed instruction set (Thumb)

Coprocessors
Up to 16 coprocessors can be defined
Expands the ARM instruction set
Each coprocessor can have up to 16 private registers of any reasonable size
Load-store architecture

Thumb
Thumb is a 16-bit instruction set
  Optimized for code density from C code
  Improved performance from narrow memory
  Subset of the functionality of the ARM instruction set
Core has two execution states - ARM and Thumb
  Switch between them using the BX instruction
Thumb has characteristic features:
  Most Thumb instructions are executed unconditionally
  Many Thumb data processing instructions use a 2-address format
  Thumb instruction formats are less regular than ARM instruction formats, as a result of the dense encoding.

Processor Modes
The ARM has six operating modes:
  User (unprivileged mode under which most tasks run)
  FIQ (entered when a high priority (fast) interrupt is raised)
  IRQ (entered when a low priority (normal) interrupt is raised)
  Supervisor (entered on reset and when a Software Interrupt instruction is executed)
  Abort (used to handle memory access violations)
  Undef (used to handle undefined instructions)
ARM Architecture Version 4 adds a seventh mode:
  System (privileged mode using the same registers as user mode)

The Registers
ARM has 37 registers in total, all of which are 32 bits long.
  1 dedicated program counter
  1 dedicated current program status register
  5 dedicated saved program status registers
  30 general purpose registers
However these are arranged into several banks, with the accessible bank being governed by the processor mode. Each mode can access
  a particular set of r0-r12 registers
  a particular r13 (the stack pointer) and r14 (link register)
  r15 (the program counter)
  cpsr (the current program status register)
And privileged modes can also access
  a particular spsr (saved program status register)

The ARM Register Set
[Figure: the register set as seen from each processor mode, with tabs for User, FIQ, IRQ, SVC, Undef and Abort mode.]
Current Visible Registers
  r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr, spsr
Banked out Registers
  User:   r8-r12, r13 (sp), r14 (lr)
  FIQ:    r8-r12, r13 (sp), r14 (lr), spsr
  IRQ:    r13 (sp), r14 (lr), spsr
  SVC:    r13 (sp), r14 (lr), spsr
  Undef:  r13 (sp), r14 (lr), spsr
  Abort:  r13 (sp), r14 (lr), spsr

Register Organization Summary
[Figure: registers available in each mode. r0-r7 are labelled Thumb state low registers; r8-r15 are labelled Thumb state high registers.]
  User:   r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr
  FIQ:    User mode r0-r7, r15, and cpsr; own r8-r12, r13 (sp), r14 (lr), spsr
  IRQ:    User mode r0-r12, r15, and cpsr; own r13 (sp), r14 (lr), spsr
  SVC:    User mode r0-r12, r15, and cpsr; own r13 (sp), r14 (lr), spsr
  Undef:  User mode r0-r12, r15, and cpsr; own r13 (sp), r14 (lr), spsr
  Abort:  User mode r0-r12, r15, and cpsr; own r13 (sp), r14 (lr), spsr
Note: System mode uses the User mode register set

Accessing Registers using ARM Instructions
No breakdown of currently accessible registers.
  All instructions can access r0-r14 directly.
  Most instructions also allow use of the PC.
Specific instructions to allow access to CPSR and SPSR.
Note: When in a privileged mode, it is also possible to load/store the (banked out) user mode registers to or from memory.

The Program Status Registers (CPSR and SPSRs)
[Figure: PSR bit layout. N Z C V in bits 31:28; I, F, T in bits 7, 6, 5; Mode in bits 4:0.]
* Condition Code Flags
  Copies of the ALU status flags (latched if the instruction has the "S" bit set).
  N = Negative result from ALU flag.
  Z = Zero result from ALU flag.
  C = ALU operation Carried out
  V = ALU operation oVerflowed
* Interrupt Disable bits.
  I = 1, disables the IRQ.
  F = 1, disables the FIQ.
* T Bit (Architecture v4T only)
  T = 0, Processor in ARM state
  T = 1, Processor in Thumb state
* Mode Bits
  M[4:0] define the processor mode.

Condition Flags
Negative (N=1)
  Logical Instruction: No meaning
  Arithmetic Instruction: Bit 31 of the result has been set. Indicates a negative number in signed operations
Zero (Z=1)
  Logical Instruction: Result is all zeroes
  Arithmetic Instruction: Result of operation was zero
Carry (C=1)
  Logical Instruction: After Shift operation, '1' was left in carry flag
  Arithmetic Instruction: Result was greater than 32 bits
oVerflow (V=1)
  Logical Instruction: No meaning
  Arithmetic Instruction: Result was greater than 31 bits. Indicates a possible corruption of the sign bit in signed numbers
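As an illustration of the arithmetic column of the Condition Flags slide, the following Python sketch models how the four flags could be derived for a 32-bit addition. It is not ARM code; the function name `adds_flags` and the dictionary layout are assumptions made for this example only.

```python
MASK32 = 0xFFFFFFFF


def adds_flags(a, b):
    """Model a 32-bit ADDS: return (result, flags) for a + b."""
    a &= MASK32
    b &= MASK32
    full = a + b                  # unbounded Python integer sum
    result = full & MASK32        # value a 32-bit register would hold
    flags = {
        "N": result >> 31,        # bit 31 of the result
        "Z": int(result == 0),    # result is all zeroes
        "C": int(full > MASK32),  # result needed more than 32 bits
        # signed overflow: both operands differ in sign from the result
        "V": ((a ^ result) & (b ^ result)) >> 31,
    }
    return result, flags


result, flags = adds_flags(0x7FFFFFFF, 1)
print(hex(result), flags)
```

Adding 1 to the largest positive signed value sets N and V but not C, which is the "possible corruption of the sign bit" case described for the oVerflow flag.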

The Program Counter (R15)
When the processor is executing in ARM state:
  All instructions are 32 bits in length
  All instructions must be word aligned
  Therefore the PC value is stored in bits [31:2] with bits [1:0] equal to zero (as instructions cannot be halfword or byte aligned).
R14 is used as the subroutine link register (LR) and stores the return address when Branch with Link operations are performed, calculated from the PC.
Thus to return from a linked branch:
    MOV r15,r14
or
    MOV pc,lr

Exception Handling and the Vector Table
When an exception occurs, the core:
  Copies CPSR into SPSR_<mode>
  Sets appropriate CPSR bits
    If core implements ARM Architecture 4T and is currently in Thumb state, then ARM state is entered.
    Mode field bits
    Interrupt disable flags if appropriate.
  Maps in appropriate banked registers
  Stores the "return address" in LR_<mode>
  Sets PC to vector address
To return, exception handler needs to:
  Restore CPSR from SPSR_<mode>
  Restore PC from LR_<mode>

The Original Instruction Pipeline
The ARM uses a pipeline in order to increase the speed of the flow of instructions to the processor.
  Allows several operations to be undertaken simultaneously, rather than serially.
  PC      FETCH    Instruction fetched from memory
  PC - 4  DECODE   Decoding of registers used in instruction
  PC - 8  EXECUTE  Register(s) read from Register Bank
                   Shift and ALU operation
                   Write register(s) back to Register Bank
Rather than pointing to the instruction being executed, the PC points to the instruction being fetched.

Pipeline changes for ARM9TDMI
[Figure: ARM7TDMI and ARM9TDMI pipelines compared stage by stage.]
ARM7TDMI
  FETCH:    Instruction Fetch
  DECODE:   Thumb -> ARM decompress, ARM decode, Reg Select
  EXECUTE:  Reg Read, Shift, ALU, Reg Write
ARM9TDMI
  FETCH:    Instruction Fetch
  DECODE:   ARM or Thumb Inst Decode, Reg Decode, Reg Read
  EXECUTE:  Shift + ALU
  MEMORY:   Memory Access
  WRITE:    Reg Write

Pipeline changes for ARM10 vs. ARM11 Pipelines
[Figure: ARM10 and ARM11 pipelines compared stage by stage.]
ARM10
  FETCH:    Branch Prediction, Instruction Fetch
  ISSUE:    ARM or Thumb Instruction Decode
  DECODE:   Reg Read
  EXECUTE:  Shift + ALU, Multiply
  MEMORY:   Memory Access, Multiply Add
  WRITE:    Reg Write
ARM11
  Fetch 1, Fetch 2, Decode, Issue, followed by one of three parallel paths:
    Shift, ALU, Saturate
    MAC 1, MAC 2, MAC 3
    Data Address, Data Cache 1, Data Cache 2
  Write back

ARM Instruction Set Format
[Figure: 32-bit encodings (bits 31 down to 0) of each instruction type. Every format starts with a Condition field; the other named fields of each format are listed below.]
  Data processing:            OPCODE, Rn, Rs, OPERAND2
  Multiply:                   A, Rd, Rn, Rs, Rm
  Long Multiply:              U, A, RdHigh, RdLow, Rs, Rm
  Swap:                       Rn, Rd, Rm
  Load/Store Byte/Word:       P, U, B, W, L, Rn, Rd, OFFSET
  Load/Store Multiple:        P, U, B, W, L, Rn, REGISTER LIST
  Halfword Transfer Imm Off:  P, U, 1, W, L, Rn, Rd, OFFSET1, S, H, OFFSET2
  Halfword Transfer Reg Off:  P, U, 0, W, L, Rn, Rd, S, H, Rm
  Branch:                     BRANCH OFFSET
  Branch Exchange:            Rn
  Coprocessor Data Xfer:      P, U, N, W, L, Rn, CRd, CPNum, OFFSET
  Coprocessor Data Op:        Op1, CRn, CRd, CPNum, OP2, CRm
  Coprocessor Reg Xfer:       OP1, CRn, Rd, CPNum, OP2, CRm
  Software Interrupt:         SWI NUMBER

Conditional Execution
Most instruction sets only allow branches to be executed conditionally.
However by reusing the condition evaluation hardware, ARM effectively increases the number of instructions.
  All instructions contain a condition field which determines whether the CPU will execute them.
  Non-executed instructions consume 1 cycle.
    Can't collapse the instruction like a NOP. Still have to complete the cycle so as to allow fetching and decoding of the following instructions.
This removes the need for many branches, which stall the pipeline (3 cycles to refill).
  Allows very dense in-line code, without branches.
  The time penalty of not executing several conditional instructions is frequently less than the overhead of the branch or subroutine call that would otherwise be needed.

The Condition Field
[Figure: data processing instruction format. The Condition field occupies bits 31:28 and is followed by OPCODE, Rn, Rs and OPERAND2.]
0000 = EQ - Z set (equal)
0001 = NE - Z clear (not equal)
0010 = HS / CS - C set (unsigned higher or same)
0011 = LO / CC - C clear (unsigned lower)
0100 = MI - N set (negative)
0101 = PL - N clear (positive or zero)
0110 = VS - V set (overflow)
0111 = VC - V clear (no overflow)
1000 = HI - C set and Z clear (unsigned higher)
1001 = LS - C clear or Z set (unsigned lower or same)
1010 = GE - N set and V set, or N clear and V clear (>=)
1011 = LT - N set and V clear, or N clear and V set (<)
1100 = GT - Z clear, and either N set and V set, or N clear and V clear (>)
1101 = LE - Z set, or N set and V clear, or N clear and V set (<=)
1110 = AL - always
1111 = NV - reserved.
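The flag tests for each condition code can be written out as a small truth function. The Python sketch below is only a model of that table; `condition_passed` is a name invented for this example, and the reserved code 1111 is rejected rather than given any behaviour.

```python
def condition_passed(cond, n, z, c, v):
    """Return True if 4-bit condition code `cond` passes for flags N, Z, C, V."""
    n, z, c, v = bool(n), bool(z), bool(c), bool(v)
    checks = {
        0b0000: z,                   # EQ
        0b0001: not z,               # NE
        0b0010: c,                   # HS / CS
        0b0011: not c,               # LO / CC
        0b0100: n,                   # MI
        0b0101: not n,               # PL
        0b0110: v,                   # VS
        0b0111: not v,               # VC
        0b1000: c and not z,         # HI
        0b1001: (not c) or z,        # LS
        0b1010: n == v,              # GE
        0b1011: n != v,              # LT
        0b1100: (not z) and n == v,  # GT
        0b1101: z or n != v,         # LE
        0b1110: True,                # AL
    }
    if cond not in checks:
        raise ValueError("reserved or invalid condition code")
    return checks[cond]


# Flags after comparing 3 with 5 (3 - 5): N set, Z clear, C clear, V clear.
print(condition_passed(0b1011, n=1, z=0, c=0, v=0))  # LT
```

Note that GE and LT reduce to "N equals V" and "N differs from V", which is a compact way to read the two-case descriptions in the table.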

Using and updating the Condition Field
To execute an instruction conditionally, simply postfix it with the appropriate condition:
  For example an add instruction takes the form:
    ADD r0,r1,r2       ; r0 = r1 + r2 (ADDAL)
  To execute this only if the zero flag is set:
    ADDEQ r0,r1,r2     ; If zero flag set then...
                       ; ... r0 = r1 + r2
By default, data processing operations do not affect the condition flags (apart from the comparisons where this is the only effect). To cause the condition flags to be updated, the S bit of the instruction needs to be set by postfixing the instruction (and any condition code) with an "S".
  For example to add two numbers and set the condition flags:
    ADDS r0,r1,r2      ; r0 = r1 + r2
                       ; ... and set flags

Conditional Execution and Flags
ARM instructions can be made to execute conditionally by postfixing them with the appropriate condition code field.
  This improves code density and performance by reducing the number of forward branch instructions.
  With a branch:
        CMP   r3,#0
        BEQ   skip
        ADD   r0,r1,r2
    skip
  With conditional execution:
        CMP   r3,#0
        ADDNE r0,r1,r2
By default, data processing instructions do not affect the condition code flags but the flags can be optionally set by using "S". CMP does not need "S".
    loop
        ...
        SUBS  r1,r1,#1     ; decrement r1 and set flags
        BNE   loop         ; if Z flag clear then branch

Branch instructions (1)
Branch:            B{<cond>} label
Branch with Link:  BL{<cond>} sub_routine_label
[Figure: branch instruction encoding. Condition field in bits 31:28, Link bit in bit 24, BRANCH OFFSET in bits 23:0.]
  Link bit: 0 = Branch, 1 = Branch with link
The offset for branch instructions is calculated by the assembler:
  By taking the difference between the branch instruction and the target address minus 8 (to allow for the pipeline).
  This gives a 26 bit offset which is right shifted 2 bits (as the bottom two bits are always zero as instructions are word-aligned) and stored into the instruction encoding.
  This gives a range of ± 32 Mbytes.

Branch instructions (2)
When executing the instruction, the processor:
  shifts the offset left two bits, sign extends it to 32 bits, and adds it to PC.
Execution then continues from the new PC, once the pipeline has been refilled.
The "Branch with link" instruction implements a subroutine call by writing PC-4 into the LR of the current bank.
  i.e. the address of the next instruction following the branch with link (allowing for the pipeline).
To return from subroutine, simply need to restore the PC from the LR:
    MOV pc,lr
  Again, pipeline has to refill before execution continues.
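The assembler-side and processor-side offset calculations described for branches can be checked against each other. The Python sketch below is a model under the stated rules (target minus branch address minus 8, stored as a 24-bit word offset); the function names are hypothetical and not part of any ARM tool.

```python
def encode_branch_offset(branch_addr, target_addr):
    """Assembler side: return the 24-bit offset field for a B/BL instruction."""
    byte_offset = target_addr - (branch_addr + 8)  # PC reads 8 bytes ahead
    if byte_offset % 4 != 0:
        raise ValueError("target must be word-aligned relative to the branch")
    if not -(1 << 25) <= byte_offset < (1 << 25):
        raise ValueError("target outside the +/- 32 Mbyte range")
    return (byte_offset >> 2) & 0xFFFFFF           # drop the two zero bits


def decode_branch_target(branch_addr, field24):
    """Processor side: sign extend, shift left two bits, add to the PC."""
    if field24 & 0x800000:                         # sign bit of the 24-bit field
        field24 -= 1 << 24
    return (branch_addr + 8 + (field24 << 2)) & 0xFFFFFFFF


field = encode_branch_offset(0x8000, 0x8000)       # a branch to itself
print(hex(field), hex(decode_branch_target(0x8000, field)))
```

A branch to its own address needs an offset of -8 bytes, i.e. -2 words, which is why the stored field is 0xFFFFFE rather than zero.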

Branch instructions (3)
The "Branch" instruction does not affect LR.
Note: Architecture 4T offers a further ARM branch instruction, BX
  See Thumb Instruction Set Module for details.
BL <subroutine>
  Stores return address in LR
  Returning implemented by restoring the PC from LR
  For non-leaf functions, LR will have to be stacked

        :               func1                        func2
        :                 STMFD sp!,{regs,lr}          :
        BL func1          :                            :
        :                 BL func2                     :
        :                 :                            :
                          LDMFD sp!,{regs,pc}          MOV pc,lr

Conditional Branches
Branch  Interpretation    Normal uses
B       Unconditional     Always take this branch
BAL     Always            Always take this branch
BEQ     Equal             Comparison equal or zero result
BNE     Not equal         Comparison not equal or non-zero result
BPL     Plus              Result positive or zero
BMI     Minus             Result minus or negative
BCC     Carry clear       Arithmetic operation did not give carry-out
BLO     Lower             Unsigned comparison gave lower
BCS     Carry set         Arithmetic operation gave carry-out
BHS     Higher or same    Unsigned comparison gave higher or same
BVC     Overflow clear    Signed integer operation; no overflow occurred
BVS     Overflow set      Signed integer operation; overflow occurred
BGT     Greater than      Signed integer comparison gave greater than
BGE     Greater or equal  Signed integer comparison gave greater or equal
BLT     Less than         Signed integer comparison gave less than
BLE     Less or equal     Signed integer comparison gave less than or equal
BHI     Higher            Unsigned comparison gave higher
BLS     Lower or same     Unsigned comparison gave lower or same

Data processing Instructions
Largest family of ARM instructions, all sharing the same instruction format.
Contains:
  Arithmetic operations
  Comparisons (no results - just set condition codes)
  Logical operations
  Data movement between registers
Remember, this is a load/store architecture
  These instructions only work on registers, NOT memory.
They each perform a specific operation on one or two operands.
  First operand always a register - Rn
  Second operand sent to the ALU via barrel shifter.
We will examine the barrel shifter shortly.

Arithmetic Operations
Operations are:
  ADD  operand1 + operand2              ; Add
  ADC  operand1 + operand2 + carry      ; Add with carry
  SUB  operand1 - operand2              ; Subtract
  SBC  operand1 - operand2 + carry - 1  ; Subtract with carry
  RSB  operand2 - operand1              ; Reverse subtract
  RSC  operand2 - operand1 + carry - 1  ; Reverse subtract with carry
Syntax:
  <Operation>{<cond>}{S} Rd, Rn, Operand2
Examples
    ADD r0, r1, r2
    SUBGT r3, r3, #1
    RSBLES r4, r5, #5

Comparisons
The only effect of the comparisons is to update the condition flags. Thus no need to set S bit.
Operations are:
  CMP  operand1 - operand2    ; Compare
  CMN  operand1 + operand2    ; Compare negative
  TST  operand1 AND operand2  ; Test
  TEQ  operand1 EOR operand2  ; Test equivalence
Syntax:
  <Operation>{<cond>} Rn, Operand2
Examples:
    CMP r0, r1
    TSTEQ r2, #5

Logical Operations
Operations are:
  AND  operand1 AND operand2
  EOR  operand1 EOR operand2
  ORR  operand1 OR operand2
  ORN  operand1 OR NOT operand2
  BIC  operand1 AND NOT operand2 [ie bit clear]
Syntax:
  <Operation>{<cond>}{S} Rd, Rn, Operand2
Examples:
    AND r0, r1, r2
    BICEQ r2, r3, #7
    EORS r1, r3, r0

Data Movement
Operations are:
  MOV  operand2
  MVN  NOT operand2
  Note that these make no use of operand1.
Syntax:
  <Operation>{<cond>}{S} Rd, Operand2
Examples:
    MOV r0, r1
    MOVS r2, #10
    MVNEQ r1, #0

The Barrel Shifter
The ARM doesn't have actual shift instructions.
Instead it has a barrel shifter which provides a mechanism to carry out shifts as part of other instructions.
So what operations does the barrel shifter support?

Barrel Shifter - Left Shift
Shifts left by the specified amount (multiplies by powers of two) e.g.
    LSL #5 => multiply by 32
[Figure: Logical Shift Left (LSL). Bits shifted out of the left of the Destination go to CF (the carry flag).]

Barrel Shifter - Right Shifts
Logical Shift Right (LSR)
  Shifts right by the specified amount (divides by powers of two) e.g.
    LSR #5 = divide by 32
  [Figure: Logical Shift Right. Zero shifted in at the left of the Destination; bits shifted out of the right go to CF.]
Arithmetic Shift Right (ASR)
  Shifts right (divides by powers of two) and preserves the sign bit, for 2's complement operations. e.g.
    ASR #5 = divide by 32
  [Figure: Arithmetic Shift Right. Sign bit shifted in at the left of the Destination; bits shifted out of the right go to CF.]

Barrel Shifter - Rotations
Rotate Right (ROR)
  Similar to an ASR but the bits wrap around as they leave the LSB and appear as the MSB.
    e.g. ROR #5
  Note the last bit rotated is also used as the Carry Out.
  [Figure: Rotate Right. Bits leaving the right of the Destination re-enter at the left and are copied to CF.]
Rotate Right Extended (RRX)
  This operation uses the CPSR C flag as a 33rd bit.
  Rotates right by 1 bit. Encoded as ROR #0
  [Figure: Rotate Right through Carry. CF is shifted into the left of the Destination and the bit leaving the right goes to CF.]
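The five barrel shifter operations can be modelled on 32-bit values to make the carry-out behaviour concrete. This Python sketch is an illustration only: the function names are invented here, shift amounts are restricted to 1..31 for simplicity, and each function returns the shifted value together with the carry-out bit.

```python
MASK32 = 0xFFFFFFFF


def lsl(x, n):
    """Logical shift left by n (1..31): returns (result, carry_out)."""
    return (x << n) & MASK32, (x >> (32 - n)) & 1


def lsr(x, n):
    """Logical shift right by n (1..31): zeros are shifted in."""
    return (x & MASK32) >> n, (x >> (n - 1)) & 1


def asr(x, n):
    """Arithmetic shift right by n (1..31): the sign bit is shifted in."""
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> n) & MASK32, (x >> (n - 1)) & 1


def ror(x, n):
    """Rotate right by n (1..31): the last bit rotated is the carry out."""
    result = ((x >> n) | (x << (32 - n))) & MASK32
    return result, result >> 31


def rrx(x, carry_in):
    """Rotate right by one bit through the carry flag (33-bit rotate)."""
    return ((carry_in << 31) | (x >> 1)) & MASK32, x & 1


print(lsl(1, 5))                          # multiply by 32
print([hex(v) for v in asr(0xFFFFFFC0, 5)])  # -64 / 32 as a signed value
```

The `asr` case shows why the sign bit must be preserved: 0xFFFFFFC0 is -64 in two's complement, and dividing by 32 should give -2 (0xFFFFFFFE), not a large positive number.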

Using the Barrel Shifter: The Second Operand
[Figure: Operand 1 feeds the ALU directly; Operand 2 passes through the Barrel Shifter before reaching the ALU, which produces the Result.]
* Register, optionally with shift operation applied.
  Shift value can be either:
    5 bit unsigned integer
    Specified in bottom byte of another register.
* Immediate value
  8 bit number
  Can be rotated right through an even number of positions.
  Assembler will calculate rotate for you from constant.

Second Operand: Shifted Register
The amount by which the register is to be shifted is contained in either:
  the immediate 5-bit field in the instruction
    NO OVERHEAD
    Shift is done for free - executes in single cycle.
  the bottom byte of a register (not PC)
    Then takes extra cycle to execute
    ARM doesn't have enough read ports to read 3 registers at once.
    Then same as on other processors where shift is separate instruction.
If no shift is specified then a default shift is applied: LSL #0
  i.e. barrel shifter has no effect on value in register.

Second Operand: Using a Shifted Register
Using a multiplication instruction to multiply by a constant means first loading the constant into a register and then waiting a number of internal cycles for the instruction to complete.
A more optimum solution can often be found by using some combination of MOVs, ADDs, SUBs and RSBs with shifts.
  Multiplications by a constant equal to a ((power of 2) ± 1) can be done in one cycle.
    MOV R2, R0, LSL #2       ; Shift R0 left by 2, write to R2, (R2 = R0 x 4)
    ADD R9, R5, R5, LSL #3   ; R9 = R5 + R5 x 8 or R9 = R5 x 9
    RSB R9, R5, R5, LSL #3   ; R9 = R5 x 8 - R5 or R9 = R5 x 7
    SUB R10, R9, R8, LSR #4  ; R10 = R9 - R8 / 16
    MOV R12, R4, ROR R3      ; R12 = R4 rotated right by value of R3
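The shift-and-add identities behind the ADD and RSB examples can be checked numerically. The short Python sketch below mirrors those two instructions on 32-bit values; `times9` and `times7` are names made up for this illustration.

```python
MASK32 = 0xFFFFFFFF


def times9(r5):
    """Model ADD R9, R5, R5, LSL #3: R5 + (R5 << 3)."""
    return (r5 + (r5 << 3)) & MASK32


def times7(r5):
    """Model RSB R9, R5, R5, LSL #3: (R5 << 3) - R5."""
    return ((r5 << 3) - r5) & MASK32


for value in (1, 7, 1000):
    print(value, times9(value), times7(value))
```

Because the shift is applied to the second operand inside the same instruction, each of these multiplications costs one data processing instruction rather than a multiply.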

Second Operand: Immediate Value (1)
There is no single instruction which will load a 32 bit immediate constant into a register without performing a data load from memory.
  All ARM instructions are 32 bits long
  ARM instructions do not use the instruction stream as data.
The data processing instruction format has 12 bits available for operand2
  If used directly this would only give a range of 4096.
Instead it is used to store 8 bit constants, giving a range of 0 - 255.
These 8 bits can then be rotated right through an even number of positions (ie RORs by 0, 2, 4, .. 30).
  This gives a much larger range of constants that can be directly loaded, though some constants will still need to be loaded from memory.

Second Operand: Immediate Value (2)
This gives us:
  0 - 255                      [0 - 0xff]
  256, 260, 264, .., 1020      [0x100 - 0x3fc, step 4, 0x40 - 0xff ror 30]
  1024, 1040, 1056, .., 4080   [0x400 - 0xff0, step 16, 0x40 - 0xff ror 28]
  4096, 4160, 4224, .., 16320  [0x1000 - 0x3fc0, step 64, 0x40 - 0xff ror 26]
These can be loaded using, for example:
    MOV r0, #0x40, 26     ; => MOV r0, #0x1000 (ie 4096)
To make this easier, the assembler will convert to this form for us if simply given the required constant:
    MOV r0, #4096         ; => MOV r0, #0x1000 (ie 0x40 ror 26)
The bitwise complements can also be formed using MVN:
    MOV r0, #0xFFFFFFFF   ; assembles to MVN r0, #0
If the required constant cannot be generated, an error will be reported.
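Whether a constant fits the "8 bits rotated right by an even amount" scheme can be tested by trying all sixteen rotations. The Python sketch below does this by brute force; `encode_immediate` is a hypothetical helper, not an assembler API. A constant may have more than one valid encoding, and this sketch returns the first one it finds, which need not be the `0x40 ror 26` form shown for 4096.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def encode_immediate(constant):
    """Return (imm8, rotate) with ror32(imm8, rotate) == constant, or None."""
    constant &= MASK32
    for rotate in range(0, 32, 2):                 # only even rotations exist
        imm8 = ror32(constant, (32 - rotate) % 32)  # undo the rotate-right
        if imm8 <= 0xFF:
            return imm8, rotate
    return None                                    # must come from memory


for value in (0xFF, 0x3FC, 4096, 0x55555555):
    print(hex(value), encode_immediate(value))
```

0x55555555 has set bits spread across the whole word, so no 8-bit window covers it and the sketch reports `None`; this is the case where a literal pool load is needed.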

Loading full 32 bit constants
Although the MOV/MVN mechanism will load a large range of constants into a register, sometimes this mechanism will not generate the required constant.
Therefore, the assembler also provides a method which will load ANY 32 bit constant:
    LDR rd, =numeric constant
If the constant can be constructed using either a MOV or MVN then this will be the instruction actually generated.
Otherwise, the assembler will produce an LDR instruction with a PC-relative address to read the constant from a literal pool.
    LDR r0, =0x42         ; generates MOV r0, #0x42
    LDR r0, =0x55555555   ; generates LDR r0, [pc, offset to lit pool]
        :
        :
    DCD 0x55555555
As this mechanism will always generate the best instruction for a given case, it is the recommended way of loading constants.

Multiplication Instructions
The Basic ARM provides two multiplication instructions.
Multiply
    MUL{<cond>}{S} Rd, Rm, Rs       ; Rd = Rm * Rs
Multiply Accumulate - does addition for free
    MLA{<cond>}{S} Rd, Rm, Rs, Rn   ; Rd = (Rm * Rs) + Rn
Restrictions on use:
  Rd and Rm cannot be the same register
    Can be avoided by swapping Rm and Rs around. This works because multiplication is commutative.
  Cannot use PC.
  These will be picked up by the assembler if overlooked.
Operands can be considered signed or unsigned
  Up to user to interpret correctly.

Multiplication Implementation
The ARM makes use of Booth's Algorithm to perform integer multiplication.
On non-M ARMs this operates on 2 bits of Rs at a time.
  For each pair of bits this takes 1 cycle (plus 1 cycle to start with).
  However when there are no more 1's left in Rs, the multiplication will early-terminate.
Example: Multiply 18 and -1: Rd = Rm * Rs
  Rm = 18, Rs = -1:  Rs = 1111 1111 1111 1111 1111 1111 1111 1111   17 cycles
  Rm = -1, Rs = 18:  Rs = 0000 0000 0000 0000 0000 0000 0001 0010   4 cycles
Note: Compiler does not use early termination criteria to decide on which order to place operands.
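The 17-cycle and 4-cycle figures follow from the stated rule of one cycle per pair of Rs bits, plus one cycle to start, stopping once no 1s remain. The Python sketch below is a simplified cycle model built only from that rule; it is not a model of the real multiplier hardware, and `mul_cycles` is a name chosen for this example.

```python
def mul_cycles(rs):
    """Estimate MUL cycles on a non-M ARM from the value placed in Rs."""
    rs &= 0xFFFFFFFF      # treat Rs as a 32-bit pattern
    cycles = 1            # one cycle to start with
    while rs:             # early termination once no 1s are left
        cycles += 1       # one cycle per pair of bits
        rs >>= 2
    return cycles


print(mul_cycles(-1))     # Rs = -1: all 16 bit pairs are needed
print(mul_cycles(18))     # Rs = 18: only the low 5 bits hold 1s
```

Placing the operand with fewer significant bits in Rs is what makes the second ordering faster.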

Extended Multiply Instructions
M variants of ARM cores contain extended multiplication hardware. This provides three enhancements:
  An 8 bit Booth's Algorithm is used
    Multiplication is carried out faster (maximum for standard instructions is now 5 cycles).
  Early termination method improved so that it now completes multiplication when all remaining bit sets contain
    all zeroes (as with non-M ARMs), or
    all ones.
    Thus the previous example would early terminate in 2 cycles in both cases.
  64 bit results can now be produced from two 32 bit operands
    Higher accuracy.
    Pair of registers used to store result.

Multiply-Long & Multiply-Accumulate Long
Instructions are
  MULL which gives RdHi,RdLo := Rm * Rs
  MLAL which gives RdHi,RdLo := (Rm * Rs) + RdHi,RdLo
However the full 64 bits of the result now matter (lower precision multiply instructions simply throw the top 32 bits away)
  Need to specify whether operands are signed or unsigned
Therefore the syntax of the new instructions is:
    UMULL{<cond>}{S} RdLo, RdHi, Rm, Rs
    UMLAL{<cond>}{S} RdLo, RdHi, Rm, Rs
    SMULL{<cond>}{S} RdLo, RdHi, Rm, Rs
    SMLAL{<cond>}{S} RdLo, RdHi, Rm, Rs
Not generated by the compiler.
Warning: Unpredictable on non-M ARMs.

Load / Store Instructions
The ARM is a Load/Store Architecture:
  Does not support memory to memory data processing operations.
  Must move data values into registers before using them.
This might sound inefficient, but in practice it isn't:
  Load data values from memory into registers.
  Process data in registers using a number of data processing instructions which are not slowed down by memory access.
  Store results from registers out to memory.
The ARM has three sets of instructions which interact with main memory. These are:
  Single register data transfer (LDR / STR).
  Block data transfer (LDM / STM).
  Single Data Swap (SWP).

Single register data transfer
The basic load and store instructions are:
  Load and Store Word or Byte
    LDR / STR / LDRB / STRB
ARM Architecture Version 4 also adds support for Halfwords and signed data.
  Load and Store Halfword
    LDRH / STRH
  Load Signed Byte or Halfword - load value and sign extend it to 32 bits.
    LDRSB / LDRSH
All of these instructions can be conditionally executed by inserting the appropriate condition code after STR / LDR.
  e.g. LDREQB
Syntax:
  <LDR|STR>{<cond>}{<size>} Rd, <address>

Load and Store Word or Byte: Base Register
The memory location to be accessed is held in a base register
    STR r0, [r1]    ; Store contents of r0 to location pointed to
                    ; by contents of r1.
    LDR r2, [r1]    ; Load r2 with contents of memory location
                    ; pointed to by contents of r1.
[Figure: r0 (source register for STR) = 0x5; r1 (base register) = 0x200; memory location 0x200 holds 0x5; r2 (destination register for LDR) = 0x5.]

Load/Store Word or Byte: Offsets from the Base Register
As well as accessing the actual location contained in the base register, these instructions can access a location offset from the base register pointer.
This offset can be
  An unsigned 12 bit immediate value (ie 0 - 4095 bytes).
  A register, optionally shifted by an immediate value
This can be either added or subtracted from the base register:
  Prefix the offset value or register with '+' (default) or '-'.
This offset can be applied:
  before the transfer is made: Pre-indexed addressing
    optionally auto-incrementing the base register, by postfixing the instruction with an '!'.
  after the transfer is made: Post-indexed addressing
    causing the base register to be auto-incremented.

Load/Store Word or Byte: Pre-indexed Addressing
Example: STR r0, [r1,#12]
[Figure: r0 (source register for STR) = 0x5; r1 (base register) = 0x200; offset 12; 0x5 is stored to memory at 0x20c.]
To store to location 0x1f4 instead use: STR r0, [r1,#-12]
To auto-increment base pointer to 0x20c use: STR r0, [r1, #12]!
If r2 contains 3, access 0x20c by multiplying this by 4:
    STR r0, [r1, r2, LSL #2]

Load and Store Word or Byte: Post-indexed Addressing
Example: STR r0, [r1], #12
[Figure: r0 (source register for STR) = 0x5; original base register r1 = 0x200; 0x5 is stored to memory at 0x200; offset 12; updated base register r1 = 0x20c.]
To auto-increment the base register to location 0x1f4 instead use:
    STR r0, [r1], #-12
If r2 contains 3, auto-increment base register to 0x20c by multiplying this by 4:
    STR r0, [r1], r2, LSL #2
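The difference between pre-indexed and post-indexed addressing is which address is used for the transfer and what happens to the base register afterwards. The Python sketch below models only that address arithmetic, using the 0x200 base from the examples; the function names are assumptions for this illustration.

```python
MASK32 = 0xFFFFFFFF


def pre_indexed(base, offset, writeback=False):
    """[Rn, offset] or [Rn, offset]!: returns (transfer_address, new_base)."""
    address = (base + offset) & MASK32
    return address, (address if writeback else base)


def post_indexed(base, offset):
    """[Rn], offset: transfer at the old base, then update the base."""
    return base, (base + offset) & MASK32


r1 = 0x200
print([hex(v) for v in pre_indexed(r1, 12)])         # STR r0, [r1, #12]
print([hex(v) for v in pre_indexed(r1, 12, True)])   # STR r0, [r1, #12]!
print([hex(v) for v in pre_indexed(r1, 3 << 2)])     # STR r0, [r1, r2, LSL #2], r2 = 3
print([hex(v) for v in post_indexed(r1, -12)])       # STR r0, [r1], #-12
```

In the post-indexed case the store always goes to 0x200; only the value left in r1 afterwards depends on the offset.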

Load and Stores with User Mode Privilege
When using post-indexed addressing, there is a further form of Load/Store Word/Byte:
  <LDR|STR>{<cond>}{B}T Rd, <post_indexed_address>
When used in a privileged mode, this does the load/store with user mode privilege.
  Normally used by an exception handler that is emulating a memory access instruction that would normally execute in user mode.

Example Usage of Addressing Modes
Imagine an array, the first element of which is pointed to by the contents of r0.
If we want to access a particular element, then we can use pre-indexed addressing:
  r1 is element we want.
    LDR r2, [r0, r1, LSL #2]
If we want to step through every element of the array, for instance to produce sum of elements in the array, then we can use post-indexed addressing within a loop:
  r1 is address of current element (initially equal to r0).
    LDR r2, [r1], #4
  Use a further register to store the address of final element, so that the loop can be correctly terminated.
[Figure: memory holding the array. r0 is the pointer to the start of the array; element 0 is at offset 0, element 1 at offset 4, and the wanted element is shown at offset 12.]

Offsets for Halfword and Signed Halfword / Byte Access
The Load and Store Halfword and Load Signed Byte or Halfword instructions can make use of pre- and post-indexed addressing in much the same way as the basic load and store instructions.
However the actual offset formats are more constrained:
  The immediate value is limited to 8 bits (rather than 12 bits) giving an offset of 0 - 255 bytes.
  The register form cannot have a shift applied to it.

Effect of endianness
The ARM can be set up to access its data in either little or big endian format.
Little endian:
  Least significant byte of a word is stored in bits 0-7 of an addressed word.
Big endian:
  Least significant byte of a word is stored in bits 24-31 of an addressed word.
This has no real relevance unless data is stored as words and then accessed in smaller sized quantities (halfwords or bytes).
  Which byte / halfword is accessed will depend on the endianness of the system involved.

YA Endianness Example
r0 = 0x11223344, r1 = 0x100
    STR r0, [r1]
Memory word at 0x100 after the store (bits 31:24, 23:16, 15:8, 7:0):
  Little-endian:  11 22 33 44
  Big-endian:     44 33 22 11
    LDRB r2, [r1]
  Little-endian:  r2 = 00 00 00 44, i.e. r2 = 0x44
  Big-endian:     r2 = 00 00 00 11, i.e. r2 = 0x11
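The same store-word-then-load-byte experiment can be reproduced on a byte array, which makes it clear that only the byte order in memory changes. This Python sketch uses the standard `int.to_bytes` method; `store_word` and `load_byte` are illustrative helper names, and memory is modelled as a plain `bytes` object starting at the address in r1.

```python
def store_word(value, byteorder):
    """Model STR: lay a 32-bit word out in memory as 4 bytes."""
    return value.to_bytes(4, byteorder)


def load_byte(memory, offset=0):
    """Model LDRB: read one byte and zero-extend it."""
    return memory[offset]


r0 = 0x11223344
for order in ("little", "big"):
    memory = store_word(r0, order)            # STR r0, [r1]
    r2 = load_byte(memory)                    # LDRB r2, [r1]
    print(order, memory.hex(), hex(r2))
```

The byte at the lowest address is the least significant byte on a little-endian system and the most significant byte on a big-endian one, giving 0x44 and 0x11 respectively.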

Block Data Transfer (1)
The Load and Store Multiple instructions (LDM / STM) allow between 1 and 16 registers to be transferred to or from memory.
The transferred registers can be either:
  Any subset of the current bank of registers (default).
  Any subset of the user mode bank of registers when in a privileged mode (postfix instruction with a '^').
[Figure: LDM / STM encoding. Cond in bits 31:28; 1 0 0 in bits 27:25; P, U, S, W, L in bits 24 to 20; Rn in bits 19:16; Register list in bits 15:0.]
  Cond: Condition field
  P: Pre/Post indexing bit
    0 = Post; add offset after transfer
    1 = Pre; add offset before transfer
  U: Up/Down bit
    0 = Down; subtract offset from base
    1 = Up; add offset to base
  S: PSR and force user bit
    0 = don't load PSR or force user mode
    1 = load PSR or force user mode
  W: Write-back bit
    0 = no write-back
    1 = write address into base
  L: Load/Store bit
    0 = Store to memory
    1 = Load from memory
  Rn: Base register
  Register list
    Each bit corresponds to a particular register. For example:
      Bit 0 set causes r0 to be transferred.
      Bit 0 unset causes r0 not to be transferred.
    At least one register must be transferred as the list cannot be empty.

Block Data Transfer (2)
Base register used to determine where memory access should occur.
  4 different addressing modes allow increment and decrement inclusive or exclusive of the base register location.
  Base register can be optionally updated following the transfer (by appending it with an '!').
  Lowest register number is always transferred to/from lowest memory location accessed.
These instructions are very efficient for
  Saving and restoring context
    For this useful to view memory as a stack.
  Moving large blocks of data around memory
    For this useful to directly represent functionality of the instructions.

Stacks
A stack is an area of memory which grows as new data is "pushed" onto the "top" of it, and shrinks as data is "popped" off the top.
Two pointers define the current limits of the stack.
  A base pointer
    used to point to the "bottom" of the stack (the first location).
  A stack pointer
    used to point to the current "top" of the stack.
[Figure: PUSH {1,2,3} places 1, 2 and 3 above BASE with SP pointing at 3; POP then returns 3 (result of pop = 3) and moves SP back towards BASE.]

Stack Operation
Traditionally, a stack grows down in memory, with the last "pushed" value at the lowest address. The ARM also supports ascending stacks, where the stack structure grows up through memory.
The value of the stack pointer can either:
  Point to the last occupied address (Full stack)
    and so needs pre-decrementing (ie before the push)
  Point to the next address to be occupied (Empty stack)
    and so needs post-decrementing (ie after the push)
The stack type to be used is given by the postfix to the instruction:
  STMFD / LDMFD : Full Descending stack
  STMFA / LDMFA : Full Ascending stack.
  STMED / LDMED : Empty Descending stack
  STMEA / LDMEA : Empty Ascending stack
Note: ARM Compiler will always use a Full descending stack.

Stack Examples
    STMFD sp!, {r0,r1,r3-r5}
    STMFA sp!, {r0,r1,r3-r5}
    STMED sp!, {r0,r1,r3-r5}
    STMEA sp!, {r0,r1,r3-r5}
[Figure: memory from 0x3e8 to 0x418 (0x400 marked) after each of the four stores, showing Old SP, the new SP, and the stored registers with r0 at the lowest address and r5 at the highest.]
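The four stack types differ only in where the block of registers lands relative to the old stack pointer and where the pointer ends up. The Python sketch below computes both for a store with write-back. It assumes, for illustration, an old SP of 0x400 (one of the addresses marked in the figure), and `stm_layout` is a hypothetical helper name.

```python
def stm_layout(sp, nregs, mode):
    """Return (addresses, new_sp) for STM<mode> sp!, {...} of nregs registers.

    addresses[0] is where the lowest-numbered register is stored.
    """
    size = 4 * nregs
    if mode == "FD":      # full descending: decrement before each store
        low, new_sp = sp - size, sp - size
    elif mode == "ED":    # empty descending: decrement after each store
        low, new_sp = sp - size + 4, sp - size
    elif mode == "FA":    # full ascending: increment before each store
        low, new_sp = sp + 4, sp + size
    elif mode == "EA":    # empty ascending: increment after each store
        low, new_sp = sp, sp + size
    else:
        raise ValueError("mode must be FD, ED, FA or EA")
    return [low + 4 * i for i in range(nregs)], new_sp


OLD_SP = 0x400            # assumed starting value for this illustration
for mode in ("FD", "FA", "ED", "EA"):
    addresses, new_sp = stm_layout(OLD_SP, 5, mode)   # {r0,r1,r3-r5}
    print(mode, [hex(a) for a in addresses], hex(new_sp))
```

In every case the lowest-numbered register goes to the lowest address; the "full" variants leave SP pointing at an occupied word, the "empty" variants at the next free one.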

Stacks and Subroutines
One use of stacks is to create temporary register workspace for subroutines. Any registers that are needed can be pushed onto the stack at the start of the subroutine and popped off again at the end so as to restore them before return to the caller:
    STMFD sp!,{r0-r12, lr}   ; stack all registers
    ........                 ; and the return address
    ........
    LDMFD sp!,{r0-r12, pc}   ; load all the registers
                             ; and return automatically
See the chapter on the ARM Procedure Call Standard in the SDT Reference Manual for further details of register usage within subroutines.
If the pop instruction also had the 'S' bit set (using '^') then the transfer of the PC when in a privileged mode would also cause the SPSR to be copied into the CPSR (see exception handling module).

EE382N-4 Embedded Systems Architecture

Direct functionality of Block Data Transfer
When LDM / STM are not being used to implement stacks, it is clearer to specify exactly what the instruction does:
i.e. specify whether to increment / decrement the base pointer, before or after the memory access.

In order to do this, LDM / STM support a further syntax in addition to the stack one:

STMIA / LDMIA: Increment After
STMIB / LDMIB: Increment Before
STMDA / LDMDA: Decrement After
STMDB / LDMDB: Decrement Before
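The stack suffixes are alternative names for these four modes. The Python sketch below tabulates that correspondence and computes the addresses a transfer touches; the function name `transfer_addresses` and the table names are assumptions for illustration.

```python
WORD = 4

# Stack-oriented suffix -> direct suffix, for stores and for loads.
STM_ALIAS = {"FD": "DB", "FA": "IB", "ED": "DA", "EA": "IA"}
LDM_ALIAS = {"FD": "IA", "FA": "DA", "ED": "IB", "EA": "DB"}

def transfer_addresses(base, count, mode):
    """Return (addresses, written_back_base) for a block transfer.

    Addresses are listed lowest first; the lowest-numbered register
    in the register list always uses the lowest address.
    """
    size = WORD * count
    if mode == "IA":
        start, new_base = base, base + size
    elif mode == "IB":
        start, new_base = base + WORD, base + size
    elif mode == "DA":
        start, new_base = base - size + WORD, base - size
    elif mode == "DB":
        start, new_base = base - size, base - size
    else:
        raise ValueError(mode)
    return [start + WORD * i for i in range(count)], new_base
```

For each stack type, the load alias visits exactly the addresses the store alias wrote and returns the base to its starting value, which is why a matching STMxx / LDMxx pair behaves as push and pop.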


Example: Block Copy
Copy a block of memory, which is an exact multiple of 12 words long, from the location pointed to by r12 to the location pointed to by r13. r14 points to the end of the block to be copied.
; r12 points to the start of the source data
; r14 points to the end of the source data
; r13 points to the start of the destination data
loop  LDMIA r12!, {r0-r11}   ; load 48 bytes
      STMIA r13!, {r0-r11}   ; and store them
      CMP   r12, r14         ; check for the end
      BNE   loop             ; and loop until done

[Figure: memory map with an "Increasing Memory" arrow, showing r12, r14 and r13.]

This loop transfers 48 bytes in 31 cycles
Over 50 Mbytes/sec at 33 MHz
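The quoted rate follows from 48 bytes per pass, 31 cycles per pass and a 33 MHz clock. The Python sketch below models the loop on a dict-based memory and repeats that arithmetic; the function names and the memory model are assumptions for illustration.

```python
def block_copy(mem, src, dst, end, words_per_pass=12):
    """Model of the LDMIA/STMIA loop; src, dst and end are byte addresses.

    Like the assembly, the end test sits at the bottom of the loop, so the
    block length must be a non-zero exact multiple of one pass.
    """
    step = 4 * words_per_pass
    passes = 0
    while True:
        for off in range(0, step, 4):        # LDMIA r12!, {r0-r11}
            mem[dst + off] = mem[src + off]  # STMIA r13!, {r0-r11}
        src += step
        dst += step
        passes += 1
        if src == end:                       # CMP r12, r14 / BNE loop
            return passes

def throughput_bytes_per_sec(bytes_per_pass=48, cycles_per_pass=31,
                             clock_hz=33_000_000):
    """Bytes moved per second for the figures quoted on the slide."""
    return bytes_per_pass * clock_hz / cycles_per_pass
```

With the default arguments the second function gives roughly 51.1 million bytes per second, consistent with "over 50 Mbytes/sec".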


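Swap and Swap Byte Instructions
Atomic operation of a memory read followed by a memory write which moves byte or word quantities between registers and memory.
Syntax:
SWP{<cond>}{B} Rd, Rm, [Rn]

[Figure: 1. the value in memory at [Rn] is read into a temporary; 2. Rm is written to memory at [Rn]; 3. the temporary is written to Rd.]

To implement an actual swap of contents make Rd = Rm.
The compiler cannot produce this instruction.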
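A single-threaded Python sketch of the read-then-write sequence follows. It cannot show the bus locking that makes the real instruction atomic; the dict-based memory, the helper names and the lock convention (0 = free, 1 = taken) are assumptions for illustration.

```python
def swp(mem, regs, rd, rm, rn, byte=False):
    """Model of SWP{B} Rd, Rm, [Rn] on a dict-based memory."""
    mask = 0xFF if byte else 0xFFFFFFFF
    addr = regs[rn]
    temp = mem[addr] & mask        # 1: load from [Rn]
    mem[addr] = regs[rm] & mask    # 2: store Rm to [Rn]
    regs[rd] = temp                # 3: loaded value goes to Rd

def try_lock(mem, lock_addr):
    """Hypothetical lock acquire: swap in 1 and check the old value."""
    regs = {"r0": 1, "r1": lock_addr}
    swp(mem, regs, "r0", "r0", "r1")
    return regs["r0"] == 0         # old value 0 means the lock was free
```

Because Rm is read before Rd is written, using the same register for both exchanges the register and the memory location, as the slide describes.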

Software Interrupt (SWI)
Bits 31-28: Condition
Bits 27-24: 1111 (instruction type: Software Interrupt)
Bits 23-0:  SWI number (the comment field)

In effect, a SWI is a user defined instruction.
It causes an exception trap to the SWI hardware vector (thus causing a change to supervisor mode, plus the associated state saving), which in turn causes the SWI exception handler to be called.
The handler can then examine the comment field of the instruction to decide what operation has been requested.
By making use of the SWI mechanism, an operating system can implement a set of privileged operations which applications running in user mode can request.
See Exception Handling Module for further details.
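A handler typically recovers the comment field by reading the SWI instruction itself, which in ARM state sits one word below the return address in lr. The Python sketch below shows that decoding; the function names and the dict-based memory are assumptions for illustration.

```python
def decode_swi(instr):
    """Return (cond, swi_number) for a 32-bit ARM SWI word, else None."""
    if (instr >> 24) & 0xF != 0xF:       # bits 27-24 must be 1111
        return None
    cond = (instr >> 28) & 0xF           # bits 31-28
    number = instr & 0x00FFFFFF          # bits 23-0, the comment field
    return cond, number

def swi_number_from_lr(mem, lr):
    """Fetch the SWI word at lr - 4 (ARM state) and return its number."""
    return decode_swi(mem[lr - 4])[1]
```

For example, the word 0xEF000011 decodes to condition 0xE (always) and SWI number 0x11.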

Backup


Assembler: Pseudo-ops
AREA -> chunks of data ($data) or code ($code)
ADR -> load address into a register
ADR R0, BUFFER
ALIGN -> adjust location counter to word boundary, usually after a storage directive
END -> no more to assemble


Assembler: Pseudo-ops, contd.
DCD -> defined word value storage area
BOW DCD 1024, 2055, 9051
DCB -> defined byte value storage area
BOB DCB 10, 12, 15
% -> zeroed out byte storage area
BLBYTE % 30
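To see why ALIGN usually follows a storage directive, it helps to count the bytes each directive emits. The Python sketch below is a rough model of the resulting image, assuming little-endian words and zero padding; the helper names are hypothetical and this is not how any particular assembler is implemented.

```python
import struct

def dcd(*words):
    """DCD: one 32-bit word per value (little-endian assumed here)."""
    return b"".join(struct.pack("<I", w & 0xFFFFFFFF) for w in words)

def dcb(*values):
    """DCB: one byte per value."""
    return bytes(v & 0xFF for v in values)

def space(n):
    """% n: n zeroed bytes."""
    return bytes(n)

def align(image, boundary=4):
    """ALIGN: pad (with zeros, by assumption) up to the next boundary."""
    return image + bytes(-len(image) % boundary)
```

Using the slide's examples, `dcd(1024, 2055, 9051) + dcb(10, 12, 15)` is 15 bytes long, so the location counter is no longer word aligned until `align` adds one byte of padding.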


Assembler: Pseudo-ops, contd.
IMPORT -> name of routine to import for use in this routine
IMPORT _printf ; C print routine
EXPORT -> name of routine to export for use in other routines
EXPORT add2 ; add2 routine
EQU -> symbol replacement
loopcnt EQU 5


Assembly Line Format
label <whitespace> instruction <whitespace> ; comment
label: created by programmer, alphanumeric
whitespace: space(s) or tab character(s)
instruction: op-code mnemonic or pseudo-op with required fields
comment: preceded by ; and ignored by the assembler, but useful to the programmer for documentation
NOTE: All fields are optional.
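The three fields can be separated mechanically. The Python sketch below is one way to do it, assuming (this is not stated on the slide) that a label starts in the first column, that a line without a label starts with whitespace, and that no ; appears inside a string literal. The function name `split_line` is hypothetical.

```python
def split_line(line):
    """Split a source line into (label, instruction, comment).

    Missing fields come back as empty strings.
    """
    code, _, comment = line.partition(";")   # everything after ; is comment
    label = ""
    if code and not code[0].isspace():       # assumed: label starts in column 1
        parts = code.split(None, 1)
        label = parts[0]
        code = parts[1] if len(parts) > 1 else ""
    return label, code.strip(), comment.strip()
```

A line such as `loop LDR r4,[r3,r8] ; get c[i]` splits into the label `loop`, the instruction `LDR r4,[r3,r8]` and the comment `get c[i]`.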


Example: C assignments
C:
x = (a + b) - c;

Assembler:
ADR r4,a       ; get address for a
LDR r0,[r4]    ; get value of a
ADR r4,b       ; get address for b, reusing r4
LDR r1,[r4]    ; get value of b
ADD r3,r0,r1   ; compute a+b
ADR r4,c       ; get address for c
LDR r2,[r4]    ; get value of c
SUB r3,r3,r2   ; complete computation of x
ADR r4,x       ; get address for x
STR r3,[r4]    ; store value of x

2008 Wayne Wolf, Computers as Components, 2nd ed.


Example: C assignment
C:
y = a*(b+c);

Assembler:
ADR r4,b       ; get address for b
LDR r0,[r4]    ; get value of b
ADR r4,c       ; get address for c
LDR r1,[r4]    ; get value of c
ADD r2,r0,r1   ; compute partial result
ADR r4,a       ; get address for a
LDR r0,[r4]    ; get value of a
MUL r2,r0,r2   ; compute final value for y (Rd and Rm must differ before ARMv6)
ADR r4,y       ; get address for y
STR r2,[r4]    ; store y


Example: C assignment
C:
z = (a << 2) | (b & 15);

Assembler:
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
MOV r0,r0,LSL #2 ; perform shift
ADR r4,b ; get address for b
LDR r1,[r4] ; get value of b
AND r1,r1,#15 ; perform AND
ORR r1,r0,r1 ; perform OR
ADR r4,z ; get address for z
STR r1,[r4] ; store value for z


Example: if statement
C:
if (a > b) { x = 5; y = c + d; } else x = c - d;

Assembler:
; compute and test condition
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b
LDR r1,[r4] ; get value of b
CMP r0,r1 ; compare a with b
BLE fblock ; if a <= b, branch to false block


if statement, contd.
; true block
MOV r0,#5 ; generate value for x
ADR r4,x ; get address for x
STR r0,[r4] ; store x
ADR r4,c ; get address for c
LDR r0,[r4] ; get value of c
ADR r4,d ; get address for d
LDR r1,[r4] ; get value of d
ADD r0,r0,r1 ; compute y
ADR r4,y ; get address for y
STR r0,[r4] ; store y
B after ; branch around false block


if statement, contd.
; false block
fblock ADR r4,c ; get address for c
LDR r0,[r4] ; get value of c
ADR r4,d ; get address for d
LDR r1,[r4] ; get value of d
SUB r0,r0,r1 ; compute c-d
ADR r4,x ; get address for x
STR r0,[r4] ; store value of x
after ...


Example: Conditional instruction implementation
; flags already set by CMP r0,r1 (a compared with b)
; true block, executed only if a > b
MOVGT r0,#5 ; generate value for x
ADRGT r4,x ; get address for x
STRGT r0,[r4] ; store x
ADRGT r4,c ; get address for c
LDRGT r0,[r4] ; get value of c
ADRGT r4,d ; get address for d
LDRGT r1,[r4] ; get value of d
ADDGT r0,r0,r1 ; compute y
ADRGT r4,y ; get address for y
STRGT r0,[r4] ; store y


Conditional instruction implementation, contd.
; false block, executed only if a <= b
ADRLE r4,c ; get address for c
LDRLE r0,[r4] ; get value of c
ADRLE r4,d ; get address for d
LDRLE r1,[r4] ; get value of d
SUBLE r0,r0,r1 ; compute c-d
ADRLE r4,x ; get address for x
STRLE r0,[r4] ; store value of x
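Each conditional instruction is executed or skipped according to the flags left by the earlier CMP, and none of the instructions in the two blocks changes those flags. The Python sketch below models how the signed condition suffixes are evaluated. The function and table names are assumptions for illustration, and only a few suffixes are included. For a signed test a > b the matching pair is GT / LE, while LT / GE corresponds to a < b.

```python
MASK = 0xFFFFFFFF

def cmp_flags(a, b):
    """Flags after CMP a, b (computes a - b on 32-bit values)."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    n = result >> 31                      # sign bit of the result
    z = int(result == 0)
    c = int(a >= b)                       # set when there is no borrow
    v = ((a ^ b) & (a ^ result)) >> 31    # signed overflow
    return {"N": n, "Z": z, "C": c, "V": v}

CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],
}

def executes(cond, a, b):
    """Would an instruction with this suffix run after CMP a, b?"""
    return CONDITIONS[cond](cmp_flags(a, b))
```

When a equals b, neither GT nor LT executes, so the choice of suffix pair decides which block handles the equal case.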


Example: switch statement
C:
switch (test) { case 0: break; case 1: }

Assembler:
ADR r2,test ; get address for test
LDR r0,[r2] ; load value for test
ADR r1,switchtab ; load address for switch table
LDR pc,[r1,r0,LSL #2] ; index switch table and jump to the selected case
switchtab DCD case0
DCD case1
...
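The indexed load `[r1, r0, LSL #2]` reads the word at switchtab + 4 * test, because each DCD entry occupies one word. A short Python sketch of that lookup follows; the dict-based memory, the helper names and the example addresses are assumptions for illustration, and, like the assembly, it does no range check on test.

```python
WORD = 4

def build_table(mem, base, targets):
    """Lay out one word per case address, as the DCD lines do."""
    for i, target in enumerate(targets):
        mem[base + WORD * i] = target

def switch_target(mem, base, test):
    """Word loaded by [base, test, LSL #2]: the address of the chosen case."""
    return mem[base + (test << 2)]
```

With a table built from the hypothetical addresses 0x8100 and 0x8200, a test value of 1 selects 0x8200.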


Example: FIR filter
C:
for (i=0, f=0; i<N; i++)
f = f + c[i]*x[i];

Assembler:
; loop initiation code
MOV r0,#0 ; use r0 for i
MOV r8,#0 ; use separate index for arrays
ADR r2,N ; get address for N
LDR r1,[r2] ; get value of N
MOV r2,#0 ; use r2 for f


FIR filter, contd.
ADR r3,c ; load r3 with base of c
ADR r5,x ; load r5 with base of x
; loop body
loop LDR r4,[r3,r8] ; get c[i]
LDR r6,[r5,r8] ; get x[i]
MUL r4,r6,r4 ; compute c[i]*x[i] (Rd and Rm must differ before ARMv6)
ADD r2,r2,r4 ; add into running sum
ADD r8,r8,#4 ; add one word offset to array index
ADD r0,r0,#1 ; add 1 to i
CMP r0,r1 ; exit?
BLT loop ; if i < N, continue
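The loop can be checked against the C source with a register-level model. The Python sketch below mirrors the register roles (r0 for i, r8 for the byte offset, r2 for f) and assumes 32-bit wrap-around arithmetic; the function name `fir` and the use of Python lists for the arrays are assumptions for illustration. Because the test sits at the bottom of the loop, the body runs once even when N is 0, so the sketch assumes N >= 1.

```python
MASK = 0xFFFFFFFF

def fir(c, x):
    """Model of the FIR loop; c and x are lists of 32-bit words, len >= 1."""
    n = len(c)     # r1
    i = 0          # r0
    offset = 0     # r8, byte offset shared by both arrays
    f = 0          # r2
    while True:
        prod = (c[offset // 4] * x[offset // 4]) & MASK  # MUL keeps low 32 bits
        f = (f + prod) & MASK                            # ADD r2,r2,r4
        offset += 4                                      # ADD r8,r8,#4
        i += 1                                           # ADD r0,r0,#1
        if not i < n:                                    # CMP r0,r1 / BLT loop
            return f
```

For c = [1, 2, 3] and x = [4, 5, 6] the model returns 32, the same value as the C loop.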


ARM Instruction Set Summary (1/4 to 4/4)
[Tables: ARM instruction set summary, four slides.]
