The ARM Instruction Set Architecture
Mark McDermott
With help from our good friends at ARM
Fall 2008
8/22/2008
Main features of the ARM Instruction Set
All instructions are 32 bits long.
Most instructions execute in a single cycle.
Most instructions can be conditionally executed.
A load/store architecture
  Data processing instructions act only on registers
  Three operand format
  Combined ALU and shifter for high speed bit manipulation
  Specific memory access instructions with powerful auto-indexing addressing modes.
32 bit and 8 bit data types
  and also 16 bit data types on ARM Architecture v4.
Flexible multiple register load and store instructions
Instruction set extension via coprocessors
Very dense 16-bit compressed instruction set (Thumb)
Coprocessors
Up to 16 coprocessors can be defined
Expands the ARM instruction set
Each coprocessor can have up to 16 private registers of any reasonable size
Load-store architecture
Thumb
Thumb is a 16-bit instruction set
  Optimized for code density from C code
  Improved performance from narrow memory
  Subset of the functionality of the ARM instruction set
Core has two execution states: ARM and Thumb
  Switch between them using the BX instruction
Thumb has characteristic features:
  Most Thumb instructions are executed unconditionally
  Many Thumb data processing instructions use a 2-address format
  Thumb instruction formats are less regular than ARM instruction formats, as a result of the dense encoding.
Processor Modes
The ARM has six operating modes:
  User (unprivileged mode under which most tasks run)
  FIQ (entered when a high priority (fast) interrupt is raised)
  IRQ (entered when a low priority (normal) interrupt is raised)
  Supervisor (entered on reset and when a Software Interrupt instruction is executed)
  Abort (used to handle memory access violations)
  Undef (used to handle undefined instructions)
ARM Architecture Version 4 adds a seventh mode:
  System (privileged mode using the same registers as User mode)
The Registers
ARM has 37 registers in total, all of which are 32 bits long.
  1 dedicated program counter
  1 dedicated current program status register
  5 dedicated saved program status registers
  30 general purpose registers
However, these are arranged into several banks, with the accessible bank being governed by the processor mode. Each mode can access:
  a particular set of r0-r12 registers
  a particular r13 (the stack pointer) and r14 (the link register)
  r15 (the program counter)
  cpsr (the current program status register)
and privileged modes can also access:
  a particular spsr (saved program status register)
The ARM Register Set
(Figure: the current visible registers in each processor mode.)
User mode: r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr
Banked registers, which replace the User mode copies while the processor is in that mode:
  FIQ mode:   r8-r12, r13 (sp), r14 (lr), spsr
  IRQ mode:   r13 (sp), r14 (lr), spsr
  SVC mode:   r13 (sp), r14 (lr), spsr
  Undef mode: r13 (sp), r14 (lr), spsr
  Abort mode: r13 (sp), r14 (lr), spsr
Register Organization Summary
(Figure: register organization by mode.)
  User:  r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr
  FIQ:   User mode r0-r7, r15 and cpsr; banked r8-r12, r13 (sp), r14 (lr), spsr
  IRQ:   User mode r0-r12, r15 and cpsr; banked r13 (sp), r14 (lr), spsr
  SVC:   User mode r0-r12, r15 and cpsr; banked r13 (sp), r14 (lr), spsr
  Undef: User mode r0-r12, r15 and cpsr; banked r13 (sp), r14 (lr), spsr
  Abort: User mode r0-r12, r15 and cpsr; banked r13 (sp), r14 (lr), spsr
  Thumb state: low registers r0-r7, high registers r8-r15
Accessing Registers using ARM Instructions
There is no further breakdown of the currently accessible registers:
  All instructions can access r0-r14 directly.
  Most instructions also allow use of the PC.
Specific instructions allow access to the CPSR and SPSR.
Note: When in a privileged mode, it is also possible to load/store the (banked out) User mode registers to or from memory.
The Program Status Registers (CPSR and SPSRs)
Layout: bits 31-28 hold the flags N Z C V; bits 7, 6 and 5 hold I, F and T; bits 4-0 hold the Mode field.
* Condition Code Flags
  Copies of the ALU status flags (latched if the instruction has the "S" bit set).
  N = Negative result from ALU flag.
  Z = Zero result from ALU flag.
  C = ALU operation Carried out
  V = ALU operation oVerflowed
* Mode Bits
  M[4:0] define the processor mode.
* Interrupt Disable bits.
  I = 1, disables the IRQ.
  F = 1, disables the FIQ.
* T Bit (Architecture v4T only)
  T = 0, Processor in ARM state
  T = 1, Processor in Thumb state
Condition Flags
Negative (N=1)
  Logical instruction: no meaning
  Arithmetic instruction: bit 31 of the result has been set; indicates a negative number in signed operations
Zero (Z=1)
  Logical instruction: result is all zeroes
  Arithmetic instruction: result of operation was zero
Carry (C=1)
  Logical instruction: after a shift operation, '1' was left in the carry flag
  Arithmetic instruction: result was greater than 32 bits
oVerflow (V=1)
  Logical instruction: no meaning
  Arithmetic instruction: result was greater than 31 bits; indicates a possible corruption of the sign bit in signed numbers
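To make the arithmetic column concrete, here is a small C sketch of how an ADDS or a SUBS/CMP could derive N, Z, C and V. The names `Flags`, `adds` and `subs` are hypothetical, and the rule that C = 1 means "no borrow" after a subtraction is standard ARM behaviour that the slide does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the NZCV flags latched by an S-suffixed instruction. */
typedef struct { bool n, z, c, v; } Flags;

/* ADDS: C is the carry out of bit 31, V is signed (two's complement) overflow. */
uint32_t adds(uint32_t a, uint32_t b, Flags *f) {
    uint64_t wide = (uint64_t)a + b;            /* keep the 33rd bit */
    uint32_t r = (uint32_t)wide;
    f->n = (r >> 31) & 1;                       /* bit 31 of the result */
    f->z = (r == 0);
    f->c = (wide >> 32) & 1;                    /* result did not fit in 32 bits */
    f->v = ((~(a ^ b) & (a ^ r)) >> 31) & 1;    /* same-sign inputs, sign of result flipped */
    return r;
}

/* SUBS / CMP: a - b is computed as a + NOT(b) + 1, so C = 1 means "no borrow". */
uint32_t subs(uint32_t a, uint32_t b, Flags *f) {
    uint64_t wide = (uint64_t)a + (uint32_t)~b + 1;
    uint32_t r = (uint32_t)wide;
    f->n = (r >> 31) & 1;
    f->z = (r == 0);
    f->c = (wide >> 32) & 1;
    f->v = (((a ^ b) & (a ^ r)) >> 31) & 1;     /* different-sign inputs, result sign differs from a */
    return r;
}
```

For example, 0x7FFFFFFF + 1 sets N and V but not C, while 0xFFFFFFFF + 1 sets Z and C but not V: carry describes the unsigned view of the result, overflow the signed view.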
The Program Counter (R15)
When the processor is executing in ARM state:
  All instructions are 32 bits in length
  All instructions must be word aligned
  Therefore the PC value is stored in bits [31:2] with bits [1:0] equal to zero (as instructions cannot be halfword or byte aligned).
R14 is used as the subroutine link register (LR) and stores the return address when Branch with Link operations are performed, calculated from the PC.
Thus to return from a linked branch:
  MOV r15,r14
or
  MOV pc,lr
Exception Handling and the Vector Table
When an exception occurs, the core:
  Copies CPSR into SPSR_<mode>
  Sets appropriate CPSR bits
    If the core implements ARM Architecture 4T and is currently in Thumb state, then ARM state is entered.
    Mode field bits
    Interrupt disable flags if appropriate.
  Maps in appropriate banked registers
  Stores the return address in LR_<mode>
  Sets PC to vector address
To return, the exception handler needs to:
  Restore CPSR from SPSR_<mode>
  Restore PC from LR_<mode>
The Original Instruction Pipeline
The ARM uses a pipeline in order to increase the speed of the flow of instructions to the processor.
  Allows several operations to be undertaken simultaneously, rather than serially.
  PC       FETCH
  PC - 4   DECODE
  PC - 8   EXECUTE
Rather than pointing to the instruction being executed, the PC points to the instruction being fetched, so an instruction that reads the PC in ARM state sees its own address plus 8.
Pipeline changes for ARM9TDMI
(Figure: ARM7TDMI and ARM9TDMI pipeline stages.)
ARM7TDMI (3 stages):
  FETCH: instruction fetch
  DECODE: Thumb-to-ARM decompress, ARM decode, register select
  EXECUTE: register read, shift, ALU, register write
ARM9TDMI (5 stages):
  FETCH: instruction fetch
  DECODE: ARM or Thumb instruction decode, register decode, register read
  EXECUTE: shift + ALU
  MEMORY: memory access
  WRITE: register write
Pipeline changes for ARM10 vs. ARM11 Pipelines
(Figure: ARM10 and ARM11 pipeline stages.)
ARM10 (6 stages):
  FETCH: branch prediction, instruction fetch
  ISSUE: ARM or Thumb instruction decode
  DECODE: register read
  EXECUTE: shift + ALU, multiply
  MEMORY: memory access, multiply add
  WRITE: register write
ARM11 (8 stages):
  Fetch 1, Fetch 2, Decode, Issue, then three parallel pipelines:
    Shift, ALU, Saturate
    MAC 1, MAC 2, MAC 3
    Data Address, Data Cache 1, Data Cache 2
  Write back
ARM Instruction Set Format
(Figure: ARM instruction encodings, bits 31-0. Every format starts with the 4-bit Condition field in bits 31-28; the remaining fields are listed per instruction type.)
  Data processing: OPCODE, S, Rn, Rd, OPERAND2
  Multiply: A, S, Rd, Rn, Rs, Rm
  Long Multiply: U, A, S, RdHi, RdLo, Rs, Rm
  Swap: B, Rn, Rd, Rm
  Load/Store Byte/Word: P, U, B, W, L, Rn, Rd, OFFSET
  Load/Store Multiple: P, U, S, W, L, Rn, REGISTER LIST
  Halfword transfer, immediate offset: P, U, 1, W, L, Rn, Rd, OFFSET1, S, H, OFFSET2
  Halfword transfer, register offset: P, U, 0, W, L, Rn, Rd, S, H, Rm
  Branch: L, BRANCH OFFSET
  Branch Exchange: Rn
  Coprocessor data transfer: P, U, N, W, L, Rn, CRd, CPNum, OFFSET
  Coprocessor data operation: Op1, CRn, CRd, CPNum, OP2, CRm
  Coprocessor register transfer: Op1, L, CRn, Rd, CPNum, OP2, CRm
  Software Interrupt: SWI NUMBER
Conditional Execution
Most instruction sets only allow branches to be executed conditionally.
However, by reusing the condition evaluation hardware, ARM effectively increases the number of instructions.
  All instructions contain a condition field which determines whether the CPU will execute them.
  Non-executed instructions consume 1 cycle.
    They can't be collapsed like a NOP: the cycle still has to complete so as to allow fetching and decoding of the following instructions.
This removes the need for many branches, which stall the pipeline (3 cycles to refill).
  Allows very dense in-line code, without branches.
  The time penalty of not executing several conditional instructions is frequently less than the overhead of the branch or subroutine call that would otherwise be needed.
The Condition Field
(Figure: bits 31-28 of every ARM instruction hold the 4-bit Condition field; the data processing format shown continues with the OPCODE, register and OPERAND2 fields.)
Using and updating the Condition Field
To execute an instruction conditionally, simply postfix it with the appropriate condition:
  For example an add instruction takes the form:
    ADD   r0,r1,r2      ; r0 = r1 + r2 (ADDAL)
  To execute this only if the zero flag is set:
    ADDEQ r0,r1,r2      ; If zero flag set then...
                        ; ... r0 = r1 + r2
By default, data processing operations do not affect the condition flags (apart from the comparisons, where this is the only effect). To cause the condition flags to be updated, the S bit of the instruction needs to be set by postfixing the instruction (and any condition code) with an "S".
  For example to add two numbers and set the condition flags:
    ADDS  r0,r1,r2      ; r0 = r1 + r2
                        ; ... and set flags
Conditional Execution and Flags
ARM instructions can be made to execute conditionally by postfixing them with the appropriate condition code field.
  This improves code density and performance by reducing the number of forward branch instructions.
  With a branch:
          CMP   r3,#0
          BEQ   skip
          ADD   r0,r1,r2
  skip
  With a conditional instruction:
          CMP   r3,#0
          ADDNE r0,r1,r2
By default, data processing instructions do not affect the condition code flags but the flags can be optionally set by using "S". CMP does not need "S".
Branch instructions (1)
Branch:            B{<cond>} label
Branch with Link:  BL{<cond>} sub_routine_label
(Figure: branch format. Bits 31-28: Condition field; bits 27-25: 101; bit 24: Link bit (0 = Branch, 1 = Branch with link); bits 23-0: BRANCH OFFSET.)
The offset for branch instructions is calculated by the assembler:
  By taking the difference between the branch instruction and the target address minus 8 (to allow for the pipeline).
  This gives a 26-bit offset which is right-shifted 2 bits (as the bottom two bits are always zero, since instructions are word-aligned) and stored into the instruction encoding.
  This gives a range of ±32 Mbytes.
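The offset arithmetic above can be checked with a short C sketch. `encode_branch` and `branch_target` are hypothetical helper names; the sketch assumes the B/BL layout described on this slide and an arithmetic right shift on negative `int32_t` values, which C leaves implementation-defined but mainstream compilers provide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoder: cond in bits 31-28, 101 in bits 27-25,
   link bit in bit 24, signed word offset in bits 23-0. */
uint32_t encode_branch(uint32_t cond, int link, uint32_t branch_addr, uint32_t target) {
    int32_t byte_offset = (int32_t)(target - branch_addr - 8);   /* PC is 8 bytes ahead */
    assert((byte_offset & 3) == 0);                               /* word aligned */
    assert(byte_offset >= -(1 << 25) && byte_offset < (1 << 25)); /* +/-32 Mbytes */
    uint32_t field = ((uint32_t)byte_offset >> 2) & 0x00FFFFFFu;  /* 24 bits stored */
    return (cond << 28) | (0x5u << 25) | ((uint32_t)(link != 0) << 24) | field;
}

/* What the core does at execute time: shift left 2, sign-extend, add to PC (= address + 8).
   Assumes >> on a negative int32_t is arithmetic. */
uint32_t branch_target(uint32_t instr, uint32_t branch_addr) {
    int32_t offset = (int32_t)((instr & 0x00FFFFFFu) << 8) >> 6;  /* sign-extend, then x4 */
    return branch_addr + 8 + (uint32_t)offset;
}
```

For example, an unconditional branch to itself at 0x8000 encodes as 0xEAFFFFFE: the -8 pipeline adjustment leaves an offset of -2 words.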
Branch instructions (2)
When executing the instruction, the processor:
  shifts the offset left two bits, sign-extends it to 32 bits, and adds it to the PC.
Execution then continues from the new PC, once the pipeline has been refilled.
The "Branch with link" instruction implements a subroutine call by writing PC-4 into the LR of the current bank.
  i.e. the address of the next instruction following the branch with link (allowing for the pipeline).
To return from a subroutine, simply restore the PC from the LR:
  MOV pc,lr
  Again, the pipeline has to refill before execution continues.
Branch instructions (3)
The "Branch" instruction does not affect LR.
Note: Architecture 4T offers a further ARM branch instruction, BX.
  See the Thumb Instruction Set Module for details.
BL <subroutine>
  Stores the return address in LR
  Returning is implemented by restoring the PC from LR
  For non-leaf functions, LR will have to be stacked

          :
          :
          BL    func1
          :
          :

  func1   STMFD sp!,{regs,lr}
          :
          BL    func2
          :
          LDMFD sp!,{regs,pc}

  func2   :
          :
          MOV   pc,lr
Conditional Branches
Branch  Interpretation    Normal uses
B       Unconditional     Always take this branch
BAL     Always            Always take this branch
BEQ     Equal             Comparison equal or zero result
BNE     Not equal         Comparison not equal or non-zero result
BPL     Plus              Result positive or zero
BMI     Minus             Result minus or negative
BCC     Carry clear       Arithmetic operation did not give carry-out
BLO     Lower             Unsigned comparison gave lower
BCS     Carry set         Arithmetic operation gave carry-out
BHS     Higher or same    Unsigned comparison gave higher or same
BVC     Overflow clear    Signed integer operation; no overflow occurred
BVS     Overflow set      Signed integer operation; overflow occurred
BGT     Greater than      Signed integer comparison gave greater than
BGE     Greater or equal  Signed integer comparison gave greater or equal
BLT     Less than         Signed integer comparison gave less than
BLE     Less or equal     Signed integer comparison gave less than or equal
BHI     Higher            Unsigned comparison gave higher
BLS     Lower or same     Unsigned comparison gave lower or same
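Each mnemonic in the table is a test on the N, Z, C and V flags. The C sketch below spells those tests out; the numeric enum values follow the standard 4-bit ARM condition encodings (not listed on the slides), `cond_passes` is a hypothetical name, and HS/LO are the same conditions as CS/CC.

```c
#include <assert.h>
#include <stdbool.h>

/* Standard condition encodings (bits 31-28): EQ = 0000 ... AL = 1110. */
enum Cond { EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL };

/* Hypothetical helper: would an instruction with condition c execute, given NZCV? */
bool cond_passes(enum Cond c, bool n, bool z, bool cf, bool v) {
    switch (c) {
    case EQ: return z;                 /* equal / zero result */
    case NE: return !z;
    case CS: return cf;                /* also HS: unsigned higher or same */
    case CC: return !cf;               /* also LO: unsigned lower */
    case MI: return n;
    case PL: return !n;
    case VS: return v;
    case VC: return !v;
    case HI: return cf && !z;          /* unsigned higher */
    case LS: return !cf || z;          /* unsigned lower or same */
    case GE: return n == v;            /* signed greater or equal */
    case LT: return n != v;            /* signed less than */
    case GT: return !z && (n == v);    /* signed greater than */
    case LE: return z || (n != v);     /* signed less than or equal */
    case AL:
    default: return true;              /* always */
    }
}
```

After CMP of 3 with 5 the flags are N=1, Z=0, C=0, V=0, so LT and LO pass while GE and HS do not.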
Data processing Instructions
Largest family of ARM instructions, all sharing the same instruction format.
Contains:
  Arithmetic operations
  Comparisons (no results, just set condition codes)
  Logical operations
  Data movement between registers
Remember, this is a load/store architecture
  These instructions only work on registers, NOT memory.
They each perform a specific operation on one or two operands.
  First operand always a register: Rn
  Second operand sent to the ALU via the barrel shifter.
We will examine the barrel shifter shortly.
Arithmetic Operations
Operations are:
  ADD  operand1 + operand2               ; Add
  ADC  operand1 + operand2 + carry       ; Add with carry
  SUB  operand1 - operand2               ; Subtract
  SBC  operand1 - operand2 + carry - 1   ; Subtract with carry
  RSB  operand2 - operand1               ; Reverse subtract
  RSC  operand2 - operand1 + carry - 1   ; Reverse subtract with carry
Syntax:
  <Operation>{<cond>}{S} Rd, Rn, Operand2
Examples:
  ADD    r0, r1, r2
  SUBGT  r3, r3, #1
  RSBLES r4, r5, #5
Comparisons
The only effect of the comparisons is to update the condition flags. Thus there is no need to set the S bit.
Operations are:
  CMP  operand1 - operand2     ; Compare
  CMN  operand1 + operand2     ; Compare negative
  TST  operand1 AND operand2   ; Test
  TEQ  operand1 EOR operand2   ; Test equivalence
Syntax:
  <Operation>{<cond>} Rn, Operand2
Examples:
  CMP    r0, r1
  TSTEQ  r2, #5
Logical Operations
Operations are:
  AND  operand1 AND operand2
  EOR  operand1 EOR operand2
  ORR  operand1 OR operand2
  ORN  operand1 OR NOT operand2 (Thumb-2 only; not part of the ARM instruction set described here)
  BIC  operand1 AND NOT operand2 [ie bit clear]
Syntax:
  <Operation>{<cond>}{S} Rd, Rn, Operand2
Examples:
  AND    r0, r1, r2
  BICEQ  r2, r3, #7
  EORS   r1, r3, r0
Data Movement
Operations are:
  MOV  operand2
  MVN  NOT operand2
Note that these make no use of operand1.
Syntax:
  <Operation>{<cond>}{S} Rd, Operand2
Examples:
  MOV    r0, r1
  MOVS   r2, #10
  MVNEQ  r1, #0
The Barrel Shifter
The ARM doesn't have actual shift instructions.
Instead it has a barrel shifter which provides a mechanism to carry out shifts as part of other instructions.
So what operations does the barrel shifter support?
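Barrel Shifter: Left Shift
Shifts left by the specified amount (multiplies by powers of two), e.g.
  LSL #5 => multiply by 32
(Figure: Logical Shift Left (LSL). The last bit shifted out of the destination's MSB goes to the carry flag (CF); zeros are shifted in at the LSB.)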
Barrel Shifter: Right Shifts
Logical Shift Right (LSR)
  Shifts right by the specified amount (divides by powers of two), e.g.
    LSR #5 => divide by 32
  (Figure: zeros are shifted in at the MSB; the last bit shifted out of the LSB goes to CF.)
Arithmetic Shift Right (ASR)
  Shifts right (divides by powers of two) and preserves the sign bit, for 2's complement operations, e.g.
    ASR #5 => divide by 32
  (Figure: the sign bit is shifted in at the MSB; the last bit shifted out of the LSB goes to CF.)
Barrel Shifter: Rotations
Rotate Right (ROR)
  Similar to an ASR but the bits wrap around as they leave the LSB and appear as the MSB, e.g. ROR #5
  Note the last bit rotated is also used as the Carry Out.
Rotate Right Extended (RRX)
  This operation uses the CPSR C flag as a 33rd bit.
  Rotates right by 1 bit. Encoded as ROR #0.
(Figure: Rotate Right, and Rotate Right through Carry, each showing the destination register and CF.)
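A minimal C model of the immediate-shift forms and their carry-out, assuming shift amounts of 1 to 31 (the special encodings such as LSR #32 and ASR #32 are left out). The function names are illustrative only, not an ARM or library API.

```c
#include <assert.h>
#include <stdint.h>

/* Each helper returns the shifted value and writes the shifter carry-out to *carry. */
uint32_t lsl(uint32_t x, unsigned n, int *carry) {
    assert(n >= 1 && n <= 31);
    *carry = (x >> (32 - n)) & 1;   /* last bit shifted out of the MSB */
    return x << n;                  /* zeros shifted in at the LSB */
}

uint32_t lsr(uint32_t x, unsigned n, int *carry) {
    assert(n >= 1 && n <= 31);
    *carry = (x >> (n - 1)) & 1;    /* last bit shifted out of the LSB */
    return x >> n;                  /* zeros shifted in at the MSB */
}

uint32_t asr(uint32_t x, unsigned n, int *carry) {
    assert(n >= 1 && n <= 31);
    *carry = (x >> (n - 1)) & 1;
    uint32_t sign_fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0; /* copies of bit 31 */
    return (x >> n) | sign_fill;
}

uint32_t ror(uint32_t x, unsigned n, int *carry) {
    assert(n >= 1 && n <= 31);
    uint32_t r = (x >> n) | (x << (32 - n));
    *carry = r >> 31;               /* the last bit rotated is also the carry out */
    return r;
}

/* RRX: 33-bit rotate right by one, with the C flag as the extra bit. */
uint32_t rrx(uint32_t x, int carry_in, int *carry) {
    *carry = x & 1;
    return (x >> 1) | ((uint32_t)(carry_in & 1) << 31);
}
```

Note that ASR rounds towards minus infinity, so "divide by 32" is exact only when the discarded bits are zero.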
Using the Barrel Shifter: The Second Operand
(Figure: Operand 1 goes directly to the ALU; Operand 2 passes through the barrel shifter first; the ALU produces the Result.)
* Register, optionally with shift operation applied.
  Shift value can be either:
    5-bit unsigned integer
    Specified in the bottom byte of another register.
* Immediate value
  8-bit number
  Can be rotated right through an even number of positions.
  Assembler will calculate the rotate for you from the constant.
Second Operand: Shifted Register
The amount by which the register is to be shifted is contained in either:
  the immediate 5-bit field in the instruction
    NO OVERHEAD
    Shift is done for free: executes in a single cycle.
  the bottom byte of a register (not PC)
    Then takes an extra cycle to execute
    ARM doesn't have enough read ports to read 3 registers at once.
    Then the same as on other processors where the shift is a separate instruction.
If no shift is specified then a default shift is applied: LSL #0
  i.e. the barrel shifter has no effect on the value in the register.
Second Operand: Using a Shifted Register
Using a multiplication instruction to multiply by a constant means first loading the constant into a register and then waiting a number of internal cycles for the instruction to complete.
A better solution can often be found by using some combination of MOVs, ADDs, SUBs and RSBs with shifts.
  Multiplications by a constant equal to a ((power of 2) ± 1) can be done in one cycle.
  MOV R2,R0,LSL #2       ; Shift R0 left by 2, write to R2 (R2 = R0 x 4)
  ADD R9,R5,R5,LSL #3    ; R9 = R5 + R5 x 8 or R9 = R5 x 9
  RSB R9,R5,R5,LSL #3    ; R9 = R5 x 8 - R5 or R9 = R5 x 7
  SUB R10,R9,R8,LSR #4   ; R10 = R9 - R8 / 16
  MOV R12,R4,ROR R3      ; R12 = R4 rotated right by value of R3
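The same shift-and-add identities can be written in C so the arithmetic is easy to check. Each function mirrors one instruction above; `times45` is a hypothetical two-instruction extension (its register names t and x are illustrative) showing how a constant that is not (power of 2) ± 1 can be built by chaining two such steps.

```c
#include <assert.h>
#include <stdint.h>

uint32_t times4(uint32_t r0) { return r0 << 2; }            /* MOV R2,R0,LSL #2    */
uint32_t times9(uint32_t r5) { return r5 + (r5 << 3); }     /* ADD R9,R5,R5,LSL #3 */
uint32_t times7(uint32_t r5) { return (r5 << 3) - r5; }     /* RSB R9,R5,R5,LSL #3 */
uint32_t sub_div16(uint32_t r9, uint32_t r8) {              /* SUB R10,R9,R8,LSR #4 */
    return r9 - (r8 >> 4);
}

/* Hypothetical chain: x * 45 = (x * 9) * 5, two single-cycle instructions. */
uint32_t times45(uint32_t x) {
    uint32_t t = x + (x << 3);   /* ADD t,x,x,LSL #3  ; t = 9x  */
    return t + (t << 2);         /* ADD t,t,t,LSL #2  ; 45x     */
}
```

All arithmetic is modulo 2^32, exactly as in the 32-bit registers.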
Second Operand: Immediate Value (1)
There is no single instruction which will load a 32-bit immediate constant into a register without performing a data load from memory.
  All ARM instructions are 32 bits long
  ARM instructions do not use the instruction stream as data.
The data processing instruction format has 12 bits available for operand2
  If used directly this would only give a range of 4096.
Instead it is used to store 8-bit constants, giving a range of 0-255, plus a 4-bit rotate field.
These 8 bits can then be rotated right through an even number of positions (ie RORs by 0, 2, 4, ..., 30), the rotate amount being twice the value of the 4-bit field.
  This gives a much larger range of constants that can be directly loaded, though some constants will still need to be loaded from memory.
Second Operand: Immediate Value (2)
This gives us:
  0-255                          [0-0xff]
  256, 260, 264, ..., 1020       [0x100-0x3fc, step 4, 0x40-0xff ror 30]
  1024, 1040, 1056, ..., 4080    [0x400-0xff0, step 16, 0x40-0xff ror 28]
  4096, 4160, 4224, ..., 16320   [0x1000-0x3fc0, step 64, 0x40-0xff ror 26]
These can be loaded using, for example:
  MOV r0,#0x40,26           ; => MOV r0,#0x1000 (ie 4096)
To make this easier, the assembler will convert to this form for us if simply given the required constant:
  MOV r0,#4096              ; => MOV r0,#0x1000 (ie 0x40 ror 26)
The bitwise complements can also be formed using MVN:
  MOV r0,#0xFFFFFFFF        ; assembles to MVN r0,#0
If the required constant cannot be generated, an error will be reported.
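A C sketch of the check the assembler has to make: can a constant be written as an 8-bit value rotated right by an even amount, and if not, does its complement fit (so MVN can be used)? `encode_imm` and `choose_strategy` are hypothetical names. Several rotations can encode the same value; this sketch tries rotate 0 first and then 30, 28, ..., 2, which reproduces the encodings quoted above, but a real assembler's choice may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate left; masking with 31 keeps the shift count in range when n == 0. */
static uint32_t rol32(uint32_t x, unsigned n) {
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Try to express value as imm8 ROR rot, with rot even in 0..30. */
bool encode_imm(uint32_t value, uint32_t *imm8, unsigned *rot) {
    for (unsigned i = 0; i < 16; i++) {
        unsigned r = (32 - 2 * i) & 31;          /* 0, 30, 28, ..., 2 */
        uint32_t candidate = rol32(value, r);    /* undo the rotate right */
        if (candidate <= 0xFF) {
            *imm8 = candidate;
            *rot = r;
            return true;
        }
    }
    return false;
}

typedef enum { USE_MOV, USE_MVN, USE_LITERAL_POOL } ConstStrategy;

/* Hypothetical decision for loading a constant: MOV, MVN, or fall back to memory. */
ConstStrategy choose_strategy(uint32_t value) {
    uint32_t imm8;
    unsigned rot;
    if (encode_imm(value, &imm8, &rot)) return USE_MOV;
    if (encode_imm(~value, &imm8, &rot)) return USE_MVN;   /* MVN loads NOT(imm) */
    return USE_LITERAL_POOL;
}
```

For example, 4096 encodes as 0x40 ror 26, 0xFFFFFFFF falls back to MVN of 0, and 0x55555555 cannot be built either way, which is the case the literal-pool form of LDR exists for.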
Loading full 32-bit constants
Although the MOV/MVN mechanism will load a large range of constants into a register, sometimes this mechanism will not generate the required constant.
Therefore, the assembler also provides a method which will load ANY 32-bit constant:
  LDR rd,=numeric constant
If the constant can be constructed using either a MOV or MVN then this will be the instruction actually generated.
Otherwise, the assembler will produce an LDR instruction with a PC-relative address to read the constant from a literal pool.
  LDR r0,=0x42          ; generates MOV r0,#0x42
  LDR r0,=0x55555555    ; generates LDR r0,[pc, offset to lit pool]
  :
  :
  DCD 0x55555555
As this mechanism will always generate the best instruction for a given case, it is the recommended way of loading constants.
Multiplication Instructions
The basic ARM provides two multiplication instructions.
Multiply
  MUL{<cond>}{S} Rd, Rm, Rs         ; Rd = Rm * Rs
Multiply Accumulate: does the addition for free
  MLA{<cond>}{S} Rd, Rm, Rs, Rn     ; Rd = (Rm * Rs) + Rn
Restrictions on use:
  Rd and Rm cannot be the same register
    Can be avoided by swapping Rm and Rs around. This works because multiplication is commutative.
  Cannot use PC.
  These will be picked up by the assembler if overlooked.
Operands can be considered signed or unsigned, since the low 32 bits of the product are the same either way
  Up to the user to interpret correctly.
Multiplication Implementation
The ARM makes use of Booth's Algorithm to perform integer multiplication.
On non-M ARMs this operates on 2 bits of Rs at a time.
  For each pair of bits this takes 1 cycle (plus 1 cycle to start with).
  However, when there are no more 1's left in Rs, the multiplication will terminate early.
Example: Multiply 18 and -1: Rd = Rm * Rs
  Rm = 18, Rs = -1: 17 cycles
  Rm = -1, Rs = 18: 4 cycles
  (-1 is 0xFFFFFFFF, so all 16 bit pairs of Rs must be processed; 18 needs only 3 pairs.)
Note: The compiler does not use the early termination criteria to decide on which order to place operands.
Extended Multiply Instructions
M variants of ARM cores contain extended multiplication hardware. This provides three enhancements:
  An 8-bit Booth's Algorithm is used
    Multiplication is carried out faster (maximum for standard instructions is now 5 cycles).
  Early termination method improved so that it now completes multiplication when all remaining bit sets contain
    all zeroes (as with non-M ARMs), or
    all ones.
    Thus the previous example would early terminate in 2 cycles in both cases.
  64-bit results can now be produced from two 32-bit operands
    Higher accuracy.
    Pair of registers used to store result.
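The cycle counts quoted on the last two slides can be reproduced with a simplified C model of early termination. This is a hypothetical sketch written only to match those numbers (`mul_cycles_basic`, `mul_cycles_m` are invented names); it is not a cycle-accurate description of any particular core.

```c
#include <assert.h>
#include <stdint.h>

/* Non-M cores: 2 bits of Rs per cycle plus 1 start-up cycle; stops once the
   remaining high bits of Rs are all zero. */
unsigned mul_cycles_basic(uint32_t rs) {
    unsigned groups = 1;                            /* always process at least one pair */
    while (groups < 16 && (rs >> (2 * groups)) != 0)
        groups++;
    return 1 + groups;
}

/* M cores: 8 bits of Rs per cycle, stopping when the remaining high bits
   are all zeroes or all ones. */
unsigned mul_cycles_m(uint32_t rs) {
    unsigned groups = 1;
    while (groups < 4) {
        uint32_t rest = rs >> (8 * groups);               /* bits not yet consumed */
        uint32_t all_ones = 0xFFFFFFFFu >> (8 * groups);  /* same width, all ones */
        if (rest == 0 || rest == all_ones)
            break;
        groups++;
    }
    return 1 + groups;
}
```

With Rs = -1 the basic model gives 17 cycles and with Rs = 18 it gives 4; the M model gives 2 in both cases and never more than 5, matching the figures above.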
Multiply Long & Multiply Accumulate Long
Instructions are
  MULL, which gives RdHi,RdLo := Rm * Rs
  MLAL, which gives RdHi,RdLo := (Rm * Rs) + RdHi,RdLo
However, the full 64 bits of the result now matter (lower precision multiply instructions simply throw the top 32 bits away)
  Need to specify whether operands are signed or unsigned
Therefore the syntax of the new instructions is:
  UMULL{<cond>}{S} RdLo,RdHi,Rm,Rs
  UMLAL{<cond>}{S} RdLo,RdHi,Rm,Rs
  SMULL{<cond>}{S} RdLo,RdHi,Rm,Rs
  SMLAL{<cond>}{S} RdLo,RdHi,Rm,Rs
Not generated by the compiler.
Warning: Unpredictable on non-M ARMs.
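Why signedness matters only for the long forms can be shown in C: the same two bit patterns give the same low word but different high words. The helpers below are a hypothetical sketch of UMULL, SMULL and UMLAL semantics using 64-bit integer arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* UMULL RdLo,RdHi,Rm,Rs: operands treated as unsigned. */
void umull(uint32_t *lo, uint32_t *hi, uint32_t rm, uint32_t rs) {
    uint64_t p = (uint64_t)rm * rs;
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}

/* SMULL RdLo,RdHi,Rm,Rs: same bit patterns, treated as two's complement. */
void smull(uint32_t *lo, uint32_t *hi, uint32_t rm, uint32_t rs) {
    int64_t p = (int64_t)(int32_t)rm * (int32_t)rs;
    *lo = (uint32_t)p;
    *hi = (uint32_t)((uint64_t)p >> 32);
}

/* UMLAL: RdHi:RdLo += Rm * Rs, with the carry between the two halves handled. */
void umlal(uint32_t *lo, uint32_t *hi, uint32_t rm, uint32_t rs) {
    uint64_t acc = ((uint64_t)*hi << 32) | *lo;
    acc += (uint64_t)rm * rs;
    *lo = (uint32_t)acc;
    *hi = (uint32_t)(acc >> 32);
}
```

0xFFFFFFFF x 2 gives RdLo = 0xFFFFFFFE in both cases, but RdHi = 1 for UMULL (4294967295 x 2) and 0xFFFFFFFF for SMULL (-1 x 2).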
Load/Store Instructions
The ARM is a Load/Store Architecture:
  Does not support memory to memory data processing operations.
  Must move data values into registers before using them.
This might sound inefficient, but in practice it isn't:
  Load data values from memory into registers.
  Process data in registers using a number of data processing instructions which are not slowed down by memory access.
  Store results from registers out to memory.
The ARM has three sets of instructions which interact with main memory. These are:
  Single register data transfer (LDR/STR).
  Block data transfer (LDM/STM).
  Single Data Swap (SWP).
Single register data transfer
The basic load and store instructions are:
  Load and Store Word or Byte
    LDR / STR / LDRB / STRB
ARM Architecture Version 4 also adds support for halfwords and signed data.
  Load and Store Halfword
    LDRH / STRH
  Load Signed Byte or Halfword: load value and sign extend it to 32 bits.
    LDRSB / LDRSH
All of these instructions can be conditionally executed by inserting the appropriate condition code after STR/LDR.
  e.g. LDREQB
Syntax:
  <LDR|STR>{<cond>}{<size>} Rd, <address>
Load and Store Word or Byte: Base Register
The memory location to be accessed is held in a base register
  STR r0,[r1]    ; Store contents of r0 to location pointed to
                 ; by contents of r1.
  LDR r2,[r1]    ; Load r2 with contents of memory location
                 ; pointed to by contents of r1.
(Figure: r0 = 0x5 is the source register for STR and r1 = 0x200 is the base register; after the STR, memory location 0x200 holds 0x5, and the LDR then loads 0x5 into r2, the destination register for LDR.)
Load/Store Word or Byte: Offsets from the Base Register
As well as accessing the actual location contained in the base register, these instructions can access a location offset from the base register pointer.
This offset can be
  An unsigned 12-bit immediate value (ie 0-4095 bytes).
  A register, optionally shifted by an immediate value
This can be either added to or subtracted from the base register:
  Prefix the offset value or register with '+' (default) or '-'.
This offset can be applied:
  before the transfer is made: Pre-indexed addressing
    optionally auto-incrementing the base register, by postfixing the instruction with an '!'.
  after the transfer is made: Post-indexed addressing
    causing the base register to be auto-incremented.
Load/Store Word or Byte: Pre-indexed Addressing
Example: STR r0,[r1,#12]
(Figure: source register r0 = 0x5, base register r1 = 0x200, offset 12; 0x5 is stored at 0x20c and r1 is left unchanged.)
To store to location 0x1f4 instead use: STR r0,[r1,#-12]
To auto-increment the base pointer to 0x20c use: STR r0,[r1,#12]!
If r2 contains 3, access 0x20c by multiplying this by 4:
  STR r0,[r1,r2,LSL #2]
Load and Store Word or Byte: Post-indexed Addressing
Example: STR r0,[r1],#12
(Figure: source register r0 = 0x5 and original base register r1 = 0x200; 0x5 is stored at 0x200, then the offset 12 is added, giving the updated base register r1 = 0x20c.)
To auto-decrement the base register to location 0x1f4 instead use:
  STR r0,[r1],#-12
If r2 contains 3, auto-increment the base register to 0x20c by multiplying this by 4:
  STR r0,[r1],r2,LSL #2
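The difference between the two indexing modes is only *when* the offset is applied and whether the base is written back. The C sketch below (hypothetical names `AddrResult`, `pre_indexed`, `post_indexed`, `scaled_offset`) reproduces the addresses from the last two slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t address;   /* address actually accessed */
    uint32_t new_base;  /* value left in the base register Rn */
} AddrResult;

/* Pre-indexed: [Rn, #offset] or [Rn, #offset]! (offset may be negative). */
AddrResult pre_indexed(uint32_t rn, int32_t offset, bool writeback) {
    uint32_t addr = rn + (uint32_t)offset;      /* wraps modulo 2^32 like the hardware */
    AddrResult r = { addr, writeback ? addr : rn };
    return r;
}

/* Post-indexed: [Rn], #offset - access at Rn, then Rn is always updated. */
AddrResult post_indexed(uint32_t rn, int32_t offset) {
    AddrResult r = { rn, rn + (uint32_t)offset };
    return r;
}

/* Register offset scaled by the barrel shifter, as in [Rn, Rm, LSL #shift]. */
uint32_t scaled_offset(uint32_t rm, unsigned shift) { return rm << shift; }
```

With r1 = 0x200: pre-indexed #12 accesses 0x20c, #-12 accesses 0x1f4, and only the "!" form writes 0x20c back; post-indexed #12 accesses 0x200 and leaves r1 = 0x20c.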
Loads and Stores with User Mode Privilege
When using post-indexed addressing, there is a further form of Load/Store Word/Byte:
  <LDR|STR>{<cond>}{B}T Rd, <post_indexed_address>
When used in a privileged mode, this does the load/store with User mode privilege.
  Normally used by an exception handler that is emulating a memory access instruction that would normally execute in User mode.
Example Usage of Addressing Modes
Imagine an array, the first element of which is pointed to by the contents of r0.
If we want to access a particular element, then we can use pre-indexed addressing:
  r1 is the element we want.
  LDR r2,[r0,r1,LSL #2]
If we want to step through every element of the array, for instance to produce the sum of elements in the array, then we can use post-indexed addressing within a loop:
  r1 is the address of the current element (initially equal to r0).
  LDR r2,[r1],#4
Use a further register to store the address of the final element, so that the loop can be correctly terminated.
(Figure: r0 points to the start of the array in memory; elements 0, 1, 2, 3 lie at byte offsets 0, 4, 8, 12.)
Offsets for Halfword and Signed Halfword / Byte Access
The Load and Store Halfword and Load Signed Byte or Halfword instructions can make use of pre- and post-indexed addressing in much the same way as the basic load and store instructions.
However, the actual offset formats are more constrained:
  The immediate value is limited to 8 bits (rather than 12 bits), giving an offset of 0-255 bytes.
  The register form cannot have a shift applied to it.
Effect of endianness
The ARM can be set up to access its data in either little or big endian format.
Little endian:
  Least significant byte of a word is stored in bits 0-7 of an addressed word.
Big endian:
  Least significant byte of a word is stored in bits 24-31 of an addressed word.
This has no real relevance unless data is stored as words and then accessed in smaller sized quantities (halfwords or bytes).
  Which byte/halfword is accessed will depend on the endianness of the system involved.
YA Endianness Example
(Figure: r0 = 0x11223344 is stored as a word at the address in r1 = 0x100, e.g. with STR r0,[r1], and then a single byte is loaded back from 0x100 into r2, e.g. with LDRB r2,[r1].)
  Little-endian: the byte at address 0x100 is 0x44 (the least significant byte), so r2 = 0x44.
  Big-endian: the byte at address 0x100 is 0x11 (the most significant byte), so r2 = 0x11.
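The example can be replayed in C by laying the word out byte by byte in either order. `store_word` and `load_byte` are hypothetical helpers operating on a 4-byte array standing in for memory at 0x100.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Store a 32-bit word into four bytes of "memory" in the chosen byte order. */
void store_word(uint8_t mem[4], uint32_t word, bool big_endian) {
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;  /* which byte lands at address +i */
        mem[i] = (uint8_t)(word >> shift);
    }
}

/* LDRB semantics: one byte, zero-extended to 32 bits. */
uint32_t load_byte(const uint8_t mem[4], unsigned offset) {
    assert(offset < 4);
    return mem[offset];
}
```

Storing 0x11223344 and reading offset 0 gives 0x44 in little-endian order and 0x11 in big-endian order, matching r2 in the figure.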
Block Data Transfer (1)
The Load and Store Multiple instructions (LDM/STM) allow between 1 and 16 registers to be transferred to or from memory.
The transferred registers can be either:
  Any subset of the current bank of registers (default).
  Any subset of the User mode bank of registers when in a privileged mode (postfix the instruction with a '^').
(Figure: LDM/STM format.
  Bits 31-28: Cond (condition field)
  Bits 27-25: 1 0 0
  Bits 24-20: P, U (up/down bit), S, W (0 = no write-back, 1 = write address into base), L (load/store bit: 0 = store to memory, 1 = load from memory)
  Bits 19-16: Rn (base register)
  Bits 15-0: register list)
Block Data Transfer (2)
Base register used to determine where memory access should occur.
  4 different addressing modes allow increment and decrement inclusive or exclusive of the base register location.
  Base register can be optionally updated following the transfer (by appending it with an '!').
  Lowest register number is always transferred to/from lowest memory location accessed.
These instructions are very efficient for
  Saving and restoring context
    For this it is useful to view memory as a stack.
  Moving large blocks of data around memory
    For this it is useful to directly represent the functionality of the instructions.
Stacks
A stack is an area of memory which grows as new data is "pushed" onto the "top" of it, and shrinks as data is "popped" off the top.
Two pointers define the current limits of the stack.
  A base pointer
    used to point to the "bottom" of the stack (the first location).
  A stack pointer
    used to point to the current "top" of the stack.
(Figure: PUSH {1,2,3} places 1, 2 and 3 above BASE, moving SP up; a POP then returns 3 and moves SP back down.)
Stack Operation
Traditionally, a stack grows down in memory, with the last "pushed" value at the lowest address. The ARM also supports ascending stacks, where the stack structure grows up through memory.
The value of the stack pointer can either:
  Point to the last occupied address (Full stack)
    and so needs pre-decrementing (ie before the push) for a descending stack
  Point to the next empty address (Empty stack)
    and so needs post-decrementing (ie after the push) for a descending stack
  (An ascending stack increments instead of decrementing.)
The stack type to be used is given by the postfix to the instruction:
  STMFD / LDMFD: Full Descending stack
  STMFA / LDMFA: Full Ascending stack
  STMED / LDMED: Empty Descending stack
  STMEA / LDMEA: Empty Ascending stack
Note: The ARM Compiler will always use a Full Descending stack.
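Stack Examples
(Figure: the effect of pushing {r0,r1,r3-r5} with each stack type, starting from Old SP = 0x400. In every case the lowest-numbered register is stored at the lowest address.)
  STMFD sp!,{r0,r1,r3-r5}: r0, r1, r3, r4, r5 stored at 0x3ec-0x3fc; new SP = 0x3ec
  STMFA sp!,{r0,r1,r3-r5}: stored at 0x404-0x414; new SP = 0x414
  STMED sp!,{r0,r1,r3-r5}: stored at 0x3f0-0x400; new SP = 0x3ec
  STMEA sp!,{r0,r1,r3-r5}: stored at 0x400-0x410; new SP = 0x414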
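The four columns of the figure follow from one rule per addressing mode. This hypothetical C helper (`stm_layout`, with an invented `BlockMode` enum) computes the lowest address written and the written-back SP for an STM of n registers; the stack names map onto the direct names as FD = DB, FA = IB, ED = DA and EA = IA.

```c
#include <assert.h>
#include <stdint.h>

/* Direct LDM/STM modes; for STM: STMEA = IA, STMFA = IB, STMED = DA, STMFD = DB. */
typedef enum { IA, IB, DA, DB } BlockMode;

/* For STM{mode} sp!, {n registers}: lowest address written and SP after write-back.
   The lowest-numbered register always goes to the lowest address. */
void stm_layout(BlockMode mode, uint32_t sp, unsigned n, uint32_t *lowest, uint32_t *new_sp) {
    assert(n >= 1 && n <= 16);
    uint32_t bytes = 4u * n;
    switch (mode) {
    case IA: *lowest = sp;             *new_sp = sp + bytes; break; /* empty ascending  */
    case IB: *lowest = sp + 4;         *new_sp = sp + bytes; break; /* full ascending   */
    case DA: *lowest = sp - bytes + 4; *new_sp = sp - bytes; break; /* empty descending */
    case DB: *lowest = sp - bytes;     *new_sp = sp - bytes; break; /* full descending  */
    }
}
```

Pushing 5 registers from SP = 0x400 with DB (STMFD) writes 0x3ec-0x3fc and leaves SP = 0x3ec, as in the first column.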
Stacks and Subroutines
One use of stacks is to create temporary register workspace for subroutines. Any registers that are needed can be pushed onto the stack at the start of the subroutine and popped off again at the end so as to restore them before return to the caller:
  STMFD sp!,{r0-r12, lr}   ; save registers and the return address
  ........
  ........
  LDMFD sp!,{r0-r12, pc}   ; restore registers and return
See the chapter on the ARM Procedure Call Standard in the SDT Reference Manual for further details of register usage within subroutines.
If the pop instruction also had the 'S' bit set (using '^') then the transfer of the PC when in a privileged mode would also cause the SPSR to be copied into the CPSR (see exception handling module).
Direct functionality of Block Data Transfer
When LDM/STM are not being used to implement stacks, it is clearer to specify exactly what functionality of the instruction is:
  i.e. specify whether to increment/decrement the base pointer, before or after the memory access.
In order to do this, LDM/STM support a further syntax in addition to the stack one:
  STMIA / LDMIA: Increment After
  STMIB / LDMIB: Increment Before
  STMDA / LDMDA: Decrement After
  STMDB / LDMDB: Decrement Before
Example: Block Copy
Copy a block of memory, which is an exact multiple of 12 words long, from the location pointed to by r12 to the location pointed to by r13. r14 points to the end of the block to be copied.
        ; r12 points to the start of the source data
        ; r14 points to the end of the source data
        ; r13 points to the start of the destination data
  loop  LDMIA r12!,{r0-r11}   ; load 48 bytes
        STMIA r13!,{r0-r11}   ; and store them
        CMP   r12,r14         ; check for the end
        BNE   loop            ; and loop until done
(Figure: r12, r13 and r14 pointing into memory, with addresses increasing upward.)
This loop transfers 48 bytes in 31 cycles
Over 50 Mbytes/sec at 33 MHz
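Swap and Swap Byte Instructions
Atomic operation of a memory read followed by a memory write which moves byte or word quantities between registers and memory.
Syntax:
  SWP{<cond>}{B} Rd,Rm,[Rn]
(Figure: step 1 copies the memory contents at [Rn] into a temporary register; step 2 writes Rm to [Rn]; step 3 copies the temporary value into Rd.)
To implement an actual swap of contents, make Rd = Rm.
The compiler cannot produce this instruction.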
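As a software analogy only (not how the core implements SWP), C11's `atomic_exchange` from `<stdatomic.h>` has the same read-old-value-then-write shape as the word form of SWP. `swp_word` is a hypothetical wrapper name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* SWP Rd, Rm, [Rn]: read the old word at [Rn] and write Rm in one indivisible
   step; the returned old value is what would end up in Rd. */
uint32_t swp_word(_Atomic uint32_t *rn, uint32_t rm) {
    return atomic_exchange(rn, rm);
}
```

Using the result to overwrite the source variable (`r = swp_word(&m, r);`) corresponds to the Rd = Rm case, which performs a true swap of register and memory contents.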
Software Interrupt (SWI)
(Figure: SWI format. Bits 31-28: Condition; bits 27-24: 1111; bits 23-0: SWI NUMBER.)
In effect, a SWI is a user-defined instruction.
It causes an exception trap to the SWI hardware vector (thus causing a change to Supervisor mode, plus the associated state saving), thus causing the SWI exception handler to be called.
The handler can then examine the comment field of the instruction to decide what operation has been requested.
By making use of the SWI mechanism, an operating system can implement a set of privileged operations which applications running in User mode can request.
See Exception Handling Module for further details.
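Examining the comment field amounts to masking the instruction word. In ARM state the handler would typically find the SWI instruction at LR - 4, since LR_svc holds the address of the following instruction. The helpers below (`is_swi`, `swi_number`, `swi_condition`) are hypothetical names, sketched from the format shown on this slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SWI format: Condition in bits 31-28, 1111 in bits 27-24, comment field in bits 23-0. */
bool is_swi(uint32_t instr)             { return ((instr >> 24) & 0xFu) == 0xFu; }
uint32_t swi_number(uint32_t instr)     { return instr & 0x00FFFFFFu; }
uint32_t swi_condition(uint32_t instr)  { return instr >> 28; }
```

For example, 0xEF000011 is an always-executed (condition 0xE) SWI with number 0x11, while MOV pc,lr (0xE1A0F00E) is not a SWI.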
Backup
Assembler: Pseudo-ops
AREA -> chunks of data ($data) or code ($code)
ADR -> load address into a register
  ADR R0, BUFFER
ALIGN -> adjust location counter to a word boundary, usually after a storage directive
END -> no more to assemble
Assembler: Pseudo-ops
DCD -> defined word value storage area
  BOW DCD 1024, 2055, 9051
DCB -> defined byte value storage area
  BOB DCB 10, 12, 15
% -> zeroed out byte storage area
  BLBYTE % 30
Assembler: Pseudo-ops
IMPORT -> name of routine to import for use in this routine
  IMPORT _printf   ; C print routine
EXPORT -> name of routine to export for use in other routines
  EXPORT add2      ; add2 routine
EQU -> symbol replacement
  loopcnt EQU 5
Assembly Line Format
label <whitespace> instruction <whitespace> ; comment
label: created by programmer, alphanumeric
whitespace: space(s) or tab character(s)
instruction: op-code mnemonic or pseudo-op with required fields
comment: preceded by ; ignored by assembler but useful
to the programmer for documentation
NOTE: All fields are optional.
Example: C assignments
C:
  x = (a + b) - c;
Assembler:
  ADR r4,a        ; get address for a
  LDR r0,[r4]     ; get value of a
  ADR r4,b        ; get address for b
  LDR r1,[r4]     ; get value of b
  ADD r3,r0,r1    ; compute a+b
  ADR r4,c        ; get address for c
  LDR r2,[r4]     ; get value of c
  SUB r3,r3,r2    ; complete computation of x
  ADR r4,x        ; get address for x
  STR r3,[r4]     ; store value of x
2008 Wayne Wolf, Computers as Components 2nd ed.
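Example: C assignment
C:
  y = a*(b+c);
Assembler:
  ADR r4,b        ; get address for b
  LDR r0,[r4]     ; get value of b
  ADR r4,c        ; get address for c
  LDR r1,[r4]     ; get value of c
  ADD r2,r0,r1    ; compute partial result b+c
  ADR r4,a        ; get address for a
  LDR r0,[r4]     ; get value of a
  MUL r2,r0,r2    ; compute final value for y (Rd and Rm must differ)
  ADR r4,y        ; get address for y
  STR r2,[r4]     ; store y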
Example: C assignment
C:
  z = (a << 2) | (b & 15);
Assembler:
  ADR r4,a            ; get address for a
  LDR r0,[r4]         ; get value of a
  MOV r0,r0,LSL #2    ; perform shift
  ADR r4,b            ; get address for b
  LDR r1,[r4]         ; get value of b
  AND r1,r1,#15       ; perform AND
  ORR r1,r0,r1        ; perform OR
  ADR r4,z            ; get address for z
  STR r1,[r4]         ; store value for z
Example: if statement
C:
  if (a > b) { x = 5; y = c + d; } else x = c - d;
Assembler:
  ; compute and test condition
  ADR r4,a        ; get address for a
  LDR r0,[r4]     ; get value of a
  ADR r4,b        ; get address for b
  LDR r1,[r4]     ; get value for b
  CMP r0,r1       ; compare a and b
  BLE fblock      ; if a <= b, branch to false block
if statement, contd.
  ; true block
  MOV r0,#5       ; generate value for x
  ADR r4,x        ; get address for x
  STR r0,[r4]     ; store x
  ADR r4,c        ; get address for c
  LDR r0,[r4]     ; get value of c
  ADR r4,d        ; get address for d
  LDR r1,[r4]     ; get value of d
  ADD r0,r0,r1    ; compute y
  ADR r4,y        ; get address for y
  STR r0,[r4]     ; store y
  B after         ; branch around false block
if statement, contd.
         ; false block
  fblock ADR r4,c        ; get address for c
         LDR r0,[r4]     ; get value of c
         ADR r4,d        ; get address for d
         LDR r1,[r4]     ; get value for d
         SUB r0,r0,r1    ; compute c-d
         ADR r4,x        ; get address for x
         STR r0,[r4]     ; store value of x
  after  ...
Example: Conditional instruction implementation
  ; true block (executes only if a > b, i.e. GT after CMP r0,r1)
  MOVGT r0,#5       ; generate value for x
  ADRGT r4,x        ; get address for x
  STRGT r0,[r4]     ; store x
  ADRGT r4,c        ; get address for c
  LDRGT r0,[r4]     ; get value of c
  ADRGT r4,d        ; get address for d
  LDRGT r1,[r4]     ; get value of d
  ADDGT r0,r0,r1    ; compute y
  ADRGT r4,y        ; get address for y
  STRGT r0,[r4]     ; store y
Conditional instruction implementation, contd.
  ; false block (executes only if a <= b)
  ADRLE r4,c        ; get address for c
  LDRLE r0,[r4]     ; get value of c
  ADRLE r4,d        ; get address for d
  LDRLE r1,[r4]     ; get value for d
  SUBLE r0,r0,r1    ; compute c-d
  ADRLE r4,x        ; get address for x
  STRLE r0,[r4]     ; store value of x
Example: switch statement
C:
  switch (test) { case 0: ... break; case 1: ... }
Assembler:
            ADR r2,test              ; get address for test
            LDR r0,[r2]              ; load value for test
            ADR r1,switchtab         ; load address for switch table
            LDR pc,[r1,r0,LSL #2]    ; index switch table and jump to the case
  switchtab DCD case0
            DCD case1
            ...
Example: FIR filter
C:
  for (i = 0, f = 0; i < N; i++)
      f = f + c[i] * x[i];
Assembler:
  ; loop initiation code
  MOV r0,#0       ; use r0 for i
  MOV r8,#0       ; use separate index for arrays
  ADR r2,N        ; get address for N
  LDR r1,[r2]     ; get value of N
  MOV r2,#0       ; use r2 for f
FIR filter, contd.
       ADR r3,c            ; load r3 with base of c
       ADR r5,x            ; load r5 with base of x
       ; loop body
  loop LDR r4,[r3,r8]      ; get c[i]
       LDR r6,[r5,r8]      ; get x[i]
       MUL r4,r6,r4        ; compute c[i]*x[i] (Rd and Rm must differ)
       ADD r2,r2,r4        ; add into running sum
       ADD r8,r8,#4        ; add one word offset to array index
       ADD r0,r0,#1        ; add 1 to i
       CMP r0,r1           ; exit?
       BLT loop            ; if i < N, continue
ARM Instruction Set Summary (1/4)
ARM Instruction Set Summary (2/4)
ARM Instruction Set Summary (3/4)
ARM Instruction Set Summary (4/4)