ARM Microcontroller Code Size (Full)

32-Bit Microcontroller Code Size Analysis
Draft 1.2.4. Joseph Yiu, Andrew Frame
Overview
Microcontroller application program code size can directly affect the cost and power consumption of products; therefore it is almost always viewed as an important factor in the selection of a microcontroller for embedded projects. Since the release and availability of 32-bit processors such as the ARM Cortex-M3, more and more microcontroller users have discovered the benefits of switching to 32-bit products: lower power, greater energy efficiency, smaller code size and much better performance. Whilst most of the benefits of using 32-bit microcontrollers are widely known, the code size advantage of 32-bit microcontrollers is less obvious.

In this article we will explain why 32-bit microcontrollers can reduce application code size whilst still achieving high system performance and ease of use.
Typical myths of program size
Myth #1: 8-bit and 16-bit microcontrollers have smaller code size
There is a common misconception that switching from an 8-bit microcontroller to a 32-bit microcontroller will result in much bigger code size. Why? Many people have the impression that 8-bit microcontrollers use 8-bit instructions and 32-bit microcontrollers use 32-bit instructions. This impression is often reinforced by slightly misleading marketing from the 8-bit and 16-bit microcontroller vendors.

In reality, many instructions in 8-bit microcontrollers are 16 bits, 24 bits or other sizes larger than 8 bits. For example, the PIC18 instruction sizes are 16 bits and, with the 8051 architecture, although some instructions are 1 byte long, many others are 2 or 3 bytes long.

So would code size be better moving to a 16-bit microcontroller? Not necessarily. Taking the MSP430 as an example, a single-operand instruction can take 4 bytes (32 bits) and a double-operand instruction can take 6 bytes (48 bits). In the worst case, an extended immediate/index instruction in MSP430X can take 8 bytes (64 bits).

So how about the code size for ARM Cortex microcontrollers? The ARM Cortex-M3 and Cortex-M0 processors are based on Thumb-2 technology, which provides excellent code density. Thumb-2 microcontrollers have 16-bit instructions as well as 32-bit instructions, with the 32-bit instruction functionality a superset of the 16-bit version. In most cases a C compiler will use the 16-bit version of the instruction. The 32-bit version would only be used when the operation cannot be performed with a 16-bit instruction. As a result, most of the instructions in an ARM Cortex microcontroller program are 16 bits. That's even smaller than some of the instructions in 8-bit microcontrollers.
Figure 1: Size of a single instruction in various processors (minimum and maximum instruction size in bits, on a scale of 16 to 64, for 8051, PIC18, PIC24, MSP430/MSP430X and ARM)

Within a compiled program for Cortex-M processors, the number of 32-bit instructions can be only a small portion of the total instruction count. For example, the proportion of 32-bit instructions in the Dhrystone program image is only 15.8% of the total instruction count (average instruction size is 18.53 bits) when compiled for the Cortex-M3. For the Cortex-M0 the ratio of 32-bit instructions is even lower at 5.4% (average instruction size 16.9 bits).
Myth #2: My application only processes 8-bit data and 16-bit data
Many embedded developers think that if their application only processes 8-bit data then there is no benefit in switching to a 32-bit microcontroller. However, looking into the output from the C compiler carefully, in most cases the humble "integer" data type is actually 16 bits. So when you have a for loop with an integer as loop index, compare a value to an integer value, or use a C library function that uses an integer (e.g. memcpy()), you are actually using 16-bit or larger data. This can affect code size and performance in various ways:

• For each integer computation, an 8-bit processor will need multiple instructions to carry out the operations. This directly increases the code size and the clock cycle count.
• If the integer value has to be saved into memory, or if you need to load an immediate value from program ROM to this integer, it will take multiple instructions and multiple clock cycles.
• Since an integer can take up two 8-bit registers, more registers are required to hold the same number of integer variables. When there are an insufficient number of registers in the register bank to hold local variables, some have to be stored in memory. Thus an 8-bit microcontroller might result in more memory accesses, which increases code size and reduces performance and power efficiency. The same issue applies to the processing of 32-bit data on 16-bit microcontrollers.
• Since more registers are required to hold an integer in an 8-bit microcontroller, when passing variables to a function via the stack, or saving register contents during context switching or interrupt servicing, the number of stack operations required is more than that of 32-bit microcontrollers. This increases the program size, and can also affect interrupt latency because an Interrupt Service Routine (ISR) must make sure that all registers used are saved at ISR entry and restored at ISR exit. The same issue applies to the processing of 32-bit data on 16-bit microcontrollers.
There is even more bad news for 8-bit microcontroller users: memory address pointers take multiple bytes, so data processing involving the use of pointers can be extremely inefficient.
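The point about "8-bit" applications silently using wider data can be seen in a few lines of C. This is a minimal sketch with hypothetical helper names (sum_u8 and checksum_u8 are not from the benchmark sources); it only relies on the standard C rule that operands narrower than int are promoted to int before arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: both inputs are 8-bit, but C promotes them to int
 * before adding, so the sum is not truncated to 8 bits. */
int sum_u8(uint8_t a, uint8_t b)
{
    int sum = a + b;                /* arithmetic is carried out in int */
    assert(sum <= 2 * UINT8_MAX);   /* invariant: at most 510, needs more than 8 bits */
    return sum;
}

/* Hypothetical helper: the buffer holds 8-bit data, yet the loop index and
 * the running total are int-sized (16 bits or more on any C compiler). */
unsigned checksum_u8(const uint8_t *buf, int len)
{
    unsigned total = 0;
    for (int i = 0; i < len; i++)   /* int loop index */
        total += buf[i];
    return total;
}
```

On an 8-bit processor each of these int operations would typically be split into several byte-wide instructions, which is the overhead the list above describes.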
Myth #3: A 32-bit processor is not efficient at handling 8-bit and 16-bit data
Most 32-bit processors are actually very efficient at handling 8-bit and 16-bit data. Compact memory access instructions for signed and unsigned 8-bit, 16-bit and 32-bit data are all available. There are also a number of instructions specially included for data type conversions. Overall the handling of 8-bit and 16-bit data in 32-bit processors such as the ARM Cortex microcontrollers is just as easy and efficient as handling 32-bit data.
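As an illustration of the data type conversions mentioned above, the following sketch widens 8-bit and 16-bit values to 32 bits. The function names are hypothetical. On a Cortex-M target a compiler would typically implement each widening with a single extend instruction (such as SXTB or UXTH) or a sign-extending load (such as LDRSH), although the exact instruction selection depends on the compiler and options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signed 8-bit to 32-bit: the sign bit is replicated into the upper bits. */
int32_t widen_s8(int8_t v)
{
    return v;
}

/* Unsigned 16-bit to 32-bit: the upper bits are filled with zeros. */
uint32_t widen_u16(uint16_t v)
{
    return v;
}

/* Accumulate signed 16-bit samples in a 32-bit total; each element is
 * widened from 16 to 32 bits as it is loaded. */
int32_t sum_s16(const int16_t *samples, size_t count)
{
    assert(samples != NULL || count == 0);
    int32_t acc = 0;
    for (size_t i = 0; i < count; i++)
        acc += samples[i];
    return acc;
}
```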
Myth #4: C libraries for ARM processors are too big
There are various C library options for ARM processors. For microcontroller applications, a number of compiler vendors have developed C libraries with a much smaller footprint. For example, the ARM development tools have a smaller version of the C library called MicroLib. These C libraries are especially designed for microcontrollers and allow application code size to be small and efficient.
Myth #5: Interrupt handling on ARM microcontrollers is more complex
On the ARM Cortex microcontrollers the interrupt service routines are just normal C subroutines. Vectored or nested interrupts are supported by the Nested Vectored Interrupt Controller (NVIC) with no need for software intervention. In fact the setup process and processing of an interrupt request is much simpler than on 8-bit and 16-bit microcontrollers, as generally you only need to program the priority level of an interrupt and then enable it.

The interrupt vectors are stored in a vector table at the beginning of the memory, normally within the flash, without the need for any software programming steps. When an interrupt request takes place the processor automatically fetches the corresponding interrupt vector and starts to execute the ISR. Some of the registers are pushed to the stack by a hardware sequence and restored automatically when the interrupt handler exits. The other registers that are not covered by the hardware stacking sequence are pushed onto the stack by C compiler generated code only if the register is used and modified within the ISR.
What about moving to 16-bit microcontrollers?
16-bit microcontrollers can be efficient in handling 16-bit integers and 8-bit data (e.g. strings); however, the code size is still not as optimal as using 32-bit processors:

• Handling of 32-bit data: if the application requires handling of any long integer (32-bit) or floating-point types then the efficiency of 16-bit processors is greatly reduced because multiple instructions are required for each processing operation, as well as for data transfers between the processor and the memory.
• Register usage: when processing 32-bit data, 16-bit processors require two registers to hold each 32-bit variable. This reduces the number of variables that can be held in the register bank, hence reducing processing speed as well as increasing stack operations and memory accesses.
• Memory addressing mode: many 16-bit architectures provide only basic addressing modes similar to 8-bit architectures. As a result, the code density is poor when they are used in applications that require processing of complex data sets.
• 64K bytes limitation: many 16-bit processors are limited to 64K bytes of addressable memory, reducing the functionality of the application. Some 16-bit architectures have extensions to allow more than 64K bytes of memory to be accessed; however, these extensions have an instruction code and clock cycle overhead. For example, a memory pointer would be larger than 16 bits and might require multiple instructions and multiple registers to process it.
Instruction Set efficiency
When customers port their applications from an 8-bit architecture to ARM Cortex microcontrollers, they very often find that the total code size has dramatically decreased. For example, when Melfas (a leading company in capacitive sensing touch screen controllers) evaluated the Cortex-M0 processor, they found that the Cortex-M0 program size was less than half of that of the 8051 and, at the same time, delivered five times more performance at the same clock frequency. This, for example, could enable them to run the application at 1/5 of the clock speed of the equivalent 8051 product, reducing the power consumption, and lowering product cost at the same time due to a smaller program flash size requirement.

So how does the ARM architecture provide such big advantages? The key factor is Thumb-2 technology, which provides a highly efficient unified instruction set.
Powerful Addressing mode
The ARM Cortex microcontrollers support a number of addressing modes for memory transfer instructions. For example:

• Immediate offset (Address = Register value + offset)
• Register offset (Address = Register value 1 + shifted(Register value 2))
• PC related (Address = Current PC value + offset)
• Stack pointer related (Address = SP + offset)
• Multiple register load and store, with optional automatic base address update
• PUSH/POP instructions with multiple registers

As a result of these various addressing modes, data transfer between registers and memory can be handled with fewer instructions. Since the PUSH and POP instructions support multiple registers, in most cases, saving and restoring of registers in a function call will only need one PUSH at the beginning of the function and one POP at the end of the function. The POP can even be combined with the return instruction at the end of the function to further reduce the instruction count.
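To relate the addressing modes above to everyday C, here is a minimal sketch with a hypothetical point_t structure. A structure field access corresponds to the immediate offset form (base register plus a constant), and an array element access corresponds to the register offset form (base register plus a shifted index). How a particular compiler actually encodes these accesses is an assumption and will vary with the optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type used only for this illustration. */
typedef struct {
    uint32_t id;   /* byte offset 0 */
    uint16_t x;    /* byte offset 4 */
    uint16_t y;    /* byte offset 6 */
} point_t;

/* Field access: address = pointer value + constant offset of the field. */
uint16_t point_y(const point_t *p)
{
    assert(p != NULL);
    return p->y;
}

/* Array access: address = table base + (index scaled by the element size). */
uint32_t table_element(const uint32_t *table, size_t index)
{
    assert(table != NULL);
    return table[index];
}
```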
Conditional branches
Almost all processors provide conditional branch instructions; however, ARM processors provide improved conditional branching by having separate branch conditions for signed and unsigned data operation results, and by providing a good branch range.

For example, when comparing the conditional branches of the Cortex-M0 and MSP430, the Cortex-M0 has more branch conditions available, making it possible to generate more compact code no matter whether the data being processed is signed or unsigned. The MSP430 conditional branches might require multiple instructions to get the same operations.

Generally the same situation applies to many 8-bit or 16-bit microcontrollers: when dealing with signed data, additional steps might also be required in the conditional branch.

In addition to the branch instructions available in the Cortex-M0, the Cortex-M3 processor also supports compare-and-branch instructions (CBZ and CBNZ). This further simplifies some of the conditional branch instruction sequences.
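The reason separate signed and unsigned branch conditions matter is that the same bit pattern compares differently depending on its type. The sketch below uses hypothetical function names. On an ARM target the signed comparison would typically be resolved with a signed condition (LT/GE) and the unsigned one with an unsigned condition (LO/HS), each in a single conditional branch.

```c
#include <assert.h>
#include <stdint.h>

/* Signed comparison: 0xFFFFFFFF is interpreted as -1. */
int less_signed(int32_t a, int32_t b)
{
    return a < b;
}

/* Unsigned comparison: 0xFFFFFFFF is interpreted as 4294967295. */
int less_unsigned(uint32_t a, uint32_t b)
{
    return a < b;
}

/* Clamp a signed value to [lo, hi]; both tests are signed comparisons. */
int32_t clamp_s32(int32_t v, int32_t lo, int32_t hi)
{
    assert(lo <= hi);
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}
```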
Conditional Execution
Another area that allows the ARM Cortex-M3 microcontrollers to have more compact code is the conditional execution feature. The Cortex-M3 supports an instruction called IT (IF-THEN). This instruction allows up to 4 subsequent instructions to be conditionally executed, reducing the need for additional branches. For example,

    if (xpos1 < xpos2) {
        x1 = xpos1;
        x2 = xpos2;
    } else {
        x1 = xpos2;
        x2 = xpos1;
    }

This can be converted to the following assembly code (needs 12 bytes in the Cortex-M3):

    CMP    R0, R1
    ITTEE  CC          ; if unsigned <
    MOVCC  R2, R0
    MOVCC  R3, R1
    MOVCS  R3, R0
    MOVCS  R2, R1

Other architectures might need an additional branch (e.g. needs 14 bytes in MSP430):

    CMP.W  R14, R13
    JGE    Label1      ; if unsigned <
    MOV.W  R11, R14
    MOV.W  R12, R13
    JMP    Label2
Label1
    MOV.W  R11, R13
    MOV.W  R12, R14
Label2

This results in an extra two bytes for the MSP430 when compared to Cortex-M3.
Multiply and Divide
Both the Cortex-M0 and Cortex-M3 processors support single-cycle multiply operations. The Cortex-M3 also has multiply and multiply-accumulate instructions for 32-bit or 64-bit results. These instructions greatly reduce the code size required when handling multiplication of large variables.

Most other 8-bit and 16-bit microcontrollers also have multiply instructions; however, the limitation of the register size often means that the multiplication requires multiple steps if the result needs to be more than 8 or 16 bits.

The MSP430 does not have a multiply instruction (MSP430 document slaa329, reference 1). To carry out multiplication either a memory-mapped hardware multiplier is used, or the multiply operation has to be handled by software using add and shift. Even if a hardware multiplier is present, the memory-mapped nature of the multiplier results in the additional overhead of transferring data to and from the external hardware. In addition, using the multiplier within an interrupt handler could cause existing data in the multiplier to be lost. As a result, interrupts are usually disabled before a multiply operation and re-enabled after the multiplication is completed. This adds additional software overhead and affects interrupt latency and determinism.

The Cortex-M3 processor also has unsigned and signed integer divide instructions. This reduces the code size required in applications that need to perform integer division because there is no need for the C library to include a function for handling divide operations.
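The difference between a hardware multiply and a software "add and shift" multiply can be sketched as follows. The function names are hypothetical, and the shift-and-add routine is a generic textbook version, not the routine from the TI application note. On a Cortex-M3, a compiler would typically reduce mul_u32_wide to a single long multiply instruction and div_u32 to a single divide instruction, whereas the shift-and-add loop shows the kind of code needed when no multiply instruction is available.

```c
#include <assert.h>
#include <stdint.h>

/* 32 x 32 -> 64-bit multiply, written directly in C. */
uint64_t mul_u32_wide(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* 16 x 16 -> 32-bit multiply using only shifts and adds: for every set bit
 * in b, add the correspondingly shifted copy of a to the result. */
uint32_t mul_u16_shift_add(uint16_t a, uint16_t b)
{
    uint32_t result = 0;
    uint32_t addend = a;
    while (b != 0) {
        if (b & 1u)
            result += addend;
        addend <<= 1;
        b >>= 1;
    }
    return result;
}

/* Unsigned 32-bit divide; the divisor must not be zero. */
uint32_t div_u32(uint32_t numerator, uint32_t divisor)
{
    assert(divisor != 0);
    return numerator / divisor;
}
```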
Powerful instruction set
In addition to the standard data processing, memory access and program control instructions, the Cortex microcontrollers also support a number of other instructions to help data type conversion. The Cortex-M3 processor also supports a number of bit-field operations, reducing the software overhead in, for example, peripheral control and communication data processing.
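A minimal sketch of the bit-field operations mentioned above, written in portable C with hypothetical helper names. The Cortex-M3 instruction set includes bit-field extract and insert instructions (UBFX and BFI); whether a compiler maps these helpers onto them is an assumption that depends on the toolchain and on the field position being a compile-time constant.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask of `width` low-order ones (width is 1..32). */
static uint32_t field_mask(unsigned width)
{
    return (width == 32u) ? 0xFFFFFFFFu : ((UINT32_C(1) << width) - 1u);
}

/* Extract `width` bits starting at bit `lsb`, returned right-aligned. */
uint32_t bitfield_extract(uint32_t word, unsigned lsb, unsigned width)
{
    assert(width >= 1u && lsb + width <= 32u);
    return (word >> lsb) & field_mask(width);
}

/* Replace `width` bits starting at bit `lsb` with the low bits of `value`;
 * all other bits of `word` are preserved. */
uint32_t bitfield_insert(uint32_t word, uint32_t value,
                         unsigned lsb, unsigned width)
{
    assert(width >= 1u && lsb + width <= 32u);
    uint32_t mask = field_mask(width);
    return (word & ~(mask << lsb)) | ((value & mask) << lsb);
}
```

A typical use would be reading or updating a multi-bit field of a peripheral control register without disturbing its neighbouring bits.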
Breaking the 64K byte memory barrier
As already mentioned, many 8-bit and 16-bit microcontrollers are limited to 64K bytes of addressable memory. Due to the nature of 8-bit and 16-bit microcontroller architecture, the coding efficiency of these microcontrollers often decreases dramatically when the application exceeds the 64K byte memory barrier. In 8-bit and 16-bit microcontrollers (e.g. 8051, PIC24, C166) this is often handled by memory bank switching or memory segmentation, with the switching code generated automatically by the C compilers. Every time a function or data in a different memory page is required, bank switching code is needed, which further increases the program size.
Figure 2: Increased code size overhead of memory bank switching or segmentation in 8-bit and 16-bit systems

The memory bank switching not only creates larger code but also greatly reduces the performance of a system. This is especially the case if the data being processed is in a different memory bank (e.g. copying a block of data from one page to another page can be very costly in terms of performance). This is particularly inefficient for 8-bit microcontrollers like the 8051 because the MCS-51 architecture does not have proper support for such a memory bank switching feature. Therefore memory switching has to be carried out by saving and updating memory bank control like I/O port registers. In addition, the memory page switching code usually has to be carried out in a congested shared memory space with limited size. At the same time some of the memory pages might not be fully utilized and memory space is wasted.
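The save, switch and restore sequence described above can be modelled on a host machine as follows. This is a simplified sketch under stated assumptions: bank_select stands in for an I/O port register that selects the active bank, and the bank count, bank size and function names are all hypothetical. It is not code for any specific 8051 derivative or compiler.

```c
#include <assert.h>
#include <stdint.h>

#define BANK_COUNT 4u    /* hypothetical number of memory banks */
#define BANK_SIZE  16u   /* hypothetical bank size in bytes     */

static uint8_t bank_select;                          /* models the bank control port */
static uint8_t banked_memory[BANK_COUNT][BANK_SIZE]; /* models the paged memory      */

/* Every access has to save the current bank, switch, access, and restore. */
uint8_t banked_read(uint8_t bank, uint8_t offset)
{
    assert(bank < BANK_COUNT && offset < BANK_SIZE);
    uint8_t saved = bank_select;
    bank_select = bank;
    uint8_t value = banked_memory[bank_select][offset];
    bank_select = saved;
    return value;
}

void banked_write(uint8_t bank, uint8_t offset, uint8_t value)
{
    assert(bank < BANK_COUNT && offset < BANK_SIZE);
    uint8_t saved = bank_select;
    bank_select = bank;
    banked_memory[bank_select][offset] = value;
    bank_select = saved;
}

/* Copying one byte between banks costs two full switch sequences. */
void banked_copy_byte(uint8_t src_bank, uint8_t src_offset,
                      uint8_t dst_bank, uint8_t dst_offset)
{
    banked_write(dst_bank, dst_offset, banked_read(src_bank, src_offset));
}
```

With a linear 32-bit address space the same copy is a plain load followed by a store, with no bank bookkeeping at all.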

For the 8-bit and 16-bit microcontrollers that support memory of over 64K bytes, this often comes at a price. The MSP430X design overcomes the 64K bytes memory barrier by increasing the Program Counter (PC) and register width to 20 bits. Despite no memory paging being involved, the sizes of some MSP430X instructions are considerably larger than in the original MSP430. For example, when the large memory model is used, a double-operand formatted instruction can take 8 bytes rather than 6 (a 33% increase):

Figure 3: Support of larger memory system increases the size of some instructions in MSP430X (the MSP430 double-operand instruction is a 16-bit instruction word with Op-code, Rsrc, Ad, B/W, As and Rdst fields, followed by up to two 16-bit words, "Source or destination 15:0" and "Destination 15:0"; the MSP430X double-operand instruction adds a 16-bit extension word carrying Source 19:16, A/L and Destination 19:16)

Apart from the size of the instruction itself, the use of 20-bit addressing also increases the number of stack operations required. Since the memory is only 16 bits wide, the saving of a 20-bit address pointer will need two stack push operations, resulting in extra instructions and poor utilization of the stack memory.
Figure 4: Use of large memory data model in MSP430X increases code size

As a result, an MSP430X application has a lower code density when the large memory model is used, which is required when the address range exceeds the 64K range.

In ARM Cortex microcontrollers, 32-bit linear addressing is used to provide 4GB of memory space for embedded applications. Therefore there is no paging overhead and the programming model is easy to use.
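The stack cost of a 20-bit pointer on a 16-bit wide memory, as described above, can be modelled with a short host-side sketch. The stack array, its size, the word order and all function names are assumptions made for illustration; this is not the actual MSP430X stack frame layout.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 8u                  /* hypothetical stack depth in 16-bit words */

static uint16_t stack_mem[STACK_WORDS]; /* models a 16-bit wide stack memory */
static unsigned stack_top;              /* number of words currently in use  */

static void push_word(uint16_t w)
{
    assert(stack_top < STACK_WORDS);
    stack_mem[stack_top++] = w;
}

static uint16_t pop_word(void)
{
    assert(stack_top > 0u);
    return stack_mem[--stack_top];
}

/* A 20-bit address does not fit in one 16-bit word, so it takes two pushes.
 * The word holding the upper 4 bits leaves 12 bits unused. */
void push_addr20(uint32_t addr)
{
    assert(addr <= 0xFFFFFu);
    push_word((uint16_t)(addr >> 16));      /* upper 4 bits  */
    push_word((uint16_t)(addr & 0xFFFFu));  /* lower 16 bits */
}

uint32_t pop_addr20(void)
{
    uint32_t low  = pop_word();
    uint32_t high = pop_word();
    return (high << 16) | low;
}

unsigned stack_words_used(void)
{
    return stack_top;
}
```

In this model one pointer occupies 32 bits of stack for 20 bits of information, which is the "poor utilization" the text refers to.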
Examples
To demonstrate the code size compared to 8-bit and 16-bit processors, a number of test cases are compiled and illustrated here. The tests are based on the MSP430 Competitive Benchmark document from Texas Instruments (SLAA205C, reference 2). The results listed here show total program memory size in bytes.

• MSP430 results: The tests listed are compiled using IAR Embedded Workbench 4.20.1 with hardware multiplier enabled, optimization level set to High with Size optimization. Unless specified, the Small data model is used and type double is 32-bit. The results are obtained from the linker output report (CODE + CONST).
• ARM Cortex processor results: The tests listed are compiled using RealView Development Suite 4.0-SP2. Optimization level is 3 for size, minimal vector table, and MicroLIB is used. The results are obtained from the linker output report (VECTORS + CODE).

Test                   Generic MSP430   MSP430F5438   MSP430F5438          Cortex-M3
                                                      (large data model)
Math8bit               198              198           202                  144
Math16bit              144              144           144                  144
Math32bit              256              244           256                  120
MathFloat              1122             1122          1162                 600
Matrix2dim8bit         180              178           196                  184
Matrix2dim16bit        268              246           290                  256
Matrixmult             276              228           (linker error)       228
Switch8bit             200              218           218                  160
Switch16bit            198              218           218                  160
Firfilter (Note 1)     1202             1170          1222                 716 (820 without modification)
Dhry                   923              893           1079                 900
Whet (Note 2)          6434             6308          6614                 4384 (8496 without modification)
Note 1: The constant data array in the Firfilter test is modified to use a 16-bit data type on the Cortex-M processor (const unsigned short int INPUT[]).
Note 2: When certain math functions are used (sin, cos, atan, sqrt, exp, log), the ARM C standard libraries use the double-precision versions by default. This can result in significantly larger program size unless adjustments are made. In order to achieve an equivalent comparison, the program code is edited so that single-precision versions are used (sinf, cosf, atanf, sqrtf, expf, logf). Also, some of the constant definitions have been adjusted to single precision (e.g. 1.0 becomes 1.0F).
Figure 5: Code size comparison for basic operations

The total sizes for the simple tests (integer math, matrix and switch tests) are:

Summary for simple tests   Generic MSP430   MSP430F5438   Cortex-M3
Total size (bytes)         1720             1674          1396
Advantage (% smaller)      -                2.6%          18.8%

For applications using floating point, there is a significant advantage for Cortex microcontrollers, whereas the Dhrystone program size is closer.
Figure 6: Code size comparison for floating-point operations and benchmark suites

The total sizes for the benchmark and floating-point tests (Dhrystone, Whetstone, Firfilter and MathFloat) are:

Summary for benchmark and floating-point tests   Generic MSP430   MSP430F5438   Cortex-M3
Total size (bytes)                               9681             9493          6600
Advantage (% smaller)                            -                1.9%          31.8%

Observations:
1. From the results, we can see that the Cortex microcontrollers have better code density compared to MSP430 in most cases. The remaining tests show similar code density when compared to MSP430.
2. One of the tests (Firfilter) uses an integer data type for a constant array. Since an integer is 32-bit in the ARM processor and is 16-bit on MSP430, the program has been modified to allow a direct comparison.
3. When the large data memory model is used with MSP430, the code size increases by up to 20% (Dhrystone).
4. We are unable to reproduce all of the claimed results in the Texas Instruments document. This may be because the storage of constant data in ROM might have been omitted from their code size calculations.
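The "Advantage (% smaller)" rows in the two summary tables above follow from the totals with the Generic MSP430 as the baseline. A minimal sketch of that arithmetic, with a hypothetical function name:

```c
#include <assert.h>

/* Percentage by which `size` is smaller than `baseline`. */
double percent_smaller(unsigned baseline, unsigned size)
{
    assert(baseline > 0u);
    return 100.0 * ((double)baseline - (double)size) / (double)baseline;
}
```

For the simple tests, percent_smaller(1720, 1396) is about 18.8 and percent_smaller(1720, 1674) is about 2.67 (listed as 2.6% in the table, which suggests truncation rather than rounding). For the benchmark and floating-point tests, percent_smaller(9681, 6600) is about 31.8 and percent_smaller(9681, 9493) is about 1.9.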
Additional investigation on floating point
When analysing the results of the Whetstone benchmark it became apparent that the MSP430 C compiler only generated single-precision floating-point operations, while the ARM C compiler generated double-precision operations for some of the math functions used.

After changing the code to use only single-precision floating point, the code size reduced dramatically and resulted in a much smaller code size than the MSP430 code size.

The IAR MSP430 compiler has an option to define floating point: "Size of type double", which is by default set to 32 bits (single precision). If it is set to 64 bits (as in the ARM C compiler), the code size increases significantly.

Program size             Generic MSP430   MSP430F5438
Type double is 32-bit    6434             6308
Type double is 64-bit    11510            11798

These results match those seen for the ARM Cortex-M3 processor.

Program size                                                                   Cortex-M3
Whetstone modified to use single precision only                               4384
Out-of-box compile for Whetstone (uses double precision for math functions)   8496

The option of setting type double to 32 bits is quite sensible for small microcontroller applications where the C code might only need to process source data generated from a 12-bit/14-bit ADC. Benchmarking using different default types can make a very big difference and not show accurate comparative results.
Recommendations on how to get the smallest code size with Cortex-M microcontrollers
Use MicroLib
In the ARM development tools there is an option to use the area-optimized MicroLIB rather than the standard C libraries. The MicroLIB is suitable for most embedded applications and has a much smaller code size when compared to the standard C library.
Ensure the use of area optimizations
The performance of Cortex-M microcontrollers is much higher than that of 16-bit and 8-bit microcontrollers, so when porting applications from these microcontrollers you can generally select the highest area optimization rather than selecting optimizations for speed. The resulting performance will still be much higher than that of a 16-bit or 8-bit system running at the same clock frequency.
Use the right data type
When porting applications from 8-bit or 16-bit microcontrollers, you might need to modify the data type for constant arrays to achieve the most optimal program size. For example, an integer is normally 16-bit in 8-bit and 16-bit microcontrollers, while in ARM microcontrollers integers are 32-bit.

Type                    Number of bits in 8051   Number of bits in MSP430   Number of bits in ARM
char, unsigned char     8                        8                          8
enum                    8/16                     16                         8/16/32 (smallest is chosen)
short, unsigned short   16                       16                         16
int, unsigned int       16                       16                         32
long, unsigned long     32                       32                         32
float                   32                       32                         32
double                  32                       32                         64

When porting a constant array of integers from an 8-bit or 16-bit architecture, you should modify the data type from "int" to "short int" to make sure the constant array remains the same size. For example,

    const int mydata[] = { 1234, 5678, };

This should be changed to:

    const short int mydata[] = { 1234, 5678, };
For an array of integer variables (non-constant data), changing from an integer to a short integer might also prevent an increase in memory usage during software porting. Most other data (e.g. variables) does not require modification.
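The effect of the element type on the size of a constant array can be checked with sizeof. In this sketch the array names, contents and length are hypothetical; int32_t and int16_t are used to stand for what "int" and "short int" mean on an ARM compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COEFF_COUNT 4u   /* hypothetical array length */

/* The same constants stored as 32-bit and as 16-bit elements. */
const int32_t coeffs_as_int[COEFF_COUNT]   = { 1234, 5678, -42, 7 };
const int16_t coeffs_as_short[COEFF_COUNT] = { 1234, 5678, -42, 7 };

/* Bytes of ROM taken by each version of the table. */
size_t rom_bytes_int(void)   { return sizeof coeffs_as_int; }
size_t rom_bytes_short(void) { return sizeof coeffs_as_short; }

/* The values read back are identical; only the storage size differs. */
int16_t coeff(size_t index)
{
    assert(index < COEFF_COUNT);
    return coeffs_as_short[index];
}
```

Here the 16-bit table takes half the ROM of the 32-bit one, which is the saving obtained by changing "int" to "short int" when porting.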
Floating point functions
Some floating-point functions are defined as single precision in 8-bit or 16-bit microcontrollers and are by default defined as double precision in ARM microcontrollers, as we have found out with the Whetstone test analysis. When porting application code from 8-bit or 16-bit microcontrollers to an ARM microcontroller, you might have to adjust math functions to single-precision versions and modify constant definitions to ensure that the program behaves in the same way. For example, in the Whetstone program code, a section of code uses some math functions that are double precision in ARM compilers:

    X = T * atan(T2 * sin(X) * cos(X) / (cos(X + Y) + cos(X - Y) - 1.0));
    Y = T * atan(T2 * sin(Y) * cos(Y) / (cos(X + Y) + cos(X - Y) - 1.0));

If we want to use single precision only, the program code has to be changed to

    X = T * atanf(T2 * sinf(X) * cosf(X) / (cosf(X + Y) + cosf(X - Y) - 1.0F));
    Y = T * atanf(T2 * sinf(Y) * cosf(Y) / (cosf(X + Y) + cosf(X - Y) - 1.0F));

Other constant definitions such as:

    /* Module 7: Procedure calls */
    X = 1.0;
    Y = 1.0;
    Z = 1.0;

should be changed to the following for single-precision representation:

    /* Module 7: Procedure calls */
    X = 1.0F;
    Y = 1.0F;
    Z = 1.0F;
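Why the F suffix matters can be shown without any math library: in C an unsuffixed constant such as 1.0 has type double, so a float expression that uses it is carried out in double precision. The helper names below are hypothetical, and the sketch assumes a compiler where double is wider than float (as on ARM; an MSP430 compiler configured for 32-bit double would behave differently).

```c
#include <assert.h>
#include <stddef.h>

/* Size of the result type of float * 1.0: the constant is a double, so the
 * whole expression is promoted to double. */
size_t size_with_double_constant(float x)
{
    return sizeof(x * 1.0);
}

/* Size of the result type of float * 1.0F: the expression stays in float. */
size_t size_with_float_constant(float x)
{
    return sizeof(x * 1.0F);
}

/* A single-precision average: every constant and cast is kept in float. */
float average(const float *values, size_t count)
{
    assert(values != NULL && count > 0);
    float sum = 0.0F;
    for (size_t i = 0; i < count; i++)
        sum += values[i];
    return sum / (float)count;
}
```

On a processor without a floating-point unit, each promoted operation pulls in double-precision library routines, which is where the extra code size comes from.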
Define peripherals as data structure
You can also reduce program size by defining registers in peripherals as a data structure. For example, instead of representing the SysTick timer registers as

    #define SYSTICK_CTRL   (*((volatile unsigned long *)(0xE000E010)))
    #define SYSTICK_LOAD   (*((volatile unsigned long *)(0xE000E014)))
    #define SYSTICK_VAL    (*((volatile unsigned long *)(0xE000E018)))
    #define SYSTICK_CALIB  (*((volatile unsigned long *)(0xE000E01C)))

you can define the SysTick registers as:

    typedef struct
    {
        volatile unsigned int CTRL;
        volatile unsigned int LOAD;
        volatile unsigned int VAL;
        unsigned int CALIB;
    } SysTick_Type;
    #define SysTick ((SysTick_Type *) 0xE000E010)

By doing this, you only need one address constant to be stored in the program ROM. The register accesses will be using this address constant with different address offsets for different registers. If a sequence of hardware register accesses is required for a peripheral, using a data structure can reduce code size as well as improve performance. Most 8-bit microcontrollers do not have the same addressing mode feature, which can result in a much larger code size for the same task.
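The structure-based approach can be exercised on a host machine by passing the register block as a pointer. In this sketch timer_regs_t and timer_start are hypothetical names, the register block is modelled by an ordinary variable rather than the real peripheral, and the control value 0x5 assumes the usual SysTick layout in which bit 0 enables the counter and bit 2 selects the processor clock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register block with the same layout as the SysTick structure above:
 * four consecutive 32-bit registers. */
typedef struct {
    volatile uint32_t CTRL;   /* offset 0x0 */
    volatile uint32_t LOAD;   /* offset 0x4 */
    volatile uint32_t VAL;    /* offset 0x8 */
    volatile uint32_t CALIB;  /* offset 0xC */
} timer_regs_t;

/* Start the timer: all three writes use one base address plus a small
 * constant offset, so only a single address constant is needed. */
void timer_start(timer_regs_t *timer, uint32_t reload)
{
    assert(timer != NULL);
    assert(reload <= 0x00FFFFFFu);  /* the SysTick reload value is 24 bits */
    timer->LOAD = reload;
    timer->VAL  = 0u;               /* clear the current count */
    timer->CTRL = 0x5u;             /* assumed: enable + processor clock */
}
```

On real hardware the same function would be called with the peripheral base address, for example timer_start((timer_regs_t *)0xE000E010, 999u).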
Conclusions
32-bit processors provide equal or, more often, better code size than 8-bit and 16-bit architectures whilst at the same time delivering much better performance.

For users of 8-bit microcontrollers, moving to a 16-bit architecture can solve some of the inherent problems with 8-bit architectures; however, the overall benefit of migrating from 8-bit to 16-bit is much less than that achieved by migrating to the 32-bit Cortex processors.

As the power consumption and cost of 32-bit microcontrollers have reduced dramatically over the last few years, 32-bit processors have become the best choice for many embedded projects.
Reference
The following articles on MSP430 are referenced:
1. Efficient Multiplication and Division Using MSP430, http://focus.ti.com/lit/an/slaa329/slaa329.pdf
2. MSP430 Competitive Benchmarking, http://focus.ti.com/lit/an/slaa205c/slaa205c.pdf