
11/29/2015

Compiler Design Quick Guide

COMPILER DESIGN - QUICK GUIDE
http://www.tutorialspoint.com/compiler_design/compiler_design_quick_guide.htm

Copyright tutorialspoint.com

COMPILER DESIGN - OVERVIEW
Computers are a balanced mix of software and hardware. Hardware is just a mechanical device, and its functions are controlled by compatible software. Hardware understands instructions in the form of electronic charge, which is the counterpart of binary language in software programming. Binary language has only two symbols, 0 and 1. To instruct the hardware, code must be written in binary format, which is simply a series of 1s and 0s. It would be a difficult and cumbersome task for computer programmers to write such code, which is why we have compilers to produce it.

Language Processing System
We have learnt that any computer system is made of hardware and software. The hardware understands a language which humans cannot understand. So we write programs in a high-level language, which is easier for us to understand and remember. These programs are then fed into a series of tools and OS components to get the desired code that can be used by the machine. This is known as the Language Processing System.

file:///C:/Users/ANANDA/Downloads/Compiler%20Design%20%20Quick%20Guide.htm

1/61

The high-level language is converted into binary language in various phases. A compiler is a program that converts high-level language to assembly language. Similarly, an assembler is a program that converts the assembly language to machine-level language.
Let us first understand how a program, using a C compiler, is executed on a host machine.
User writes a program in C language (high-level language).
The C compiler compiles the program and translates it to an assembly program (low-level language).
An assembler then translates the assembly program into machine code (object).
A linker tool is used to link all the parts of the program together for execution (executable machine code).
A loader loads all of them into memory and then the program is executed.
Before diving straight into the concepts of compilers, we should understand a few other tools that work closely with compilers.

Preprocessor
A preprocessor, generally considered as a part of the compiler, is a tool that produces input for compilers. It deals with macro-processing, augmentation, file inclusion, language extension, etc.

Interpreter
An interpreter, like a compiler, translates high-level language into low-level machine language. The difference lies in the way they read the source code or input. A compiler reads the whole source code at once, creates tokens, checks semantics, generates intermediate code, and translates the whole program, possibly in many passes; the resulting program is executed afterwards. In contrast, an interpreter reads a statement from the input, converts it to an intermediate code, executes it, then takes the next statement in sequence. If an error occurs, an interpreter stops execution and reports it, whereas a compiler reads the whole program even if it encounters several errors.

Assembler
An assembler translates assembly language programs into machine code. The output of an assembler is called an object file, which contains a combination of machine instructions as well as the data required to place these instructions in memory.

Linker
A linker is a computer program that links and merges various object files together in order to make an executable file. All these files might have been compiled by separate assemblers. The major task of a linker is to search and locate referenced modules/routines in a program and to determine the memory location where this code will be loaded, so that the program instructions have absolute references.

Loader
A loader is a part of the operating system and is responsible for loading executable files into memory and executing them. It calculates the size of a program (instructions and data) and creates memory space for it. It initializes various registers to initiate execution.

Cross-compiler
A compiler that runs on platform (A) and is capable of generating executable code for platform (B) is called a cross-compiler.

Source-to-source Compiler
A compiler that takes the source code of one programming language and translates it into the source code of another programming language is called a source-to-source compiler.

Compiler Architecture
A compiler can broadly be divided into two phases based on the way it compiles.

Analysis Phase
Known as the front-end of the compiler, the analysis phase reads the source program, divides it into core parts and then checks for lexical, grammar and syntax errors. The analysis phase generates an intermediate representation of the source program and a symbol table, which are fed to the synthesis phase as input.

Synthesis Phase
Known as the back-end of the compiler, the synthesis phase generates the target program with the help of the intermediate source code representation and the symbol table.
A compiler can have many phases and passes.
Pass: A pass refers to the traversal of a compiler through the entire program.
Phase: A phase of a compiler is a distinguishable stage, which takes input from the previous stage, processes it, and yields output that can be used as input for the next stage. A pass can have more than one phase.

Phases of Compiler
The compilation process is a sequence of various phases. Each phase takes input from its previous stage, has its own representation of the source program, and feeds its output to the next phase of the compiler. Let us understand the phases of a compiler.

Lexical Analysis
The first phase, the scanner, works as a text scanner. This phase scans the source code as a stream of characters and converts it into meaningful lexemes. The lexical analyzer represents these lexemes in the form of tokens as:

<token-name, attribute-value>

Syntax Analysis
The next phase is called syntax analysis or parsing. It takes the tokens produced by lexical analysis as input and generates a parse tree (or syntax tree). In this phase, token arrangements are checked against the source code grammar, i.e., the parser checks whether the expression made by the tokens is syntactically correct.

Semantic Analysis
Semantic analysis checks whether the parse tree constructed follows the rules of the language. For example, it checks that values are assigned between compatible data types and flags errors such as adding a string to an integer. The semantic analyzer also keeps track of identifiers, their types and expressions, and whether identifiers are declared before use. The semantic analyzer produces an annotated syntax tree as output.

Intermediate Code Generation
After semantic analysis, the compiler generates an intermediate code of the source code for the target machine. It represents a program for some abstract machine and sits in between the high-level language and the machine language. This intermediate code should be generated in such a way that it is easy to translate into the target machine code.

Code Optimization
The next phase performs code optimization on the intermediate code. Optimization can be understood as removing unnecessary code lines and arranging the sequence of statements in order to speed up program execution without wasting resources (CPU, memory).

Code Generation
In this phase, the code generator takes the optimized representation of the intermediate code and maps it to the target machine language. The code generator translates the intermediate code into a sequence of (generally) relocatable machine code. This sequence of machine instructions performs the same task as the intermediate code would.

Symbol Table
It is a data structure maintained throughout all the phases of a compiler. All the identifiers' names along with their types are stored here. The symbol table makes it easier for the compiler to quickly search the identifier record and retrieve it. The symbol table is also used for scope management.
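
A symbol table is often sketched as a stack of dictionaries, one per scope, which makes its scope-management role concrete. The following is a minimal Python sketch under that assumption; the class and method names (SymbolTable, declare, lookup) are illustrative, not taken from any particular compiler.

```python
class SymbolTable:
    """Minimal scoped symbol table: a stack of dicts, one dict per scope."""

    def __init__(self):
        self.scopes = [{}]                 # the outermost (global) scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_name):
        current = self.scopes[-1]
        if name in current:
            raise NameError(f"'{name}' already declared in this scope")
        current[name] = type_name

    def lookup(self, name):
        # Search from the innermost scope outwards; None if not declared.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.declare("value", "int")
table.enter_scope()
table.declare("value", "float")            # shadows the outer 'value'
inner = table.lookup("value")
table.exit_scope()
outer = table.lookup("value")
print(inner, outer)
```

Because lookups walk the stack from the top, an inner declaration hides an outer one only until its scope is exited.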

COMPILER DESIGN - LEXICAL ANALYSIS
Lexical analysis is the first phase of a compiler. It takes the modified source code from language preprocessors that are written in the form of sentences. The lexical analyzer breaks these syntaxes into a series of tokens, by removing any whitespace or comments in the source code.

If the lexical analyzer finds a token invalid, it generates an error. The lexical analyzer works closely with the syntax analyzer. It reads character streams from the source code, checks for legal tokens, and passes the data to the syntax analyzer when it demands.

Tokens
A lexeme is a sequence of (alphanumeric) characters in a token. There are some predefined rules for every lexeme to be identified as a valid token. These rules are defined by grammar rules, by means of a pattern. A pattern explains what can be a token, and these patterns are defined by means of regular expressions.
In a programming language, keywords, constants, identifiers, strings, numbers, operators and punctuation symbols can be considered as tokens.
For example, in C language, the variable declaration line
int value = 100;

contains the tokens:
int (keyword), value (identifier), = (operator), 100 (constant) and ; (symbol).
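
To make the <token-name, attribute-value> pairs concrete, here is a minimal regex-based tokenizer sketch for the line above. The token categories follow the list in the text; the patterns and the small KEYWORDS set are assumptions that cover only a tiny subset of C.

```python
import re

# Hypothetical token patterns for a tiny C-like subset (not a complete C lexer).
# Alternatives are tried in this order at each position.
TOKEN_SPEC = [
    ("constant",   r"\d+"),
    ("word",       r"[A-Za-z_]\w*"),   # keyword or identifier, decided below
    ("operator",   r"[=+\-*/]"),
    ("symbol",     r"[;,(){}]"),
    ("whitespace", r"\s+"),
]
KEYWORDS = {"int", "float", "char", "return"}
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token-name, attribute-value) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"invalid character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        if kind == "word":
            kind = "keyword" if lexeme in KEYWORDS else "identifier"
        if kind != "whitespace":            # whitespace is discarded
            tokens.append((kind, lexeme))
        pos = match.end()
    return tokens


print(tokenize("int value = 100;"))
```

The call above yields the same five tokens listed in the text, each paired with its lexeme as the attribute value.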

Specifications of Tokens
Let us understand how the language theory undertakes the following terms:

Alphabets
An alphabet is any finite set of symbols: {0,1} is a set of binary alphabets, {0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is a set of hexadecimal alphabets, and {a-z, A-Z} is a set of English language alphabets.

Strings
Any finite sequence of alphabets is called a string. The length of a string is the total number of occurrences of alphabets in it, e.g., the length of the string tutorialspoint is 14 and is denoted by |tutorialspoint| = 14. A string having no alphabets, i.e., a string of zero length, is known as an empty string and is denoted by ε (epsilon).

Special Symbols
A typical high-level language contains the following symbols:

Arithmetic Symbols    Addition (+), Subtraction (-), Modulo (%), Multiplication (*), Division (/)
Punctuation           Comma (,), Semicolon (;), Dot (.), Arrow (->)
Assignment            =
Special Assignment    +=, /=, *=, -=
Comparison            ==, !=, <, <=, >, >=
Preprocessor          #
Location Specifier    &
Logical               &, &&, |, ||, !
Shift Operator        >>, >>>, <<, <<<

Language
A language is considered as a set of strings over some finite set of alphabets. Computer languages are considered as sets, and mathematically set operations can be performed on them. Regular languages (which include all finite languages) can be described by means of regular expressions.

Regular Expressions
The lexical analyzer needs to scan and identify only the valid strings/tokens/lexemes that belong to the language in hand. It searches for the pattern defined by the language rules.
Regular expressions have the capability to express regular languages by defining a pattern for finite strings of symbols. The grammar defined by regular expressions is known as regular grammar. The language defined by regular grammar is known as regular language.
Regular expression is an important notation for specifying patterns. Each pattern matches a set of strings, so regular expressions serve as names for a set of strings. Programming language tokens can be described by regular languages. The specification of regular expressions is an example of a recursive definition. Regular languages are easy to understand and have efficient implementations.
There are a number of algebraic laws that are obeyed by regular expressions, which can be used to manipulate regular expressions into equivalent forms.

Operations

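The various operations on languages are:
Union of two languages L and M is written as
\( L \cup M = \{\, s \mid s \in L \text{ or } s \in M \,\} \)
Concatenation of two languages L and M is written as
\( LM = \{\, st \mid s \in L \text{ and } t \in M \,\} \)
The Kleene closure of a language L is written as
\( L^{*} = \bigcup_{i \ge 0} L^{i} \), i.e., zero or more occurrences of language L.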
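
These operations can be tried out on small finite languages represented as Python sets of strings. This is an illustrative sketch; since L* is infinite in general, the hypothetical kleene helper only enumerates the strings of L* up to a chosen length.

```python
from itertools import product


def union(L, M):
    return L | M


def concat(L, M):
    # Every string of L followed by every string of M.
    return {s + t for s, t in product(L, M)}


def kleene(L, max_len):
    """Strings of L* no longer than max_len (L* itself is usually infinite)."""
    result = {""}                          # zero occurrences: the empty string
    frontier = {""}
    while frontier:
        # Append one more string of L to each newly found string.
        frontier = {s for s in concat(frontier, L)
                    if len(s) <= max_len and s not in result}
        result |= frontier
    return result


print(sorted(union({"a", "b"}, {"0", "1"})))
print(sorted(concat({"a", "b"}, {"0", "1"})))
print(sorted(kleene({"a"}, 3)))
```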

Notations
If r and s are regular expressions denoting the languages L(r) and L(s), then
Union: (r)|(s) is a regular expression denoting \( L(r) \cup L(s) \)
Concatenation: (r)(s) is a regular expression denoting \( L(r)L(s) \)
Kleene closure: (r)* is a regular expression denoting \( (L(r))^{*} \)
(r) is a regular expression denoting \( L(r) \)

Precedence and Associativity
*, concatenation (.), and | (pipe sign) are left associative
* has the highest precedence
Concatenation (.) has the second highest precedence.
| (pipe sign) has the lowest precedence of all.

Representing valid tokens of a language in regular expression
If x is a regular expression, then:
x* means zero or more occurrences of x.
i.e., it can generate { ε, x, xx, xxx, xxxx, … }
x+ means one or more occurrences of x.
i.e., it can generate { x, xx, xxx, xxxx … } or x.x*
x? means at most one occurrence of x
i.e., it can generate either {x} or {ε}.
[a-z] is all lower-case alphabets of English language.
[A-Z] is all upper-case alphabets of English language.
[0-9] is all decimal digits used in mathematics.

Representing occurrence of symbols using regular expressions
letter = [a-z] or [A-Z]
digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 or [0-9]
sign = [ + | - ]

Representing language tokens using regular expressions
Decimal = (sign)?(digit)+
Identifier = (letter)(letter | digit)*
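
The two token definitions above translate directly into Python regular expressions. This sketch assumes ASCII letters and digits; the sign class is written [+-] here, because inside a character class a literal | would also be matched.

```python
import re

# The definitions above, written in Python's regex syntax.
LETTER = r"[a-zA-Z]"
DIGIT = r"[0-9]"
SIGN = r"[+-]"
DECIMAL = re.compile(f"{SIGN}?{DIGIT}+")                    # (sign)?(digit)+
IDENTIFIER = re.compile(f"{LETTER}({LETTER}|{DIGIT})*")     # (letter)(letter|digit)*


def is_decimal(text):
    return DECIMAL.fullmatch(text) is not None


def is_identifier(text):
    return IDENTIFIER.fullmatch(text) is not None


for word in ["-42", "+7", "100", "4a", "value", "x1", "1x"]:
    print(word, is_decimal(word), is_identifier(word))
```

fullmatch is used so that the whole lexeme, not just a prefix of it, must fit the pattern.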
The only problem left with the lexical analyzer is how to verify the validity of a regular expression used in specifying the patterns of keywords of a language. A well-accepted solution is to use finite automata for verification.

Finite Automata
A finite automaton is a state machine that takes a string of symbols as input and changes its state accordingly. Finite automata are recognizers for regular expressions. When a regular expression string is fed into a finite automaton, it changes its state for each literal. If the input string is successfully processed and the automaton reaches its final state, the string is accepted, i.e., the string just fed was a valid token of the language in hand.
The mathematical model of finite automata consists of:
Finite set of states (Q)
Finite set of input symbols (Σ)
One start state (q0)
Set of final states (qf)
Transition function (δ)
The transition function (δ) maps a state from the finite set of states (Q) and a symbol from the finite set of input symbols (Σ) to a state: \( \delta : Q \times \Sigma \to Q \)

Finite Automata Construction
Let L(r) be a regular language recognized by some finite automaton (FA).
States: States of the FA are represented by circles. State names are written inside circles.
Start state: The state from where the automaton starts is known as the start state. The start state has an arrow pointed towards it.
Intermediate states: All intermediate states have at least two arrows; one pointing to and another pointing out from them.
Final state: If the input string is successfully parsed, the automaton is expected to be in this state. A final state is represented by double circles. It may have any number of arrows pointing to it and any number of arrows pointing out from it.
Transition: The transition from one state to another happens when a desired symbol in the input is found. Upon transition, the automaton can either move to the next state or stay in the same state. Movement from one state to another is shown as a directed arrow, where the arrow points to the destination state. If the automaton stays in the same state, an arrow pointing from the state to itself is drawn.
Example: We assume the FA accepts any binary value ending in the digit 1. FA = {Q(q0, qf), Σ(0,1), q0, qf, δ}
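
A finite automaton can be simulated with a transition table. The sketch below assumes the usual two-state machine for this example (on input 1 go to qf, on input 0 go back to q0); this transition table is an assumption, since the original diagram is not reproduced here.

```python
# Assumed transition function δ for the two-state FA above:
# reading 1 always leads to qf, reading 0 always leads back to q0.
DELTA = {
    ("q0", "0"): "q0",
    ("q0", "1"): "qf",
    ("qf", "0"): "q0",
    ("qf", "1"): "qf",
}
START, FINAL = "q0", {"qf"}


def accepts(string):
    state = START
    for symbol in string:
        if (state, symbol) not in DELTA:
            return False                   # symbol not in the alphabet Σ = {0, 1}
        state = DELTA[(state, symbol)]
    return state in FINAL                  # accept only if we end in a final state


for s in ["101", "110", "001", "1", ""]:
    print(repr(s), accepts(s))
```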

Longest Match Rule
When the lexical analyzer reads the source code, it scans the code letter by letter; and when it encounters a whitespace, operator symbol, or special symbol, it decides that a word is completed.
For example:
int intvalue;

While scanning both lexemes till 'int', the lexical analyzer cannot determine whether it is the keyword int or the initials of the identifier intvalue.
The Longest Match Rule states that the lexeme scanned should be determined based on the longest match among all the tokens available.
The lexical analyzer also follows rule priority, where a reserved word, e.g., a keyword, of a language is given priority over user input. That is, if the lexical analyzer finds a lexeme that matches an existing reserved word, it treats the lexeme as that reserved word rather than as an identifier; using a reserved word as an identifier is therefore an error.
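
The longest match rule and rule priority can be sketched in a few lines: keep extending the lexeme while it can still grow, then check the finished lexeme against the reserved words. The keyword set and operator list below are hypothetical examples; the operators are tried longest first so that <<= wins over << and <.

```python
KEYWORDS = {"int", "float"}                       # hypothetical reserved words
OPERATORS = ["<<=", "<<", "<=", "<", "=", ";"]    # longest first


def scan(source):
    tokens, i = [], 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha() or ch == "_":
            j = i
            # Longest match: keep consuming while the lexeme can still grow.
            while j < len(source) and (source[j].isalnum() or source[j] == "_"):
                j += 1
            lexeme = source[i:j]
            # Rule priority: a lexeme equal to a reserved word is a keyword.
            kind = "keyword" if lexeme in KEYWORDS else "identifier"
            tokens.append((kind, lexeme))
            i = j
        else:
            op = next((o for o in OPERATORS if source.startswith(o, i)), None)
            if op is None:
                raise SyntaxError(f"unexpected character {ch!r}")
            tokens.append(("symbol", op))
            i += len(op)
    return tokens


print(scan("int intvalue;"))
print(scan("x <<= y"))
```

Because the scanner only decides after the whole word intvalue has been read, it never mistakes its first three letters for the keyword int.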

COMPILER DESIGN - SYNTAX ANALYSIS
Syntax analysis or parsing is the second phase of a compiler. In this chapter, we shall learn the basic concepts used in the construction of a parser.
We have seen that a lexical analyzer can identify tokens with the help of regular expressions and pattern rules. But a lexical analyzer cannot check the syntax of a given sentence due to the limitations of regular expressions. Regular expressions cannot check balancing tokens, such as parentheses. Therefore, this phase uses context-free grammar (CFG), which is recognized by push-down automata.
CFG, on the other hand, is a superset of Regular Grammar, as depicted below:

It implies that every Regular Grammar is also context-free, but there exist some problems which are beyond the scope of Regular Grammar. CFG is a helpful tool in describing the syntax of programming languages.

Context-Free Grammar
In this section, we will first see the definition of context-free grammar and introduce terminologies used in parsing technology.
A context-free grammar has four components:
A set of non-terminals (V). Non-terminals are syntactic variables that denote sets of strings. The non-terminals define sets of strings that help define the language generated by the grammar.
A set of tokens, known as terminal symbols (Σ). Terminals are the basic symbols from which strings are formed.
A set of productions (P). The productions of a grammar specify the manner in which the terminals and non-terminals can be combined to form strings. Each production consists of a non-terminal called the left side of the production, an arrow, and a sequence of tokens and/or non-terminals, called the right side of the production.
One of the non-terminals is designated as the start symbol (S), from where the production begins.
The strings are derived from the start symbol by repeatedly replacing a non-terminal (initially the start symbol) by the right side of a production for that non-terminal.

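Example
We take the problem of palindrome language, which cannot be described by means of Regular Expression. That is, \( L = \{\, w \mid w = w^{R} \,\} \), where \( w^{R} \) is the reverse of w, is not a regular language. But it can be described by means of CFG, as illustrated below:
G = ( V, Σ, P, S )

Where:
V = { Q, Z, N }
Σ = { 0, 1 }
P = { Q → Z | Q → N | Q → 0 | Q → 1 | Q → ε | Z → 0Q0 | N → 1Q1 }
S = { Q }

This grammar describes palindrome language, such as: 1001, 11100111, 00100, 1010101, 11111, etc.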
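
Because each recursive production wraps Q in matching outer symbols, the grammar can be checked with a recognizer that mirrors the productions one-to-one. This is a minimal sketch for the grammar G above; the function name derives_from_Q is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def derives_from_Q(s):
    """True if the grammar G can derive the binary string s from Q."""
    if s in ("", "0", "1"):                  # Q → ε | Q → 0 | Q → 1
        return True
    if len(s) >= 2 and s[0] == s[-1] == "0":
        return derives_from_Q(s[1:-1])       # Q → Z, Z → 0Q0
    if len(s) >= 2 and s[0] == s[-1] == "1":
        return derives_from_Q(s[1:-1])       # Q → N, N → 1Q1
    return False


for w in ["1001", "11100111", "00100", "1010101", "11111", "10", "0110"]:
    print(w, derives_from_Q(w), w == w[::-1])
```

The second and third columns agree for every input, which is what "the grammar describes the palindrome language" means.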

Syntax Analyzers
A syntax analyzer or parser takes the input from a lexical analyzer in the form of token streams. The parser analyzes the source code (token stream) against the production rules to detect any errors in the code. The output of this phase is a parse tree.

This way, the parser accomplishes two tasks, i.e., parsing the code, looking for errors and generating a parse tree as the output of the phase.
Parsers are expected to parse the whole code even if some errors exist in the program. Parsers use error-recovering strategies, which we will learn later in this chapter.

Derivation

A derivation is basically a sequence of production rules, in order to get the input string. During parsing, we take two decisions for some sentential form of input:
Deciding the non-terminal which is to be replaced.
Deciding the production rule by which the non-terminal will be replaced.
To decide which non-terminal to replace with a production rule, we have two options.

Left-most Derivation
If the sentential form of an input is scanned and replaced from left to right, i.e., the left-most non-terminal is always replaced first, it is called left-most derivation. The sentential form derived by the left-most derivation is called the left-sentential form.

Right-most Derivation
If we scan and replace the input with production rules from right to left, i.e., the right-most non-terminal is always replaced first, it is known as right-most derivation. The sentential form derived from the right-most derivation is called the right-sentential form.
Example
Production rules:
E → E + E
E → E * E
E → id

Input string: id + id * id
The left-most derivation is:
E → E * E
E → E + E * E
E → id + E * E
E → id + id * E
E → id + id * id

Notice that the left-most non-terminal is always processed first.
The right-most derivation is:
E → E + E
E → E + E * E
E → E + E * id
E → E + id * id
E → id + id * id
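
The difference between the two derivations is only which non-terminal gets replaced at each step. The sketch below replays a given sequence of right-hand sides and records each sentential form; the derive helper is hypothetical and assumes E is the only non-terminal.

```python
NONTERMINALS = {"E"}


def derive(rhs_sequence, leftmost=True):
    """Starting from E, apply each right-hand side in turn to the left-most
    (or right-most) non-terminal and return every sentential form."""
    form = ["E"]
    forms = []
    for rhs in rhs_sequence:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + rhs.split() + form[i + 1:]
        forms.append(" ".join(form))
    return forms


left = derive(["E * E", "E + E", "id", "id", "id"], leftmost=True)
right = derive(["E + E", "E * E", "id", "id", "id"], leftmost=False)
print("\n".join(left))
print("\n".join(right))
```

The printed forms match the two derivations listed above, line for line.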

Parse Tree
A parse tree is a graphical depiction of a derivation. It is convenient to see how strings are derived from the start symbol. The start symbol of the derivation becomes the root of the parse tree. Let us see this by an example from the last topic.
We take the left-most derivation of a + b * c (that is, id + id * id).
The left-most derivation is:
E → E * E
E → E + E * E
E → id + E * E
E → id + id * E
E → id + id * id

Step 1:

E → E * E

Step 2:

E → E + E * E

Step 3:

E → id + E * E

Step 4:

E → id + id * E

Step 5:

E → id + id * id

In a parse tree:
All leaf nodes are terminals.
All interior nodes are non-terminals.
In-order traversal (reading the leaves from left to right) gives the original input string.
A parse tree depicts associativity and precedence of operators. The deepest sub-tree is traversed first, therefore the operator in that sub-tree gets precedence over the operator in the parent nodes.

Types of Parsing
Syntax analyzers follow production rules defined by means of context-free grammar. The way the production rules are implemented (derivation) divides parsing into two types: top-down parsing and bottom-up parsing.

Top-down Parsing
When the parser starts constructing the parse tree from the start symbol and then tries to transform the start symbol to the input, it is called top-down parsing.
Recursive descent parsing: It is a common form of top-down parsing. It is called recursive as it uses recursive procedures to process the input. A general recursive descent parser may need backtracking.
Backtracking: It means that if one derivation of a production fails, the syntax analyzer restarts the process using different rules of the same production. This technique may process the input string more than once to determine the right production.
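
Backtracking can be illustrated with a small recursive descent parser that tries each alternative in order and falls back to the next one when the rest of the input does not match. The grammar S → c A d, A → a b | a is an assumed textbook-style example, not taken from the text above.

```python
# Hypothetical grammar, chosen so that the first alternative for A can fail:
#   S → c A d
#   A → a b | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}


def parse(symbol, tokens, pos):
    """Try to match `symbol` at tokens[pos:]; yield every possible end position."""
    if symbol not in GRAMMAR:                 # terminal: match one token
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for alternative in GRAMMAR[symbol]:       # try each production in order
        yield from parse_sequence(alternative, tokens, pos)


def parse_sequence(symbols, tokens, pos):
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], tokens, pos):
        # If the rest fails from `mid`, the loop moves on: this is backtracking.
        yield from parse_sequence(symbols[1:], tokens, mid)


def accepts(text):
    tokens = list(text)
    return any(end == len(tokens) for end in parse("S", tokens, 0))


print(accepts("cad"), accepts("cabd"), accepts("cd"))
```

On "cad", the parser first tries A → a b, fails on d, then backs up and succeeds with A → a. Note that a left-recursive production would make this parser recurse forever, which is the problem discussed under Left Recursion.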

Bottom-up Parsing
As the name suggests, bottom-up parsing starts with the input symbols and tries to construct the parse tree up to the start symbol.
Example:
Input string: a + b * c
Production rules:
S → E
E → E + T
E → E * T
E → T
T → id

Let us start bottom-up parsing
a + b * c

Read the input and check if any production matches with the input:
a + b * c
T + b * c
E + b * c
E + T * c
E * c
E * T
E
S
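
The reductions in this trace can be reproduced with a small shift-reduce loop. This sketch hard-codes when to reduce for the five productions above instead of using an LR parsing table, so treat it as an illustration of the trace rather than a general parser; any token other than + and * is assumed to be an id.

```python
OPERATORS = {"+", "*"}


def reduce_once(stack):
    """Apply one reduction to the top of the stack; return True if one applied."""
    top = stack[-1]
    if top not in OPERATORS and top not in {"E", "T", "S"}:
        stack[-1] = "T"                       # T → id (any identifier token)
        return True
    if len(stack) >= 3 and stack[-3] == "E" and stack[-2] in OPERATORS and top == "T":
        stack[-3:] = ["E"]                    # E → E + T  or  E → E * T
        return True
    if top == "T" and (len(stack) == 1 or stack[-2] not in OPERATORS):
        stack[-1] = "E"                       # E → T
        return True
    return False


def shift_reduce(tokens):
    stack, trace = [], []
    for i, token in enumerate(tokens):
        stack.append(token)                   # shift
        while reduce_once(stack):             # reduce as long as possible
            trace.append(" ".join(stack + tokens[i + 1:]))
    if stack == ["E"]:
        stack[0] = "S"                        # S → E, once all input is consumed
        trace.append("S")
    return trace


print("\n".join(shift_reduce("a + b * c".split())))
```

Each printed line is the stack followed by the unread input, which gives exactly the sentential forms of the trace above.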

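Ambiguity
A grammar G is said to be ambiguous if it has more than one parse tree (left or right derivation) for at least one string.
Example
E → E + E
E → E - E
E → id

For the string id + id - id, the above grammar generates two parse trees: one for (id + id) - id and one for id + (id - id).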

A language for which every grammar is ambiguous is said to be inherently ambiguous. Ambiguity in grammar is not good for compiler construction. No general method can detect and remove ambiguity automatically, but it can be removed by either re-writing the whole grammar without ambiguity, or by setting and following associativity and precedence constraints.

Associativity
If an operand has operators on both sides, the side on which the operator takes this operand is decided by the associativity of those operators. If the operation is left-associative, then the operand will be taken by the left operator; if the operation is right-associative, the right operator will take the operand.
Example
Operations such as Addition, Multiplication, Subtraction, and Division are left associative. If the expression contains:
id op id op id

it will be evaluated as:
(id op id) op id

For example, (id + id) + id
Operations like Exponentiation are right associative, i.e., the order of evaluation in the same expression will be:
id op (id op id)

For example, id ^ (id ^ id)

Precedence

If two different operators share a common operand, the precedence of operators decides which will take the operand. That is, 2 + 3 * 4 can have two different parse trees, one corresponding to (2 + 3) * 4 and another corresponding to 2 + (3 * 4). By setting precedence among operators, this problem can be easily removed. As in the previous example, mathematically * (multiplication) has precedence over + (addition), so the expression 2 + 3 * 4 will always be interpreted as:
2 + (3 * 4)

These methods decrease the chances of ambiguity in a language or its grammar.
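
Precedence and associativity are commonly implemented in hand-written parsers with precedence climbing. The following sketch uses an assumed operator table, in which ^ is right-associative and binds tightest and / is integer division; it evaluates integer expressions and shows both rules at work.

```python
import operator
import re

# Assumed operator table: a higher number binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOC = {"^"}
APPLY = {"+": operator.add, "-": operator.sub, "*": operator.mul,
         "/": operator.floordiv, "^": operator.pow}


def evaluate(text):
    tokens = re.findall(r"\d+|[-+*/^()]", text)
    pos = 0

    def primary():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = expression(0)
            pos += 1                          # skip the closing ")"
            return value
        return int(token)

    def expression(min_prec):
        nonlocal pos
        left = primary()
        while (pos < len(tokens) and tokens[pos] in PRECEDENCE
               and PRECEDENCE[tokens[pos]] >= min_prec):
            op = tokens[pos]
            pos += 1
            # A left-associative operator lets only tighter operators take its
            # right operand; a right-associative one also lets itself.
            next_min = PRECEDENCE[op] + (0 if op in RIGHT_ASSOC else 1)
            left = APPLY[op](left, expression(next_min))
        return left

    return expression(0)


print(evaluate("2 + 3 * 4"))     # precedence: 2 + (3 * 4)
print(evaluate("10 - 4 - 3"))    # left-associative: (10 - 4) - 3
print(evaluate("2 ^ 3 ^ 2"))     # right-associative: 2 ^ (3 ^ 2)
```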

Left Recursion
A grammar becomes left-recursive if it has any non-terminal 'A' whose derivation contains 'A' itself as the left-most symbol. Left-recursive grammar is considered to be a problematic situation for top-down parsers. Top-down parsers start parsing from the start symbol, which in itself is a non-terminal. So, when the parser encounters the same non-terminal in its derivation, it becomes hard for it to judge when to stop parsing the left non-terminal, and it goes into an infinite loop.
Example:
(1) A => Aα | β
(2) S => Aα | β
    A => Sd

(1) is an example of immediate left recursion, where A is any non-terminal symbol and α and β represent strings of grammar symbols (β not starting with A).
(2) is an example of indirect left recursion.

A top-down parser will first parse A, which in turn will yield a string consisting of A itself, and the parser may go into a loop forever.

Removal of Left Recursion

One way to remove left recursion is to use the following technique:
The production
A => Aα | β

is converted into the following productions
A => βA'
A' => αA' | ε

This does not impact the strings derived from the grammar, but it removes immediate left recursion.
The second method is to use the following algorithm, which should eliminate all direct and indirect left recursions.
Algorithm
START
Arrange non-terminals in some order like A1, A2, A3, …, An
for each i from 1 to n
{
    for each j from 1 to i-1
    {
        replace each production of form Ai => Ajγ
        with Ai => δ1γ | δ2γ | δ3γ | … | δnγ
        where Aj => δ1 | δ2 | … | δn are current Aj productions
    }
    eliminate immediate left recursion among the Ai productions
}
END

Example
The production set
S => Aα | β
A => Sd

after applying the above algorithm, should become
S => Aα | β
A => Aαd | βd

and then, remove immediate left recursion using the first technique.
A => βdA'
A' => αdA' | ε

Now none of the productions has either direct or indirect left recursion.
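
The two techniques above can be sketched in Python, with each alternative represented as a list of symbols and [] standing for ε. The function names are hypothetical, new non-terminals are named with a prime (A') as in the text, and, like the algorithm itself, the sketch assumes the input grammar has no cycles or ε-productions.

```python
def eliminate_immediate(head, productions):
    """A → Aα1 | … | Aαm | β1 | … | βn  becomes
       A → β1A' | … | βnA'  and  A' → α1A' | … | αmA' | ε."""
    recursive = [alt[1:] for alt in productions if alt and alt[0] == head]
    others = [alt for alt in productions if not alt or alt[0] != head]
    if not recursive:
        return {head: productions}
    new = head + "'"
    return {
        head: [beta + [new] for beta in others],
        new: [alpha + [new] for alpha in recursive] + [[]],
    }


def eliminate_left_recursion(grammar, order):
    grammar = {nt: [list(alt) for alt in alts] for nt, alts in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for alt in grammar[ai]:
                if alt and alt[0] == aj:          # Ai → Aj γ: substitute Aj
                    expanded += [delta + alt[1:] for delta in grammar[aj]]
                else:
                    expanded.append(alt)
            grammar[ai] = expanded
        grammar.update(eliminate_immediate(ai, grammar[ai]))
    return grammar


# The example above; "α" and "β" stand for arbitrary strings, treated as terminals.
g = {"S": [["A", "α"], ["β"]], "A": [["S", "d"]]}
result = eliminate_left_recursion(g, ["S", "A"])
for head, alts in result.items():
    print(head, "=>", " | ".join("".join(a) or "ε" for a in alts))
```

Running it reproduces the result of the worked example: S => Aα | β, A => βdA', A' => αdA' | ε.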

Left Factoring
If more than one grammar production rule has a common prefix string, then the top-down parser cannot make a choice as to which of the productions it should take to parse the string in hand.
Example
If a top-down parser encounters a production like
A => αβ | αγ | …

Then it cannot determine which production to follow to parse the string, as both productions start from the same terminal (or non-terminal). To remove this confusion, we use a technique called left factoring.
Left factoring transforms the grammar to make it useful for top-down parsers. In this technique, we make one production for each common prefix, and the rest of the derivation is added by new productions.
Example
The above productions can be written as
A => αA'
A' => β | γ | …

Now the parser has only one production per prefix, which makes it easier to take decisions.
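
One round of left factoring can be sketched by grouping the alternatives by their first symbol and factoring out each group's longest common prefix. This is an illustrative sketch with hypothetical names; a complete implementation would repeat the step until no two alternatives of any non-terminal share a prefix.

```python
def common_prefix(alternatives):
    """Longest list of symbols shared at the start of every alternative."""
    prefix = []
    for symbols in zip(*alternatives):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(head, alternatives):
    """One round of left factoring; [] stands for ε."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    result, counter = {head: []}, 0
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[head].extend(group)                # nothing to factor
            continue
        prefix = common_prefix(group)
        counter += 1
        new = head + "'" * counter                    # A', A'', ...
        result[head].append(prefix + [new])
        result[new] = [alt[len(prefix):] for alt in group]
    return result


factored = left_factor("A", [["α", "β"], ["α", "γ"]])
for nt, alts in factored.items():
    print(nt, "=>", " | ".join("".join(a) or "ε" for a in alts))
```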

First and Follow Sets
An important part of parser table construction is to create first and follow sets. These sets can provide the actual position of any terminal in the derivation. This is done to create the parsing table, where each entry T[A, t] = α records the production rule with which the non-terminal A is replaced when the next input terminal is t.

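First Set
This set is created to know what terminal symbol is derived in the first position by a non-terminal. For example,
α → t β

That is, α derives t (terminal) in the very first position. So, t ∈ FIRST(α).

Algorithm for calculating First set
Look at the definition of the FIRST(α) set:
if α is a terminal, then FIRST(α) = { α }.
if α is a non-terminal and α → ε is a production, then ε ∈ FIRST(α).
if α is a non-terminal and α → γ1 γ2 γ3 … γn is a production, and t is in FIRST(γi) while ε is in each of FIRST(γ1), …, FIRST(γi-1), then t is in FIRST(α). If ε is in every FIRST(γi), then ε is in FIRST(α).
First set can be seen as: \( \mathrm{FIRST}(\alpha) = \{\, t \mid \alpha \Rightarrow^{*} t\beta \,\} \cup \{\, \varepsilon \mid \alpha \Rightarrow^{*} \varepsilon \,\} \)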

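Follow Set
Likewise, we calculate what terminal symbol immediately follows a non-terminal in production rules. We do not consider what the non-terminal can generate; instead, we see what would be the next terminal symbol that follows the productions of a non-terminal.

Algorithm for calculating Follow set:
if α is the start symbol, then the end-of-input marker $ is in FOLLOW(α).
if α is a non-terminal and has a production α → AB, then everything in FIRST(B) except ε is in FOLLOW(A).
if α is a non-terminal and has a production α → AB, where B ⇒* ε (or a production α → A), then everything in FOLLOW(α) is in FOLLOW(A).
Follow set can be seen as: \( \mathrm{FOLLOW}(\alpha) = \{\, t \mid S \Rightarrow^{*} \beta\,\alpha\,t\,\gamma \,\} \)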
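
The FIRST and FOLLOW rules above are usually computed by applying them repeatedly until no set changes. This sketch applies them to the classic expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id, which is an assumed example rather than one from the text.

```python
EPS = "ε"
# Assumed example grammar; keys are non-terminals, [] on a right side means ε.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"


def first_of_sequence(symbols, first):
    """FIRST of a string of grammar symbols, using the FIRST sets found so far."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}   # terminal: itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}        # every symbol can vanish (or symbols is empty)


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:               # repeat until no set grows (a fixed point)
        changed = False
        for head, alts in GRAMMAR.items():
            for alt in alts:
                new = first_of_sequence(alt, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")       # end-of-input marker
    changed = True
    while changed:
        changed = False
        for head, alts in GRAMMAR.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of_sequence(alt[i + 1:], first)
                    new = rest - {EPS}           # FIRST of what comes after sym
                    if EPS in rest:              # rest can vanish: add FOLLOW(head)
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```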

Error-recovery Strategies
A parser should be able to detect and report any error in the program. It is expected that when an error is encountered, the parser should be able to handle it and carry on parsing the rest of the input. Mostly it is expected from the parser to check for errors, but errors may be encountered at various stages of the compilation process. A program may have the following kinds of errors at various stages:
Lexical: name of some identifier typed incorrectly
Syntactical: missing semicolon or unbalanced parenthesis
Semantical: incompatible value assignment
Logical: code not reachable, infinite loop
There are four common error-recovery strategies that can be implemented in the parser to deal with errors in the code.

Panic mode
When a parser encounters an error anywhere in the statement, it ignores the rest of the statement by not processing the input from the erroneous input up to a delimiter, such as a semicolon. This is the easiest way of error recovery, and it also prevents the parser from developing infinite loops.
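
A minimal sketch of panic-mode recovery, assuming a toy language in which every statement has the shape identifier = number ;. When a statement does not match, the hypothetical parse_statements function reports it, discards tokens up to and including the next semicolon, and carries on.

```python
def parse_statements(tokens):
    """Parse statements of the assumed form  identifier = number ;
    reporting errors and recovering in panic mode."""
    errors, parsed, i = [], [], 0
    while i < len(tokens):
        stmt = tokens[i:i + 4]
        if (len(stmt) == 4 and stmt[0].isidentifier() and stmt[1] == "="
                and stmt[2].isdigit() and stmt[3] == ";"):
            parsed.append(f"{stmt[0]} = {stmt[2]}")
            i += 4
            continue
        errors.append(f"syntax error near token {i}: {tokens[i]!r}")
        # Panic mode: skip input up to the delimiter ';', then skip it too.
        while i < len(tokens) and tokens[i] != ";":
            i += 1
        i += 1
    return parsed, errors


tokens = "a = 1 ; b = = 2 ; c = 3 ;".split()
parsed, errors = parse_statements(tokens)
print(parsed)
print(errors)
```

The malformed second statement is reported once and skipped, and the third statement is still parsed.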

Statement mode
When a parser encounters an error, it tries to take corrective measures so that the rest of the statement's input allows the parser to parse ahead. For example, inserting a missing semicolon, replacing a comma with a semicolon, etc. Parser designers have to be careful here because one wrong correction may lead to an infinite loop.

Error productions
Some common errors that may occur in the code are known to the compiler designers. In addition, the designers can create an augmented grammar to be used, with productions that generate erroneous constructs when these errors are encountered.

Global correction
The parser considers the program in hand as a whole, tries to figure out what the program is intended to do, and tries to find the closest match for it that is error-free. When an erroneous input (statement) X is fed, it creates a parse tree for some closest error-free statement Y. This may allow the parser to make minimal changes in the source code, but due to the complexity (time and space) of this strategy, it is rarely used in practice.

Abstract Syntax Trees
Parse tree representations are not easy for the compiler to process, as they contain more details than actually needed. Take the following parse tree as an example:

If we look closely, we find that most of the leaf nodes are single children of their parent nodes. This information can be eliminated before feeding the tree to the next phase. By hiding this extra information, we can obtain a tree as shown below:

Abstract tree can be represented as:

ASTs are important data structures in a compiler with the least unnecessary information. ASTs are more compact than a parse tree and can be easily used by a compiler.
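
Collapsing single-child chains can be sketched directly. The parse tree below for id + id * id assumes the grammar E → E + T | T, T → T * F | F, F → id, and the hypothetical to_ast function keeps only operators and operands.

```python
# A parse-tree node is (label, children); a leaf (token) has no children.
def leaf(token):
    return (token, [])


# Assumed parse tree for  id + id * id  under E → E + T | T, T → T * F | F, F → id.
parse_tree = ("E", [
    ("E", [("T", [("F", [leaf("id")])])]),
    leaf("+"),
    ("T", [
        ("T", [("F", [leaf("id")])]),
        leaf("*"),
        ("F", [leaf("id")]),
    ]),
])


def to_ast(node):
    label, children = node
    if not children:
        return label                       # a token stays as-is
    if len(children) == 1:
        return to_ast(children[0])         # drop single-child chains like E → T → F
    left, op, right = children             # binary rule: operand op operand
    return (op[0], to_ast(left), to_ast(right))


print(to_ast(parse_tree))
```

The eleven interior nodes of the parse tree shrink to two operator nodes, with * nested under + as precedence requires.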

Limitations of Syntax Analyzers
Syntax analyzers receive their inputs, in the form of tokens, from lexical analyzers. Lexical analyzers are responsible for the validity of the tokens they supply to the syntax analyzer. Syntax analyzers have the following drawbacks:
it cannot determine if a token is valid,
it cannot determine if a token is declared before it is being used,
it cannot determine if a token is initialized before it is being used,
it cannot determine if an operation performed on a token type is valid or not.
These tasks are accomplished by the semantic analyzer, which we shall study in Semantic Analysis.

COMPILER DESIGN - SEMANTIC ANALYSIS
We have learnt how a parser constructs parse trees in the syntax analysis phase. The plain parse tree constructed in that phase is generally of no use for a compiler, as it does not carry any information about how to evaluate the tree. The productions of context-free grammar, which make up the rules of the language, do not specify how to interpret them.
For example
E → E + T

The above CFG production has no semantic rule associated with it, and it cannot help in making any sense of the production.

Semantics
Semantics of a language provide meaning to its constructs, like tokens and syntax structure. Semantics help interpret symbols, their types, and their relations with each other. Semantic analysis judges whether the syntax structure constructed in the source program derives any meaning or not.
CFG + semantic rules = Syntax Directed Definitions

For example:
int a = "value";

should not issue an error in the lexical and syntax analysis phases, as it is lexically and structurally correct, but it should generate a semantic error as the type of the assignment differs. These rules are set by the grammar of the language and evaluated in semantic analysis. The following tasks should be performed in semantic analysis:
Scope resolution
Type checking
Array-bound checking

Semantic Errors
We have mentioned some of the semantic errors that the semantic analyzer is expected to recognize:
Type mismatch
Undeclared variable
Reserved identifier misuse.
Multiple declaration of variable in a scope.
Accessing an out of scope variable.
Actual and formal parameter mismatch.
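
Three of the checks listed above (type mismatch, undeclared variable, multiple declaration) can be sketched over a pre-parsed list of statements. The statement format and the literal_type helper are assumptions for illustration, and the sketch tracks a single scope only.

```python
# Each statement is a tuple in an assumed, already-parsed form:
#   ("declare", type, name, value)   e.g.  int a = "value";
#   ("use", name)                    e.g.  a reference to a variable
def literal_type(value):
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    return "string"


def check(statements):
    declared, errors = {}, []            # declared acts as a one-scope symbol table
    for stmt in statements:
        if stmt[0] == "declare":
            _, type_name, name, value = stmt
            if name in declared:
                errors.append(f"multiple declaration of '{name}'")
            declared[name] = type_name
            if literal_type(value) != type_name:
                errors.append(f"type mismatch: cannot assign "
                              f"{literal_type(value)} to {type_name} '{name}'")
        elif stmt[0] == "use" and stmt[1] not in declared:
            errors.append(f"undeclared variable '{stmt[1]}'")
    return errors


program = [
    ("declare", "int", "a", "value"),    # int a = "value";
    ("declare", "int", "b", 5),
    ("declare", "int", "b", 6),
    ("use", "c"),
]
for message in check(program):
    print(message)
```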

Attribute Grammar

Attribute grammar is a special form of context-free grammar where some additional information (attributes) is appended to one or more of its non-terminals in order to provide context-sensitive information. Each attribute has a well-defined domain of values, such as integer, float, character, string, and expressions.
Attribute grammar is a medium to provide semantics to the context-free grammar, and it can help specify the syntax and semantics of a programming language. Attribute grammar (when viewed as a parse tree) can pass values or information among the nodes of a tree.
Example:
E → E + T { E.value = E.value + T.value }

The right part of the CFG contains the semantic rules that specify how the grammar should be interpreted. Here, the values of non-terminals E and T are added together and the result is copied to the non-terminal E.
Semantic attributes may be assigned values from their domain at the time of parsing and evaluated at the time of assignment or conditions. Based on the way the attributes get their values, they can be broadly divided into two categories: synthesized attributes and inherited attributes.

Synthesized attributes
These attributes get values from the attribute values of their child nodes. To illustrate, assume the
following production:
S → ABC

If S is taking values from its child nodes (A, B, C), then it is said to be a synthesized attribute, as the values of
A, B and C are synthesized to S.
As in our previous example (E → E + T), the parent node E gets its value from its child nodes. Synthesized
attributes never take values from their parent nodes or any sibling nodes.
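As a rough sketch of how synthesized attributes flow, the C code below evaluates E.value for the rule E → E + T by a post-order walk: children first, then the parent. It assumes a simplified grammar in which T is just a number. The node layout and the names `leaf`, `plus` and `eval` are hypothetical, chosen only for illustration.

```c
#include <stdlib.h>

/* Hypothetical parse-tree node for E -> E + T | T, T -> num.
   'val' holds the synthesized attribute (E.value or T.value). */
typedef struct Node {
    char kind;                /* '+' for E -> E + T, 'n' for T -> num */
    int val;                  /* synthesized attribute */
    struct Node *left, *right;
} Node;

/* Memory is not freed and allocation failure is not checked in this sketch. */
static Node *leaf(int n) {
    Node *p = calloc(1, sizeof *p);
    p->kind = 'n';
    p->val = n;               /* T.value = num.lexval */
    return p;
}

static Node *plus(Node *e, Node *t) {
    Node *p = calloc(1, sizeof *p);
    p->kind = '+';
    p->left = e;              /* E1 */
    p->right = t;             /* T  */
    return p;
}

/* Post-order walk: the children are evaluated first, then the parent's
   attribute is computed from them, so information only flows upward. */
static int eval(Node *p) {
    if (p->kind == '+')
        p->val = eval(p->left) + eval(p->right);  /* E.value = E1.value + T.value */
    return p->val;
}
```

For the tree of `1 + 2 + 3`, built as `plus(plus(leaf(1), leaf(2)), leaf(3))`, `eval` stores 3 in the inner E node and returns 6 at the root.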

Inherited attributes
In contrast to synthesized attributes, inherited attributes can take values from parent and/or siblings. As in
the following production,
S → ABC

A can get values from S, B and C. B can take values from S, A, and C. Likewise, C can take values from S, A,
and B.
Expansion: When a non-terminal is expanded to terminals as per a grammatical rule


Reduction: When a terminal is reduced to its corresponding non-terminal according to grammar rules.
Syntax trees are parsed top-down and left to right. Whenever reduction occurs, we apply its corresponding
semantic rules (actions).
Semantic analysis uses Syntax Directed Translations to perform the above tasks.
Semantic analyzer receives AST (Abstract Syntax Tree) from its previous stage (syntax analysis).
Semantic analyzer attaches attribute information to the AST, which is then called an Attributed AST.
Attributes are two-tuple values: <attribute name, attribute value>
For example:
int value = 5;
<type, "integer">
<present value, "5">

For every production, we attach a semantic rule.

S-attributed SDT
If an SDT uses only synthesized attributes, it is called an S-attributed SDT. These attributes are evaluated
using S-attributed SDTs that have their semantic actions written after the production (right hand side).


As depicted above, attributes in S-attributed SDTs are evaluated in bottom-up parsing, as the values of the
parent nodes depend upon the values of the child nodes.

L-attributed SDT
This form of SDT uses both synthesized and inherited attributes, with the restriction that values are not taken from
right siblings.
In L-attributed SDTs, a non-terminal can get values from its parent, child, and left sibling nodes. As in the
following production
S → ABC

S can take values from A, B, and C (synthesized). A can take values from S only. B can take values from S and A.
C can get values from S, A, and B. No non-terminal can get values from the sibling to its right.
Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right manner.

We may conclude that if a definition is S-attributed, then it is also L-attributed, as L-attributed definitions
enclose S-attributed definitions.
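A common illustration of an L-attributed definition is a declaration such as `int a,b,c`, where the type found on the left is handed down to every identifier in the list. The C code below is a minimal sketch under several assumptions. It uses the textbook-style grammar shown in its comment, with left recursion removed so that recursive descent works. Identifiers are single letters, input is assumed well formed, and the names `D`, `L`, `R` and `addtype` are hypothetical. The inherited attribute `in` is passed as a function parameter, so it comes only from the parent and left siblings.

```c
#include <string.h>

/* L-attributed definition for declarations such as "int a,b,c":
     D -> T L      { L.in = T.type }
     L -> id R     { addtype(id, L.in); R.in = L.in }
     R -> , id R1  { addtype(id, R.in); R1.in = R.in }
     R -> epsilon
   The inherited attribute 'in' is passed down as a parameter. */

#define MAX_IDS 16
static char id_name[MAX_IDS];          /* single-letter identifiers (assumption) */
static const char *id_type[MAX_IDS];
static int n_ids;

static const char *src;                /* cursor into the input */

static void addtype(char id, const char *type) {
    if (n_ids < MAX_IDS) {
        id_name[n_ids] = id;
        id_type[n_ids] = type;
        n_ids++;
    }
}

static void R(const char *in) {
    if (*src == ',') {
        src++;
        addtype(*src++, in);
        R(in);                         /* R1.in = R.in */
    }                                  /* otherwise R -> epsilon */
}

static void L(const char *in) {
    addtype(*src++, in);
    R(in);                             /* R.in = L.in */
}

/* D -> T L: T.type is synthesized, then handed down as L.in.
   Assumes the input starts with "int " or "float ". */
static void D(const char *input) {
    const char *type;
    src = input;
    n_ids = 0;
    if (strncmp(src, "int ", 4) == 0) { type = "int";   src += 4; }
    else                              { type = "float"; src += 6; }
    L(type);
}
```

After `D("int a,b,c")`, all three identifiers are recorded with type `int`. No identifier ever reads a value from a sibling on its right.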

COMPILER DESIGN - PARSER
In the previous chapter, we understood the basic concepts involved in parsing. In this chapter, we will learn
the various types of parser construction methods available.
Parsing can be defined as top-down or bottom-up based on how the parse tree is constructed.

Top-Down Parsing
We have learnt in the last chapter that the top-down parsing technique parses the input, and starts
constructing a parse tree from the root node, gradually moving down to the leaf nodes. The types of top-
down parsing are depicted below:

Recursive Descent Parsing


Recursive descent is a top-down parsing technique that constructs the parse tree from the top, and the input
is read from left to right. It uses procedures for every terminal and non-terminal entity. This parsing
technique recursively parses the input to make a parse tree, which may or may not require back-tracking.
But the grammar associated with it (if not left factored) cannot avoid back-tracking. A form of recursive-descent
parsing that does not require any back-tracking is known as predictive parsing.
This parsing technique is regarded as recursive because it uses context-free grammar, which is recursive in nature.

Back-tracking
Top-down parsers start from the root node (start symbol) and match the input string against the production
rules to replace them (if matched). To understand this, take the following example of CFG:
S → rXd | rZd
X → oa | ea
Z → ai

For an input string: read, a top-down parser will behave like this:
It will start with S from the production rules and will match its yield to the left-most letter of the input, i.e.
'r'. The very first production of S (S → rXd) matches with it. So the top-down parser advances to the next input
letter (i.e. 'e'). The parser tries to expand non-terminal 'X' and checks its production from the left (X → oa). It
does not match with the next input symbol. So the top-down parser backtracks to obtain the next
production rule of X, (X → ea).
Now the parser matches all the input letters in an ordered manner. The string is accepted.
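The walk-through above can be sketched as a backtracking recursive-descent recognizer in C, with one function per non-terminal. This is a minimal illustration, not a general parser. Each function saves the input position, tries its alternatives in order, and restores the position (backtracks) when an alternative fails. Backtracking happens only between alternatives of one non-terminal, which is enough for this small grammar. The names `match`, `parse` and `pos` are hypothetical.

```c
#include <stddef.h>

/* Backtracking recognizer for
     S -> r X d | r Z d
     X -> o a | e a
     Z -> a i                                                        */

static const char *in;   /* input string */
static size_t pos;       /* current input position */

static int match(char c) {
    if (in[pos] == c) { pos++; return 1; }
    return 0;
}

static int X(void) {
    size_t save = pos;
    if (match('o') && match('a')) return 1;
    pos = save;                          /* backtrack, try X -> e a */
    if (match('e') && match('a')) return 1;
    pos = save;
    return 0;
}

static int Z(void) {
    size_t save = pos;
    if (match('a') && match('i')) return 1;
    pos = save;
    return 0;
}

static int S(void) {
    size_t save = pos;
    if (match('r') && X() && match('d')) return 1;
    pos = save;                          /* backtrack, try S -> r Z d */
    if (match('r') && Z() && match('d')) return 1;
    pos = save;
    return 0;
}

/* Accept only if S matches and the whole input has been consumed. */
static int parse(const char *s) {
    in = s;
    pos = 0;
    return S() && in[pos] == '\0';
}
```

`parse("read")` succeeds after X first tries `o a`, fails, and backtracks to `e a`. `parse("raid")` succeeds only after the whole first alternative of S is abandoned.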

Predictive Parser
Predictive parser is a recursive descent parser that has the capability to predict which production is to be
used to expand the current non-terminal for the next input symbol. The predictive parser does not suffer from backtracking.


To accomplish its tasks, the predictive parser uses a look-ahead pointer, which points to the next input
symbols. To make the parser back-tracking free, the predictive parser puts some constraints on the
grammar and accepts only a class of grammar known as LL(k) grammar.

Predictive parsing uses a stack and a parsing table to parse the input and generate a parse tree. Both the
stack and the input contain an end symbol $ to denote that the stack is empty and the input is consumed.
The parser refers to the parsing table to take any decision on the input and stack element combination.


In recursive descent parsing, the parser may have more than one production to choose from for a single
instance of input, whereas in a predictive parser, each step has at most one production to choose. There
might be instances where no production matches the input string, causing the parsing procedure
to fail.

LL Parser
An LL parser accepts LL grammar. LL grammar is a subset of context-free grammar but with some
restrictions to get the simplified version, in order to achieve easy implementation. LL grammar can be
implemented by means of both algorithms, namely recursive-descent or table-driven.
LL parser is denoted as LL(k). The first L in LL(k) is parsing the input from left to right, the second L in LL(k)
stands for left-most derivation, and k itself represents the number of look-aheads. Generally k = 1, so LL(k)
may also be written as LL(1).

LL Parsing Algorithm


We may stick to deterministic LL(1) for parser explanation, as the size of the table grows exponentially with the
value of k. Secondly, if a given grammar is not LL(1), then usually it is not LL(k) for any given k.
Given below is an algorithm for LL(1) parsing:
Input:
    string ω
    parsing table M for grammar G
Output:
    If ω is in L(G) then left-most derivation of ω,
    error otherwise.
Initial State: $S on stack (with S being start symbol)
    ω$ in the input buffer
SET ip to point to the first symbol of ω$.
repeat
    let X be the top stack symbol and a the symbol pointed to by ip.
    if X ∈ Vt or $
        if X = a
            POP X and advance ip.
        else
            error()
        endif
    else /* X is non-terminal */
        if M[X,a] = X → Y1, Y2,... Yk
            POP X
            PUSH Yk, Yk-1,... Y1 /* Y1 on top */
            Output the production X → Y1, Y2,... Yk
        else
            error()
        endif
    endif
until X = $ /* empty stack */
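The algorithm above can be sketched in C for a deliberately tiny grammar, assumed here for illustration: S → ( S ) S | ε, which describes balanced parentheses. Its hand-built LL(1) table has only three entries, encoded in the hypothetical helper `table_S`. The driver follows the algorithm step by step: it pushes $S, matches terminals, and replaces a non-terminal with the right-hand side of the table entry pushed in reverse. It does not print the productions used.

```c
#include <string.h>

/* LL(1) driver for the grammar  S -> ( S ) S | epsilon.
   FIRST(S) = { '(' }, FOLLOW(S) = { ')', '$' }, so:
     M[S, '('] = S -> ( S ) S
     M[S, ')'] = S -> epsilon
     M[S, '$'] = S -> epsilon                                         */

#define STACK_MAX 256

/* Production body for M[S, a], or NULL for an error entry. */
static const char *table_S(char a) {
    if (a == '(') return "(S)S";
    if (a == ')' || a == '$') return "";   /* S -> epsilon */
    return NULL;
}

static int ll1_parse(const char *input) {
    char stack[STACK_MAX];
    int top = 0;
    size_t ip = 0, n = strlen(input);

    stack[top++] = '$';
    stack[top++] = 'S';                    /* $S on the stack */

    for (;;) {
        char X = stack[top - 1];
        char a = ip < n ? input[ip] : '$'; /* $ marks end of input */
        if (X == '$')
            return a == '$';               /* empty stack: accept iff input consumed */
        if (X != 'S') {                    /* X is a terminal */
            if (X != a) return 0;          /* error() */
            top--;                         /* POP X and advance ip */
            ip++;
        } else {                           /* X is a non-terminal */
            const char *body = table_S(a);
            if (body == NULL) return 0;    /* error() */
            top--;                         /* POP X */
            size_t k = strlen(body);
            if (top + (int)k > STACK_MAX) return 0;
            for (size_t i = k; i > 0; i--) /* PUSH Yk ... Y1, Y1 on top */
                stack[top++] = body[i - 1];
        }
    }
}
```

Because each (non-terminal, look-ahead) pair selects at most one table entry, the parser never backtracks. Inputs such as `(()` or `)(` are rejected as soon as a stack symbol and the look-ahead disagree.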

A grammar G is LL(1) if, whenever A → α | β are two distinct productions of G:
for no terminal a do both α and β derive strings beginning with a.
at most one of α and β can derive the empty string.
if β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).

Bottom-up Parsing
Bottom-up parsing starts from the leaf nodes of a tree and works upward until it reaches the root
node. Here, we start from a sentence and then apply production rules in reverse in order to reach
the start symbol. The image given below depicts the bottom-up parsers available.


Shift-Reduce Parsing
Shift-reduce parsing uses two unique steps for bottom-up parsing. These steps are known as shift-step and
reduce-step.
Shift step: The shift step refers to the advancement of the input pointer to the next input symbol,
which is called the shifted symbol. This symbol is pushed onto the stack. The shifted symbol is
treated as a single node of the parse tree.
Reduce step: When the parser finds a complete grammar rule (RHS) and replaces it with the LHS, it is
known as a reduce-step. This occurs when the top of the stack contains a handle. To reduce, a POP
function is performed on the stack, which pops off the handle and replaces it with the LHS non-terminal
symbol.

LR Parser
The LR parser is a non-recursive, shift-reduce, bottom-up parser. It uses a wide class of context-free
grammar, which makes it the most efficient syntax analysis technique. LR parsers are also known as LR(k)
parsers, where L stands for left-to-right scanning of the input stream, R stands for the construction of
right-most derivation in reverse, and k denotes the number of look-ahead symbols used to make decisions.
There are three widely used algorithms available for constructing an LR parser:
SLR(1) - Simple LR Parser:
Works on smallest class of grammar
Few number of states, hence very small table
Simple and fast construction
LR(1) - LR Parser:
Works on complete set of LR(1) Grammar


Generates large table and large number of states
Slow construction
LALR(1) - Look-Ahead LR Parser:
Works on intermediate size of grammar
Number of states is the same as in SLR(1)

LR Parsing Algorithm
Here we describe a skeleton algorithm of an LR parser:
token = next_token()
repeat forever
    s = top of stack
    if action[s, token] = "shift si" then
        PUSH token
        PUSH si
        token = next_token()
    else if action[s, token] = "reduce A ::= β" then
        POP 2 * |β| symbols
        s = top of stack
        PUSH A
        PUSH goto[s, A]
    else if action[s, token] = "accept" then
        return
    else
        error()
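The skeleton above can be made concrete with a hand-built SLR(1) table for a small assumed grammar: E → E + n | n, where n is a single decimal digit. Two simplifications differ from the skeleton. Only states are pushed, since each state implies its grammar symbol, so a reduction pops |β| entries instead of 2 * |β|. A parallel value stack also computes E.val at each reduction, which shows how an S-attributed SDT fits into bottom-up parsing. The table and all names are assumptions made for this sketch only.

```c
#include <ctype.h>

/* Grammar (hypothetical):  (1) E -> E + n   { E.val = E1.val + n.val }
                            (2) E -> n       { E.val = n.val }
   SLR(1) states: 0: E'->.E  1: E'->E. | E->E.+n  2: E->n.
                  3: E->E+.n  4: E->E+n.   FOLLOW(E) = { +, $ }      */

enum { SHIFT, REDUCE, ACCEPT, ERR };
typedef struct { int kind, arg; } Action;

/* Columns: 0 = n (digit), 1 = '+', 2 = '$' (end of input). */
static const Action action_tab[5][3] = {
    /* 0 */ {{SHIFT, 2}, {ERR, 0},    {ERR, 0}},
    /* 1 */ {{ERR, 0},   {SHIFT, 3},  {ACCEPT, 0}},
    /* 2 */ {{ERR, 0},   {REDUCE, 2}, {REDUCE, 2}},
    /* 3 */ {{SHIFT, 4}, {ERR, 0},    {ERR, 0}},
    /* 4 */ {{ERR, 0},   {REDUCE, 1}, {REDUCE, 1}},
};
static const int goto_E[5] = {1, -1, -1, -1, -1};  /* goto[s, E] */
static const int rhs_len[3] = {0, 3, 1};           /* |beta| per production */

static int column(char c) {
    if (isdigit((unsigned char)c)) return 0;
    if (c == '+') return 1;
    if (c == '\0') return 2;
    return -1;
}

/* Returns 1 and stores E.val in *out on success, 0 on a syntax error. */
static int lr_parse(const char *s, int *out) {
    int states[64], vals[64], top = 0;
    states[0] = 0;
    for (;;) {
        int col = column(*s);
        if (col < 0) return 0;
        Action act = action_tab[states[top]][col];
        if (act.kind == SHIFT) {
            if (top + 1 >= 64) return 0;
            top++;
            states[top] = act.arg;                  /* PUSH si */
            vals[top] = (col == 0) ? *s - '0' : 0;  /* n.val for digits */
            s++;                                    /* token = next_token() */
        } else if (act.kind == REDUCE) {
            int p = act.arg;
            int v = (p == 1) ? vals[top - 2] + vals[top] : vals[top];
            top -= rhs_len[p];                      /* POP |beta| states */
            int g = goto_E[states[top]];
            if (g < 0) return 0;
            top++;
            states[top] = g;                        /* PUSH goto[s, E] */
            vals[top] = v;
        } else if (act.kind == ACCEPT) {
            *out = vals[top];
            return 1;
        } else {
            return 0;                               /* error() */
        }
    }
}
```

For `1+2+3` the parser shifts and reduces exactly as the skeleton describes, and it accepts with E.val = 6. Input such as `1+` or `+1` reaches an error entry in the table.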

LL vs. LR

LL | LR
--- | ---
Does a leftmost derivation. | Does a rightmost derivation in reverse.
Starts with the root nonterminal on the stack. | Ends with the root nonterminal on the stack.
Ends when the stack is empty. | Starts with an empty stack.
Uses the stack for designating what is still to be expected. | Uses the stack for designating what is already seen.
Builds the parse tree top-down. | Builds the parse tree bottom-up.
Continuously pops a nonterminal off the stack, and pushes the corresponding right hand side. | Tries to recognize a right hand side on the stack, pops it, and pushes the corresponding nonterminal.
Expands the non-terminals. | Reduces the non-terminals.
Reads the terminals when it pops one off the stack. | Reads the terminals while it pushes them on the stack.
Pre-order traversal of the parse tree. | Post-order traversal of the parse tree.

COMPILER DESIGN - RUN-TIME ENVIRONMENT
A program as a source code is merely a collection of text (code, statements etc.) and to make it alive, it
requires actions to be performed on the target machine. A program needs memory resources to execute
instructions. A program contains names for procedures, identifiers etc., that require mapping with the
actual memory location at runtime.
By runtime, we mean a program in execution. Runtime environment is a state of the target machine, which
may include software libraries, environment variables, etc., to provide services to the processes running in
the system.
Runtime support system is a package, mostly generated with the executable program itself, that facilitates
communication between the process and the runtime environment. It takes care of memory
allocation and de-allocation while the program is being executed.

Activation Trees
A program is a sequence of instructions combined into a number of procedures. Instructions in a procedure
are executed sequentially. A procedure has a start and an end delimiter and everything inside it is called the
body of the procedure. The procedure identifier and the sequence of finite instructions inside it make up
the body of the procedure.
The execution of a procedure is called its activation. An activation record contains all the necessary
information required to call a procedure. An activation record may contain the following units (depending
upon the source language used).

Field | Contents
--- | ---
Temporaries | Stores temporary and intermediate values of an expression.
Local Data | Stores local data of the called procedure.
Machine Status | Stores machine status such as Registers, Program Counter etc., before the procedure is called.
Control Link | Stores the address of the activation record of the caller procedure.
Access Link | Stores the information of data which is outside the local scope.
Actual Parameters | Stores actual parameters, i.e., parameters which are used to send input to the called procedure.
Return Value | Stores return values.

Whenever a procedure is executed, its activation record is stored on the stack, also known as the control stack.
When a procedure calls another procedure, the execution of the caller is suspended until the called
procedure finishes execution. At this time, the activation record of the called procedure is stored on the
stack.
We assume that the program control flows in a sequential manner and when a procedure is called, its
control is transferred to the called procedure. When a called procedure is executed, it returns the control
back to the caller. This type of control flow makes it easier to represent a series of activations in the form of
a tree, known as the activation tree.
To understand this concept, we take a piece of code as an example:
...
printf("Enter Your Name: ");
scanf("%s", username);
show_data(username);
printf("Press any key to continue...");
...
int show_data(char *user)
{
    printf("Your name is %s", user);
    return 0;
}
...
...

Below is the activation tree of the code given.


Now we understand that procedures are executed in a depth-first manner, thus stack allocation is the best
suitable form of storage for procedure activations.
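To see the depth-first order in running code, the C sketch below instruments a recursive function. It uses a hypothetical `fib` chosen only because its activation tree branches. Each call records its argument when its activation begins, which is a pre-order visit of the activation tree. A counter also tracks how many activations are live on the control stack at once. The two recursive calls are made in separate statements because C leaves the evaluation order of the operands of `+` unspecified.

```c
/* Records the activation tree of fib() in pre-order and tracks the
   deepest nesting of live activations (records on the control stack).
   All names here are hypothetical illustration names. */

#define TRACE_MAX 64
static int trace[TRACE_MAX];     /* argument of each activation, in call order */
static int n_trace;
static int depth, max_depth;     /* currently live activations / maximum seen */

static int fib(int n) {
    if (n_trace < TRACE_MAX)
        trace[n_trace++] = n;    /* activation begins: pre-order visit */
    depth++;                     /* activation record pushed */
    if (depth > max_depth) max_depth = depth;

    int r;
    if (n < 2) {
        r = n;
    } else {
        int a = fib(n - 1);      /* left subtree of the activation tree first */
        int b = fib(n - 2);
        r = a + b;
    }

    depth--;                     /* activation ends: record popped */
    return r;
}
```

For `fib(3)` the activations begin in the order 3, 2, 1, 0, 1, and at most three records are on the stack at the same time. A callee always finishes before its caller, which is why a LIFO stack suits procedure activations.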

Storage Allocation
Runtime environment manages runtime memory requirements for the following entities:
Code: It is known as the text part of a program that does not change at runtime. Its memory
requirements are known at compile time.
Procedures: Their text part is static but they are called in a random manner. That is why stack
storage is used to manage procedure calls and activations.
Variables: Variables are known at runtime only, unless they are global or constant. Heap
memory allocation scheme is used for managing allocation and de-allocation of memory for
variables at runtime.

Static Allocation
In this allocation scheme, the compilation data is bound to a fixed location in the memory and it does not
change when the program executes. As the memory requirement and storage locations are known in
advance, a runtime support package for memory allocation and de-allocation is not required.


Stack Allocation
Procedure calls and their activations are managed by means of stack memory allocation. It works in the last-in-
first-out (LIFO) method and this allocation strategy is very useful for recursive procedure calls.

Heap Allocation
Variables local to a procedure are allocated and de-allocated only at runtime. Heap allocation is used to
dynamically allocate memory to the variables and claim it back when the variables are no longer required.
Except for the statically allocated memory area, both stack and heap memory can grow and shrink dynamically and
unexpectedly. Therefore, they cannot be provided with a fixed amount of memory in the system.

As shown in the image above, the text part of the code is allocated a fixed amount of memory. Stack and
heap memory are arranged at the extremes of the total memory allocated to the program. Both shrink and
grow against each other.

Parameter Passing
The communication medium among procedures is known as parameter passing. The values of the variables
from a calling procedure are transferred to the called procedure by some mechanism. Before moving
ahead, first go through some basic terminologies pertaining to the values in a program.

r-value
The value of an expression is called its r-value. The value contained in a single variable also becomes an r-
value if it appears on the right-hand side of the assignment operator. r-values can always be assigned to
some other variable.

l-value
The location of memory (address) where an expression is stored is known as the l-value of that expression. It
always appears on the left-hand side of an assignment operator.
For example:
day = 1;
week = day * 7;
month = 1;
year = month * 12;

From this example, we understand that constant values like 1, 7, 12, and variables like day, week, month and
year, all have r-values. Only variables have l-values, as they also represent the memory location assigned to
them.
For example:
7 = x + y;

is an l-value error, as the constant 7 does not represent any memory location.

Formal Parameters
Variables that take the information passed by the caller procedure are called formal parameters. These
variables are declared in the definition of the called function.

Actual Parameters
Variables whose values or addresses are being passed to the called procedure are called actual parameters.
These variables are specified in the function call as arguments.
Example:
#include <stdio.h>

void fun_two(int formal_parameter);

void fun_one(void)
{
    int actual_parameter = 10;
    fun_two(actual_parameter);          /* actual parameter supplied as the argument */
}

void fun_two(int formal_parameter)      /* formal parameter declared in the definition */
{
    printf("%d\n", formal_parameter);
}

Formal parameters hold the information of the actual parameter, depending upon the parameter passing
technique used. It may be a value or an address.

Pass by Value
In the pass by value mechanism, the calling procedure passes the r-value of actual parameters and the compiler
puts that into the called procedure's activation record. Formal parameters then hold the values passed by
the calling procedure. If the values held by the formal parameters are changed, it should have no impact on
the actual parameters.

Pass by Reference
In the pass by reference mechanism, the l-value of the actual parameter is copied to the activation record of the
called procedure. This way, the called procedure now has the address (memory location) of the actual
parameter and the formal parameter refers to the same memory location. Therefore, if the value pointed
to by the formal parameter is changed, the impact should be seen on the actual parameter, as they should also
point to the same value.
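In C, every argument is passed by value. The usual way to get pass-by-reference behaviour is to pass the address, that is the l-value, of the actual parameter explicitly. The short sketch below contrasts the two. The function names are hypothetical.

```c
/* Pass by value: the callee's activation record gets a copy of the r-value. */
static void by_value(int formal) {
    formal = 99;              /* changes only the local copy */
    (void)formal;
}

/* Pass by reference (emulated): the callee receives the l-value (address). */
static void by_reference(int *formal) {
    *formal = 99;             /* writes through to the caller's variable */
}
```

With `int x = 10;`, the call `by_value(x)` leaves x at 10, while `by_reference(&x)` sets x to 99.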

Pass by Copy-restore
This parameter passing mechanism works similar to pass-by-reference except that the changes to actual
parameters are made when the called procedure ends. Upon function call, the values of actual parameters
are copied into the activation record of the called procedure. Formal parameters, if manipulated, have no real-
time effect on actual parameters (the formal parameters are separate copies), but when the called procedure ends, the values of the
formal parameters are copied back to the l-values of the actual parameters.
Example:
int y;
calling_procedure()
{
    y = 10;
    copy_restore(y);   // value of y is copied in; the l-value of y is remembered
    print y;           // prints 99
}
copy_restore(int x)
{
    x = 99;            // y still has value 10 (unaffected)
    y = 0;             // y is now 0
}

When this function ends, the value of the formal parameter x is copied back to the location (l-value) of the
actual parameter y. Even though y was set to 0 inside the procedure, that value is overwritten by the copy of x,
so y ends up as 99. Under pass-by-reference, the same code would print 0, because x and y would name the
same memory location.
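C has no built-in copy-restore, but the example above can be simulated to check the result. In this sketch the callee receives the address of the actual parameter and copies the value in on entry. It then works only on its local copy and writes that copy back through the saved address on exit. The names mirror the pseudocode above, and the explicit copy-in/copy-out is an assumption of the simulation, not a C language feature.

```c
static int y;                         /* global, as in the example */

/* Simulated copy-restore: 'actual' is the remembered l-value of y. */
static void copy_restore(int *actual) {
    int x = *actual;                  /* copy in: x receives y's r-value (10) */
    x = 99;                           /* y is still 10 here */
    y = 0;                            /* y is now 0 */
    *actual = x;                      /* restore: x copied back to y's location */
}

static int calling_procedure(void) {
    y = 10;
    copy_restore(&y);
    return y;                         /* 99 under copy-restore */
}
```

`calling_procedure()` returns 99. If `copy_restore` wrote through `actual` directly instead of using a local copy, it would behave like pass-by-reference and the result would be 0.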

Pass by Name
Languages like Algol provide a new kind of parameter passing mechanism that works like the preprocessor in the C
language. In the pass-by-name mechanism, the procedure call is replaced by the body of the procedure being
called. Pass-by-name textually substitutes the argument expressions in a procedure call for the
corresponding parameters in the body of the procedure, so that it can now work on actual parameters,
much like pass-by-reference, except that an argument expression is re-evaluated every time the parameter is used.
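Since the text compares pass-by-name to the C preprocessor, a macro can give a rough feel for it. In the sketch below, the argument `term` is substituted textually and re-evaluated on every loop iteration with the current value of `i`. This is the classic trick known as Jensen's device. The macro and function names are hypothetical, and the analogy is only approximate: real pass-by-name in Algol 60 respects scoping and is usually implemented with thunks, whereas C macros have no such protection.

```c
/* Textual substitution: 'term' is re-evaluated each time it is used. */
#define SUM_BY_NAME(i, lo, hi, term, result) \
    do {                                     \
        (result) = 0;                        \
        for ((i) = (lo); (i) <= (hi); (i)++) \
            (result) += (term);              \
    } while (0)

static int sum_of_squares(const int *a, int n) {
    int k, s;
    SUM_BY_NAME(k, 0, n - 1, a[k] * a[k], s);   /* a[k]*a[k] re-evaluated per k */
    return s;
}
```

For the array `{1, 2, 3}`, `sum_of_squares` returns 1 + 4 + 9 = 14. Passing just `k` as the term sums the loop index itself.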

COMPILER DESIGN - SYMBOL TABLE
Symbol table is an important data structure created and maintained by compilers in order to store
information about the occurrence of various entities such as variable names, function names, objects,
classes, interfaces, etc. Symbol table is used by both the analysis and the synthesis parts of a compiler.
A symbol table may serve the following purposes depending upon the language in hand:
To store the names of all entities in a structured form at one place.
To verify if a variable has been declared.
To implement type checking, by verifying that assignments and expressions in the source code are
semantically correct.
To determine the scope of a name (scope resolution).
A symbol table is simply a table which can be either linear or a hash table. It maintains an entry for each
name in the following format:
<symbol name, type, attribute>

For example, if a symbol table has to store information about the following variable declaration:
static int interest;

then it should store an entry such as:
<interest, int, static>

The attribute clause contains the entries related to the name.

Implementation
If a compiler is to handle a small amount of data, then the symbol table can be implemented as an
unordered list, which is easy to code, but it is suitable only for small tables. A symbol table can be
implemented in one of the following ways:


Linear (sorted or unsorted) list
Binary Search Tree
Hash table
Among all, symbol tables are mostly implemented as hash tables, where the source code symbol itself is
treated as a key for the hash function and the return value is the information about the symbol.

Operations
A symbol table, either linear or hash, should provide the following operations.

insert()
This operation is more frequently used by the analysis phase, i.e., the first half of the compiler where tokens
are identified and names are stored in the table. This operation is used to add information in the symbol
table about unique names occurring in the source code. The format or structure in which the names are
stored depends upon the compiler in hand.
An attribute for a symbol in the source code is the information associated with that symbol. This
information contains the value, state, scope, and type of the symbol. The insert() function takes the
symbol and its attributes as arguments and stores the information in the symbol table.
For example:
int a;

should be processed by the compiler as:
insert(a, int);

lookup()
lookup() operation is used to search a name in the symbol table to determine:
if the symbol exists in the table.
if it is declared before it is being used.
if the name is used in the scope.
if the symbol is initialized.
if the symbol is declared multiple times.
The format of the lookup() function varies according to the programming language. The basic format should
match the following:
lookup(symbol)


This method returns 0 (zero) if the symbol does not exist in the symbol table. If the symbol exists in the
symbol table, it returns its attributes stored in the table.
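The insert() and lookup() operations described above can be sketched in C as a small hash table with separate chaining. Entries follow the <symbol name, type, attribute> format. The bucket count, the hash function, and the choice to store string pointers without copying them are arbitrary assumptions for this illustration. Returning NULL plays the role of the "0" mentioned in the text.

```c
#include <stdlib.h>
#include <string.h>

#define BUCKETS 211

typedef struct Symbol {
    const char *name, *type, *attr;   /* <symbol name, type, attribute> */
    struct Symbol *next;              /* next entry in the same bucket */
} Symbol;

static Symbol *table[BUCKETS];

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % BUCKETS;
}

/* Returns the entry, or NULL when the symbol is not in the table. */
static Symbol *lookup(const char *name) {
    for (Symbol *p = table[hash(name)]; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0) return p;
    return NULL;
}

/* Returns 0 on a duplicate declaration (or out of memory), 1 otherwise.
   Strings are not copied, so they must outlive the table. */
static int insert(const char *name, const char *type, const char *attr) {
    if (lookup(name) != NULL) return 0;
    Symbol *p = malloc(sizeof *p);
    if (p == NULL) return 0;
    unsigned h = hash(name);
    p->name = name;
    p->type = type;
    p->attr = attr;
    p->next = table[h];               /* push onto the bucket's chain */
    table[h] = p;
    return 1;
}
```

After `insert("interest", "int", "static")`, `lookup("interest")` returns the stored entry. A second `insert` of the same name is reported as a multiple declaration.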

Scope Management
A compiler maintains two types of symbol tables: a global symbol table, which can be accessed by all the
procedures, and scope symbol tables, which are created for each scope in the program.
To determine the scope of a name, symbol tables are arranged in a hierarchical structure as shown in the
example below:
...
int value=10;

void pro_one()
{
   int one_1;
   int one_2;

   {              /* inner scope 1 */
      int one_3;
      int one_4;
   }

   int one_5;

   {              /* inner scope 2 */
      int one_6;
      int one_7;
   }
}

void pro_two()
{
   int two_1;
   int two_2;

   {              /* inner scope 3 */
      int two_3;
      int two_4;
   }

   int two_5;
}
...

The above program can be represented in a hierarchical structure of symbol tables:


The global symbol table contains names for one global variable (int value) and two procedure names, which
should be available to all the child nodes shown above. The names mentioned in the pro_one symbol table
(and all its child tables) are not available to pro_two symbols and its child tables.
This symbol table data structure hierarchy is stored in the semantic analyzer and whenever a name needs to
be searched in a symbol table, it is searched using the following algorithm:
first a symbol will be searched in the current scope, i.e. the current symbol table.
if a name is found, then the search is completed, else it will be searched in the parent symbol table until
either the name is found or the global symbol table has been searched for the name.
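The search algorithm above can be sketched in C by giving each scope its own small table and a pointer to its parent scope. Lookup walks outward from the current scope until the name is found or the global table has been searched. The fixed-size arrays and the names `Scope`, `declare` and `resolve` are simplifying assumptions, not a description of any particular compiler.

```c
#include <string.h>

#define MAX_NAMES 8

typedef struct Scope {
    const char *names[MAX_NAMES];
    int count;
    struct Scope *parent;            /* NULL for the global scope */
} Scope;

static void declare(Scope *s, const char *name) {
    if (s->count < MAX_NAMES)
        s->names[s->count++] = name;
}

/* Search the current scope first, then each enclosing scope.
   Returns the scope in which 'name' is found, or NULL if undeclared. */
static Scope *resolve(Scope *s, const char *name) {
    for (; s != NULL; s = s->parent)
        for (int i = 0; i < s->count; i++)
            if (strcmp(s->names[i], name) == 0)
                return s;
    return NULL;
}
```

Built to match the example program, a lookup of `value` from inner scope 1 climbs to the global table. A lookup of `one_1` from `pro_two` fails, because `pro_one`'s table is not on that chain of parents.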

COMPILER - INTERMEDIATE CODE GENERATION
If source code can be translated directly into its target machine code, why do we need to translate
the source code into an intermediate code first, which is then translated to its target code? Let us see the reasons
why we need an intermediate code.


If a compiler translates the source language to its target machine language without having the option
of generating intermediate code, then for each new machine, a full native compiler is required.
Intermediate code eliminates the need for a new full compiler for every unique machine by keeping
the analysis portion the same for all the compilers.
The second part of the compiler, synthesis, is changed according to the target machine.
It becomes easier to apply source code modifications to improve code performance by applying
code optimization techniques on the intermediate code.

Intermediate Representation
Intermediate codes can be represented in a variety of ways and they have their own benefits.
High Level IR - High-level intermediate code representation is very close to the source language
itself. It can be easily generated from the source code and we can easily apply code modifications
to enhance performance. But for target machine optimization, it is less preferred.
Low Level IR - This one is close to the target machine, which makes it suitable for register and
memory allocation, instruction set selection, etc. It is good for machine-dependent optimizations.
Intermediate code can be either language specific (e.g., Byte Code for Java) or language independent (three-
address code).

Three-Address Code
Intermediate code generator receives input from its predecessor phase, the semantic analyzer, in the form of
an annotated syntax tree. That syntax tree then can be converted into a linear representation, e.g., postfix
notation. Intermediate code tends to be machine independent code. Therefore, the code generator assumes
an unlimited number of memory storage locations (registers) when generating code.
For example:
a = b + c * d;

The intermediate code generator will try to divide this expression into sub-expressions and then generate
the corresponding code.


r1 = c * d;
r2 = b + r1;
a = r2;

r being used as registers in the target program.
A three-address code has at most three address locations to calculate the expression. A three-address code
can be represented in two forms: quadruples and triples.

Quadruples
Each instruction in the quadruples representation is divided into four fields: operator, arg1, arg2, and result. The
above example is represented below in quadruples format:

Op | arg1 | arg2 | result
--- | --- | --- | ---
* | c | d | r1
+ | b | r1 | r2
= | r2 | | a

Triples
Each instruction in the triples representation has three fields: op, arg1, and arg2. The results of respective sub-
expressions are denoted by the position of the expression. Triples represent similarity with DAG and syntax
tree. They are equivalent to DAG while representing expressions.

# | Op | arg1 | arg2
--- | --- | --- | ---
(0) | * | c | d
(1) | + | b | (0)
(2) | = | a | (1)

Triples face the problem of code immovability during optimization, as the results are positional and
changing the order or position of an expression may cause problems.

Indirect Triples
This representation is an enhancement over the triples representation. It uses pointers instead of position to
store results. This enables the optimizers to freely re-position the sub-expression to produce optimized
code.
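The translation of a = b + c * d into three-address code discussed in this section can be sketched as a post-order walk over an expression tree. Each interior node gets a fresh temporary, named r1, r2, ... to match the text's register naming. The tree layout, the fixed-size text buffer, and the names `gen` and `gen_assign` are assumptions made for the sketch.

```c
#include <stdio.h>
#include <string.h>

typedef struct Expr {
    char op;                          /* '+', '*', or 0 for a leaf */
    const char *name;                 /* identifier, for leaves */
    const struct Expr *l, *r;
} Expr;

static char code[512];                /* generated instructions, one per line */
static int temp_count;

/* Emits code for e (post-order) and writes the address holding its value
   into 'addr'. Leaf names are assumed to fit in the 16-byte buffers. */
static const char *gen(const Expr *e, char *addr, size_t size) {
    if (e->op == 0) {
        snprintf(addr, size, "%s", e->name);
        return addr;
    }
    char a1[16], a2[16];
    gen(e->l, a1, sizeof a1);         /* operands first */
    gen(e->r, a2, sizeof a2);
    snprintf(addr, size, "r%d", ++temp_count);
    size_t used = strlen(code);
    snprintf(code + used, sizeof code - used, "%s = %s %c %s\n", addr, a1, e->op, a2);
    return addr;
}

/* Translates  target = e  and returns the generated code. */
static const char *gen_assign(const char *target, const Expr *e) {
    char addr[16];
    code[0] = '\0';
    temp_count = 0;
    gen(e, addr, sizeof addr);
    size_t used = strlen(code);
    snprintf(code + used, sizeof code - used, "%s = %s\n", target, addr);
    return code;
}
```

For the tree of `b + c * d` assigned to `a`, this produces `r1 = c * d`, `r2 = b + r1` and `a = r2`. Each line has at most three addresses.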

Declarations
A variable or procedure has to be declared before it can be used. Declaration involves allocation of space in
memory and entry of type and name in the symbol table. A program may be coded and designed keeping
the target machine structure in mind, but it may not always be possible to accurately convert a source code
to its target language.
Taking the whole program as a collection of procedures and sub-procedures, it becomes possible to declare
all the names local to the procedure. Memory allocation is done in a consecutive manner and names are
allocated to memory in the sequence they are declared in the program. We use an offset variable and set it to
zero {offset = 0}, which denotes the base address.
The source programming language and the target machine architecture may vary in the way names are
stored, so relative addressing is used. While the first name is allocated memory starting from the memory
location 0 {offset = 0}, the next name declared later should be allocated memory next to the first one.
Example:
We take the example of the C programming language where an integer variable is assigned 2 bytes of memory
and a float variable is assigned 4 bytes of memory (as on a 16-bit C compiler; the widths depend on the target).
int a;
float b;
Allocation process:
{offset = 0}
   int a;
   id.type = int
   id.width = 2
offset = offset + id.width
{offset = 2}
   float b;
   id.type = float
   id.width = 4
offset = offset + id.width
{offset = 6}

To enter this detail in a symbol table, a procedure enter can be used. This method may have the following
structure:
enter(name, type, offset)

This procedure should create an entry in the symbol table for the variable name, having its type set to type
and relative address offset in its data area.
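The allocation process and the enter procedure can be sketched together in C. Each declaration is entered at the current offset, and the offset then advances by the type's width. The widths follow the example's assumption (int = 2 bytes, float = 4 bytes), and only these two types are handled. The array-based symbol table and the names `width_of` and `declare` are hypothetical.

```c
#include <string.h>

#define MAX_ENTRIES 32

typedef struct { const char *name, *type; int offset; } Entry;

static Entry symtab[MAX_ENTRIES];
static int n_entries;
static int offset = 0;                        /* {offset = 0}: base of the data area */

/* enter(name, type, offset): record a name with its type and relative address. */
static void enter(const char *name, const char *type, int off) {
    if (n_entries < MAX_ENTRIES) {
        symtab[n_entries].name = name;
        symtab[n_entries].type = type;
        symtab[n_entries].offset = off;
        n_entries++;
    }
}

/* Widths as assumed in the example: int = 2, float = 4. */
static int width_of(const char *type) {
    return strcmp(type, "int") == 0 ? 2 : 4;
}

/* Semantic action for a declaration "type name;" */
static void declare(const char *type, const char *name) {
    enter(name, type, offset);
    offset = offset + width_of(type);         /* offset = offset + id.width */
}
```

Declaring `int a;` and then `float b;` places `a` at offset 0 and `b` at offset 2, and leaves the offset at 6, as in the walk-through above.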

COMPILER DESIGN - CODE GENERATION
Code generation can be considered as the final phase of compilation. After code generation, an
optimization process can be applied to the code, but that can be seen as a part of the code generation phase
itself. The code generated by the compiler is an object code of some lower-level programming language, for
example, assembly language. We have seen that the source code written in a higher-level language is
transformed into a lower-level language that results in a lower-level object code, which should have the
following minimum properties:
It should carry the exact meaning of the source code.
It should be efficient in terms of CPU usage and memory management.
We will now see how the intermediate code is transformed into target object code (assembly code, in this
case).

Directed Acyclic Graph
Directed Acyclic Graph (DAG) is a tool that depicts the structure of basic blocks, helps to see the flow of
values among the basic blocks, and offers optimization too. DAG provides easy transformation on
basic blocks. DAG can be understood here:
Leaf nodes represent identifiers, names or constants.
Interior nodes represent operators.
Interior nodes also represent the results of expressions or the identifiers/names where the values are
to be stored or assigned.
Example:
t0 = a + b
t1 = t0 + c
d = t0 + t1

[t0 = a + b]

[t1 = t0 + c]


[d = t0 + t1]
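The DAG for the example above can be built with a simple sharing rule. Before creating a node for (op, left, right), look for an existing node with the same operator and children and reuse it. Leaves are shared the same way, by name. In this minimal sketch, the caller keeps track of which identifiers (t0, t1, d) are attached to which node index. The node layout and the names `leaf` and `node` are assumptions.

```c
#include <string.h>

#define MAX_NODES 64

typedef struct {
    char op;              /* 0 for a leaf */
    const char *leaf;     /* identifier or constant, for leaves */
    int left, right;      /* child node indices, -1 for leaves */
} DagNode;

static DagNode nodes[MAX_NODES];
static int n_nodes;

/* Reuse an identical node if one exists; otherwise create it.
   Returns the node index, or -1 if the node array is full. */
static int find_or_add(char op, const char *leaf, int l, int r) {
    for (int i = 0; i < n_nodes; i++) {
        const DagNode *d = &nodes[i];
        if (d->op == op && d->left == l && d->right == r &&
            (op != 0 || strcmp(d->leaf, leaf) == 0))
            return i;                         /* common sub-expression found */
    }
    if (n_nodes == MAX_NODES) return -1;
    nodes[n_nodes] = (DagNode){op, leaf, l, r};
    return n_nodes++;
}

static int leaf(const char *name)      { return find_or_add(0, name, -1, -1); }
static int node(char op, int l, int r) { return find_or_add(op, NULL, l, r); }
```

Building the example block gives six nodes: three leaves (a, b, c) and three `+` nodes. Asking for `a + b` a second time returns the existing t0 node instead of creating a new one, which is how a DAG exposes common sub-expressions.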

Peephole Optimization
This optimization technique works locally on the code to transform it into optimized code. By
locally, we mean a small portion of the code block at hand. These methods can be applied on intermediate
code as well as on target code. A bunch of statements is analyzed and checked for the following
possible optimizations:

Redundant instruction elimination
At source code level, the following can be done by the user:

int add_ten(int x)
{
   int y, z;
   y = 10;
   z = x + y;
   return z;
}

int add_ten(int x)
{
   int y;
   y = 10;
   y = x + y;
   return y;
}

int add_ten(int x)
{
   int y = 10;
   return x + y;
}

int add_ten(int x)
{
   return x + 10;
}

At compilation level, the compiler searches for instructions that are redundant in nature. Multiple load and
store instructions may carry the same meaning even if some of them are removed. For example:
MOV x, R0
MOV R0, R1
Provided R0 is not used afterwards, we can delete the first instruction and rewrite the sequence as:
MOV x, R1

Unreachable code
Unreachable code is a part of the program code that is never accessed because of programming constructs.
Programmers may have accidentally written a piece of code that can never be reached.
Example:
int add_ten(int x)
{
   return x + 10;
   printf("value of x is %d", x);
}

In this code segment, the printf statement will never be executed as the program control returns back
before it can execute, hence printf can be removed.

Flow of control optimization
There are instances in code where the program control jumps back and forth without performing any
significant task. These jumps can be removed. Consider the following chunk of code:
...
MOV R1, R2
GOTO L1
...
L1 : GOTO L2
L2 : INC R1

In this code, label L1 can be removed (if nothing else jumps to it), as it only passes the control to L2. So instead of jumping to L1 and then to
L2, the control can directly reach L2, as shown below:
...
MOV R1, R2
GOTO L2
...
L2 : INC R1
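The jump-to-jump rewrite above can be sketched as a small pass over a toy instruction list. Any GOTO whose target is itself an unconditional GOTO is redirected to the final destination. The instruction encoding, with labels modelled as instruction indices, and the name `thread_jumps` are hypothetical. A hop limit keeps the pass from looping forever on a cycle of GOTOs.

```c
typedef enum { OP_MOV, OP_INC, OP_GOTO } Opcode;

typedef struct {
    Opcode op;
    int target;            /* instruction index for OP_GOTO, unused otherwise */
} Instr;

/* Redirect every GOTO past chains of GOTOs to the final target. */
static void thread_jumps(Instr *code, int n) {
    for (int i = 0; i < n; i++) {
        if (code[i].op != OP_GOTO) continue;
        int t = code[i].target;
        int hops = 0;
        while (t >= 0 && t < n && code[t].op == OP_GOTO && hops++ < n)
            t = code[t].target;        /* follow L1: GOTO L2 */
        code[i].target = t;
    }
}
```

For the sequence `MOV R1, R2; GOTO L1; L1: GOTO L2; L2: INC R1` (indices 0 to 3), the first GOTO is retargeted from index 2 to index 3. The instruction at L1 can then be removed if nothing else jumps to it.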

Algebraic expression simplification
There are occasions where algebraic expressions can be made simple. For example, the expression a = a +
0 can be replaced by a itself and the expression a = a + 1 can simply be replaced by INC a.

Strength reduction
There are operations that consume more time and space. Their 'strength' can be reduced by replacing them
with other operations that consume less time and space, but produce the same result.


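For example, x * 2 can be replaced by x << 1, which involves only one left shift. Likewise, although a² and
a * a give the same result, a * a (a single multiplication) is much more efficient to implement than a general exponentiation.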

Accessing machine instructions
The target machine can deploy more sophisticated instructions, which can have the capability to perform
specific operations much more efficiently. If the target code can accommodate those instructions directly, that
will not only improve the quality of code, but also yield more efficient results.

Code Generator
A code generator is expected to have an understanding of the target machine's runtime environment and its
instruction set. The code generator should take the following things into consideration to generate the
code:
Target language: The code generator has to be aware of the nature of the target language for
which the code is to be transformed. That language may facilitate some machine-specific instructions
to help the compiler generate the code in a more convenient way. The target machine can have either
CISC or RISC processor architecture.
IR Type: Intermediate representation has various forms. It can be in Abstract Syntax Tree (AST)
structure, Reverse Polish Notation, or 3-address code.
Selection of instruction: The code generator takes Intermediate Representation as input and
converts (maps) it into the target machine's instruction set. One representation can be converted
by many different instruction sequences, so it becomes the responsibility of the code generator to choose the
appropriate instructions wisely.
Register allocation: A program has a number of values to be maintained during the execution.
The target machine's architecture may not allow all of the values to be kept in the CPU memory or
registers. The code generator decides what values to keep in the registers. Also, it decides the registers to
be used to keep these values.
Ordering of instructions: At last, the code generator decides the order in which the instructions
will be executed. It creates schedules for instructions to execute them.

Descriptors
The code generator has to track both the registers (for availability) and the addresses (locations of values) while generating the code. Two descriptors are used for this purpose:
Register descriptor: The register descriptor informs the code generator about the availability of registers. It keeps track of the values stored in each register. Whenever a new register is required during code generation, this descriptor is consulted for register availability.
Address descriptor: The values of the names (identifiers) used in the program might be stored at different locations during execution. Address descriptors keep track of the memory locations where the values of identifiers are stored. These locations may include CPU registers, heaps, stacks, memory, or a combination of these.
The code generator keeps both descriptors updated in real time. For a load statement, LD R1, x, the code generator:
updates the Register Descriptor of R1 to show that it holds the value of x, and
updates the Address Descriptor of x to show that one instance of x is in R1.

Code Generation
Basic blocks consist of a sequence of three-address instructions. The code generator takes these sequences of instructions as input.
Note: If the value of a name is found in more than one place (register, cache, or memory), the register's value will be preferred over the cache and main memory. Likewise, the cache's value will be preferred over main memory. Main memory is given the least preference.
getReg: The code generator uses the getReg function to determine the status of available registers and the location of name values. getReg works as follows:
If variable y is already in register R, it uses that register.
Else, if some register R is available, it uses that register.
Else, if neither of the above options is possible, it chooses a register that requires the minimal number of load and store instructions.
For an instruction x = y OP z, the code generator may perform the following actions. Let us assume that L is the location (preferably a register) where the output of y OP z is to be saved:
1. Call the function getReg to decide the location of L.
2. Determine the present location (register or memory) of y by consulting the Address Descriptor of y. If y is not presently in register L, then generate the following instruction to copy the value of y to L:
   MOV y', L
   where y' represents the copied value of y.
3. Determine the present location of z using the same method used in step 2 for y, and generate the following instruction:
   OP z', L
   where z' represents the copied value of z.
4. Now L contains the value of y OP z, which is to be assigned to x. So, if L is a register, update its descriptor to indicate that it contains the value of x. Update the descriptor of x to indicate that it is stored at location L.
5. If y and z have no further use, they can be given back to the system.
Other code constructs, such as loops and conditional statements, are translated into assembly language in the usual way.

COMPILER DESIGN - CODE OPTIMIZATION
Optimization is a program transformation technique that tries to improve the code by making it consume fewer resources (i.e., CPU and memory) and run faster.
In optimization, high-level general programming constructs are replaced by very efficient low-level code. A code optimizing process must follow the three rules given below:
The output code must not, in any way, change the meaning of the program.
Optimization should increase the speed of the program and, if possible, reduce the resources the program demands.
Optimization should itself be fast and should not delay the overall compiling process.
Efforts to optimize code can be made at various levels of the compilation process:
At the beginning, users can change or rearrange the code, or use better algorithms to write it.
After generating intermediate code, the compiler can modify the intermediate code by improving address calculations and loops.
While producing the target machine code, the compiler can make use of the memory hierarchy and CPU registers.
Optimization can be broadly categorized into two types: machine-independent and machine-dependent.

Machine-independent Optimization
In this optimization, the compiler takes in the intermediate code and transforms a part of the code that does not involve any CPU registers and/or absolute memory locations. For example:
do
{
   item = 10;
   value = value + item;
} while (value < 100);

This code involves repeated assignment of the identifier item. If we rewrite it this way:
item = 10;
do
{
   value = value + item;
} while (value < 100);

it not only saves CPU cycles, but the transformation is also valid on any processor.

Machine-dependent Optimization
Machine-dependent optimization is done after the target code has been generated, when the code is transformed according to the target machine architecture. It involves CPU registers and may use absolute memory references rather than relative references. Machine-dependent optimizers try to take maximum advantage of the memory hierarchy.

Basic Blocks
Source code generally contains sequences of instructions that are always executed in order; these sequences are the basic blocks of the code. A basic block has no jump statements inside it. That is, when its first instruction is executed, all the instructions in the same basic block are executed in their order of appearance, without control leaving the block midway.
Constructs such as the IF-THEN-ELSE and SWITCH-CASE conditional statements, and loops such as DO-WHILE, FOR, and REPEAT-UNTIL, contain jumps. A program built from them is therefore divided into several basic blocks.

Basic block identification
We may use the following algorithm to find the basic blocks in a program:
Search for the header statements (leaders) of all the basic blocks, where a basic block starts:
   The first statement of the program.
   Statements that are the target of any branch (conditional or unconditional).
   Statements that follow any branch statement.
A header statement and the statements following it form a basic block.
A basic block does not include the header statement of any other basic block.
Basic blocks are important concepts from both the code generation and the optimization points of view.

Basic blocks play an important role in identifying variables that are used more than once in a single basic block. If a variable is used more than once, the register allocated to that variable need not be emptied until the block finishes execution.

Control Flow Graph
The basic blocks in a program can be represented by means of a control flow graph. A control flow graph depicts how program control is passed among the blocks. It is a useful tool in optimization because it helps locate any unwanted loops in the program.

Loop Optimization
Programs typically spend most of their execution time in loops, so it is important to optimize loops in order to save CPU cycles and memory. Loops can be optimized by the following techniques:
Invariant code: A fragment of code that resides in the loop and computes the same value at each iteration is called loop-invariant code. This code can be moved out of the loop so that it is computed only once, rather than in every iteration.
Induction analysis: A variable is called an induction variable if its value is altered within the loop by a loop-invariant value.
Strength reduction: Some expressions consume more CPU cycles, time, and memory than others. These expressions should be replaced with cheaper expressions without changing the result. For example, the multiplication x * 2 is more expensive in terms of CPU cycles than x << 1, and for integer x it yields the same result.

Dead-code Elimination
Dead code is one or more code statements that are:
Either never executed (unreachable),
Or, if executed, produce output that is never used.
Thus, dead code plays no role in any program operation, and therefore it can simply be eliminated.

Partially dead code
Some code statements compute values that are used only under certain circumstances, i.e., sometimes the values are used and sometimes they are not. Such code is known as partially dead code.

The above control flow graph depicts a chunk of a program where variable a is assigned the output of the expression x * y. Let us assume that the value assigned to a is never used inside the loop. Immediately after control leaves the loop, a is assigned the value of variable z, which is used later in the program. We conclude that the value assigned to a inside the loop is never used anywhere, and therefore that assignment is eligible to be eliminated.

Likewise, the picture above depicts a conditional statement that is always false. The code written in its true branch will therefore never be executed, and it can be removed.

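Partial Redundancy
A redundant expression is computed more than once without any change in its operands, and its value is already available on every parallel path that reaches the recomputation. A partially redundant expression is also recomputed without any change in its operands, but its value is already available on only some of the paths that reach the recomputation. For example,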

[redundant expression]

[partially redundant expression]

Loop-invariant code is partially redundant and can be eliminated by using a code-motion technique.
Another example of partially redundant code is:
if (condition)
{
   a = y OP z;
}
else
{
   ...
}
c = y OP z;

We assume that the values of the operands (y and z) do not change between the assignment to variable a and the assignment to variable c. Here, if the condition is true, then y OP z is computed twice; otherwise it is computed once. Code motion can be used to eliminate this redundancy, as shown below:
if (condition)
{
   ...
   tmp = y OP z;
   a = tmp;
   ...
}
else
{
   ...
   tmp = y OP z;
}
c = tmp;

Here, whether the condition is true or false, y OP z is computed only once.
