Unit 6 Compilers

Introduction
• A compiler is a program that reads a program written in one language, called the source language, and translates it into an equivalent program in another language, called the target language.
[Diagram: source program → compiler → target program; the compiler also emits error messages.]

• There are two parts to compilation: analysis and synthesis.
• Analysis creates an intermediate representation of the source program; synthesis constructs the desired target program from that representation.

Phases of a Compiler

• A compiler operates in phases: the source program passes through the lexical analyzer, the syntax analyzer, the semantic analyzer, the intermediate code generator, the code optimizer, and the code generator, which produces the target program.
• The symbol table manager and the error handler interact with all of these phases.

Phases of a Compiler

• Lexical analyzer: performs lexical analysis, also known as linear analysis or scanning.
• The stream of characters is read from left to right and grouped into tokens; white space is eliminated during lexical analysis.
• For example, for position = initial + rate * 60, the following tokens are formed: the identifier position, the assignment symbol, the identifier initial, the plus sign, the identifier rate, the multiplication sign, and the number 60.

Phases of a Compiler

• Syntax analyzer: performs syntax analysis, also known as hierarchical analysis or parsing.
• It involves grouping the tokens of the source program into grammatical phrases, which are then represented by a parse tree.

Phases of a Compiler

[Parse tree for position = initial + rate * 60: an assignment statement whose left side is the identifier position and whose right side is an expression; the expression expands to initial + rate * 60, with * grouped below +.]

Phases of a Compiler

• A syntax tree is a compressed representation of a parse tree.
• The operators appear in the interior nodes, and the operands of an operator are the children of the node for that operator.
[Syntax tree for position = initial + rate * 60: = at the root with children position and +; + has children initial and *; * has children rate and 60.]

Phases of a Compiler

• Semantic analyzer: performs semantic analysis.
• It involves checking the source program for semantic errors and gathering type information. An important component is type checking.
[Syntax tree after type checking: the integer 60 is wrapped in an inttoreal conversion so that * operates on two reals.]

Phases of a Compiler

• Intermediate code generation:
• The intermediate code must have two properties: it must be easy to produce and easy to translate.
• It can be in different forms. One such form is "three-address code", which is like an assembly language: it consists of a sequence of instructions, each of which has at most three operands.
• For id1 = id2 + id3 * 60:

temp1 = inttoreal(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3

Phases of a Compiler
• Code optimizer: attempts to improve the intermediate code.

Before optimization:
temp1 = inttoreal(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3

After optimization:
temp1 = id3 * 60.0
id1 = id2 + temp1

Phases of a Compiler
• Code generator: deals with the generation of target code, consisting of relocatable machine code or assembly code.
• For the running example, floating-point assembly code (the first operand is the source, the second the destination; '#' marks a constant):

MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1

Phases of a Compiler
• Symbol table management: a symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier. It allows us to find the record for each identifier and to store or retrieve data from that record.
• Error detection and reporting: each phase can encounter errors, and each phase must deal with those errors so that compilation can proceed.

Lexical Analyzer

• The lexical analyzer is the first phase of a compiler.
• Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis.
[Diagram, interaction of lexical analyzer with parser: the parser requests the next token; the lexical analyzer reads the source program and returns a token; both consult the symbol table.]

Lexical Analyzer

• Tokens, patterns, and lexemes:
• A token is an abstract symbol representing a kind of lexical unit, e.g., an identifier or a keyword.
• A pattern is a description of the form that the lexemes of a token may take.
• A lexeme is a sequence of characters in the source program that is matched by the pattern for the token.
• For example, in the statement const pi = 3.1416, the substring pi is a lexeme for the token identifier.

Lexical Analyzer

• In most programming languages, the following constructs are treated as tokens: keywords, operators, identifiers, constants, literal strings, and punctuation symbols such as parentheses, commas, and semicolons.
• Another example: in the C statement printf("Total=%d\n", score); both printf and score are lexemes matching the pattern for token id, and "Total=%d\n" is a lexeme matching the pattern for literal.

Lexical Analyzer

• Tokens, patterns, and lexemes:

TOKEN     SAMPLE LEXEMES         INFORMAL DESCRIPTION OF PATTERN
const     const                  const
if        if                     if
relation  <, <=, =, <>, >, >=    < or <= or = or <> or > or >=
id        pi, count, D2          letter followed by letters and digits
num       3.1416, 0, 6.02E23     any numeric constant
literal   "core dumped"          any characters between " and " except "

Lexical Analyzer: Specification of Tokens
• Regular expressions are an important notation for specifying tokens.
• Strings and languages:
• The term alphabet or character class denotes any finite set of symbols, e.g., the set {0, 1} is the binary alphabet.
• A string over some alphabet is a finite sequence of symbols drawn from that alphabet.
• The term language denotes any set of strings over some fixed alphabet.

Lexical Analyzer: Specification of Tokens
• Operations on languages:
• There are several important operations, such as union, concatenation, and closure, that can be applied to languages.
• For example, let L be the set {A, B, ..., Z, a, b, ..., z} and D be the set {0, 1, ..., 9}. Then:
1. L ∪ D is the set of letters and digits.
2. LD is the set of strings consisting of a letter followed by a digit.
3. L⁴ is the set of all four-letter strings.
4. L* is the set of all strings of letters, including ε, the empty string.
5. L(L ∪ D)* is the set of all strings of letters and digits beginning with a letter.
6. D⁺ is the set of all strings of one or more digits.
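These operations are easy to experiment with on small finite languages. Below is a minimal Python sketch (the function names are my own) implementing union, concatenation, exponentiation, and a length-bounded approximation of the Kleene closure:

def union(L1, L2):
    return L1 | L2

def concat(L1, L2):
    # LM = { st : s in L, t in M }
    return {s + t for s in L1 for t in L2}

def power(L, n):
    # L^n is L concatenated with itself n times; L^0 = {""} (epsilon only)
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result

def closure(L, max_n=3):
    # L* is infinite, so enumerate only products of length up to max_n.
    result = set()
    for n in range(max_n + 1):
        result |= power(L, n)
    return result

L = {"a", "b"}               # a tiny stand-in for the letters
D = {"0", "1"}               # a tiny stand-in for the digits
print(sorted(union(L, D)))   # L ∪ D: letters and digits
print(sorted(concat(L, D)))  # LD: a letter followed by a digit
print(sorted(power(L, 4)))   # L⁴: all four-letter strings
print(sorted(closure(L)))    # L*: '', a, b, aa, ..., up to length 3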

Lexical Analyzer: Specification of Tokens
• Regular expressions:
• An identifier is a letter followed by zero or more letters or digits. This set is described by the expression:
letter (letter | digit)*
• The | here means "or", the parentheses are used to group subexpressions, the star means "zero or more instances of" the parenthesized expression, and the juxtaposition of letter with the remainder of the expression means concatenation.
• A regular expression is built up out of simpler regular expressions using a set of defining rules.
• Each regular expression r denotes a language L(r).

Lexical Analyzer: Specification of Tokens

• Regular expressions (RE): the rules that define the regular expressions over an alphabet ∑ are:
1. ε is a RE that denotes {ε}, the set containing the empty string.
2. If a is a symbol in ∑, then a is a RE that denotes {a}, the set containing the string a.
3. Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then:
a) (r)|(s) is a RE denoting L(r) ∪ L(s)
b) (r)(s) is a RE denoting L(r)L(s)
c) (r)* is a RE denoting (L(r))*
d) (r) is a RE denoting L(r)
• A language denoted by a RE is said to be a regular set.

Lexical Analyzer: Specification of Tokens

• Regular expressions, examples: let ∑ = {a, b}.
• The RE a|b denotes the set {a, b}.
• The RE (a|b)(a|b) denotes {aa, ab, ba, bb}, the set of all strings of a's and b's of length two.
• The RE a* denotes the set of all strings of zero or more a's: {ε, a, aa, ...}.
• The RE (a|b)* denotes the set of all strings of zero or more instances of an a or b.
• The RE a|a*b denotes the set containing the string a and all strings consisting of zero or more a's followed by b.

Lexical Analyzer: Specification of Tokens

• Regular definitions:
• If ∑ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form
d1 → r1
d2 → r2
...
dn → rn
where each di is a distinct name and each ri is a regular expression over the symbols in ∑ ∪ {d1, d2, ..., di-1}.
• Example: consider the set of strings of letters and digits beginning with a letter. The regular definition for the set is
letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | 2 | ... | 9
id → letter ( letter | digit )*
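As a quick check, the id definition maps directly onto a pattern for a practical regex engine. A small sketch using Python's re module (the character classes stand in for the names letter and digit):

import re

# letter ( letter | digit )* : a letter followed by zero or more letters or digits
ID = re.compile(r"[A-Za-z][A-Za-z0-9]*")

for s in ["position", "D2", "rate", "2fast", "pi"]:
    print(s, "->", "id" if ID.fullmatch(s) else "not an id")
# 2fast is rejected because it begins with a digit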

Lexical Analyzer: Recognition of Tokens

• Consider the following grammar fragment:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | ε
expr → term relop term
     | term
term → id
     | num
where the terminals if, then, else, relop, id, and num generate sets of strings given by the following regular definitions:
if → if
then → then
else → else
relop → < | <= | = | <> | > | >=
id → letter ( letter | digit )*
num → digit+ (. digit+)? (E (+|-)? digit+)?

Lexical Analyzer: Recognition of Tokens

REGULAR EXPRESSION   TOKEN   ATTRIBUTE VALUE
ws
if                   if
then                 then
else                 else
id                   id      pointer to table entry
num                  num     pointer to table entry
<                    relop   LT
<=                   relop   LE
=                    relop   EQ
<>                   relop   NE
>                    relop   GT
>=                   relop   GE

Lexical Analyzer: Finite Automata

• A recognizer for a language is a program that takes as input a string x and answers "yes" if x is a sentence of the language and "no" otherwise.
• A finite automaton can be deterministic or nondeterministic.
• Finite automata are represented by transition graphs: labeled directed graphs in which the nodes are the states and the labeled edges represent the transition function.

Lexical Analyzer: Finite Automata

• Nondeterministic finite automaton (NFA): a mathematical model consisting of:
1) a set of states S
2) a set of input symbols ∑ (the input alphabet)
3) a transition function move that maps state-symbol pairs to sets of states
4) a state s0 as the start or initial state
5) a set of states F as the final or accepting states

Lexical Analyzer: Finite Automata

• Nondeterministic finite automata, example: the transition graph for an NFA that recognizes the language (a|b)*abb:
[Start state 0 has a self-loop on a and b; edges 0 →a 1 →b 2 →b 3.]
• Set of states S: {0, 1, 2, 3}
• Input symbol alphabet ∑ = {a, b}
• The initial state is 0; the accepting state is 3, indicated by a double circle.

Lexical Analyzer: Finite Automata

• Nondeterministic finite automata: the transition table.
• A row represents each state and there is a column for each input symbol; the entry for row i and symbol a is the set of states that can be reached by a transition from state i on input a.

STATE   a       b
0       {0,1}   {0}
1               {2}
2               {3}
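The table can be simulated directly by tracking the set of states the NFA could be in. A short Python sketch for this particular NFA (state 3 accepting; this automaton has no ε-transitions):

# move[state][symbol] = set of successor states, from the table above
move = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"a": set(), "b": {2}},
    2: {"a": set(), "b": {3}},
    3: {"a": set(), "b": set()},
}
ACCEPTING = {3}

def nfa_accepts(s):
    states = {0}                          # start in state 0
    for ch in s:
        states = {t for q in states for t in move[q][ch]}
    return bool(states & ACCEPTING)

for w in ["abb", "aabb", "babb", "ab", "abab"]:
    print(w, nfa_accepts(w))              # first three True, last two False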

Lexical Analyzer: Finite Automata

• Deterministic finite automaton (DFA): a mathematical model in which
1) no state has an ε-transition, i.e., a transition on input ε, and
2) for each state s and input symbol a, there is at most one edge labeled a leaving s.
• Since there is at most one transition from each state on any input, it becomes very easy to determine whether a DFA accepts an input string.

Lexical Analyzer: Finite Automata

• Conversion of NFA to DFA: the subset construction algorithm.
Input: NFA N. Output: equivalent DFA D.
• Operations used:

OPERATION        DESCRIPTION
ε-closure(s)     set of NFA states reachable from NFA state s on ε-transitions alone
ε-closure(T)     set of NFA states reachable from some NFA state s in T on ε-transitions alone
move(T, a)       set of NFA states to which there is a transition on input symbol a from some NFA state s in T

Lexical Analyzer: Finite Automata

• Conversion of NFA to DFA: subset construction algorithm:

Initially, ε-closure(s0) is the only state in Dstates, and it is unmarked.
while there is an unmarked state T in Dstates do begin
    mark T;
    for each input symbol a do begin
        U := ε-closure(move(T, a));
        if U is not in Dstates then
            add U as an unmarked state to Dstates;
        Dtrans[T, a] := U
    end
end
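Below is a compact Python sketch of the subset construction, assuming the NFA is given as a dict of transitions with the key "eps" marking ε-moves; frozensets of NFA states serve as DFA states:

def eps_closure(nfa, states):
    # All NFA states reachable from `states` by epsilon-transitions alone.
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for t in nfa.get(q, {}).get("eps", ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(nfa, T, a):
    return {t for q in T for t in nfa.get(q, {}).get(a, ())}

def subset_construction(nfa, start, alphabet):
    s0 = eps_closure(nfa, {start})
    dstates, unmarked, dtrans = {s0}, [s0], {}
    while unmarked:
        T = unmarked.pop()                     # mark T
        for a in alphabet:
            U = eps_closure(nfa, move(nfa, T, a))
            if U and U not in dstates:         # add new unmarked DFA state
                dstates.add(U)
                unmarked.append(U)
            dtrans[(T, a)] = U
    return dstates, dtrans, s0

# The (a|b)*abb NFA from the transition table above (it has no ε-moves):
nfa = {0: {"a": {0, 1}, "b": {0}}, 1: {"b": {2}}, 2: {"b": {3}}}
dstates, dtrans, s0 = subset_construction(nfa, 0, "ab")
print(len(dstates))   # 4 DFA states for (a|b)*abb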

Lexical Analyzer: Finite Automata

• From a regular expression to an NFA: Thompson's construction.
• To convert a regular expression r over an alphabet Σ into an NFA N accepting L(r):
• Parse r into its constituent sub-expressions.
• Construct NFAs for each of the basic symbols in r, then combine them following the structure of r.

Lexical Analyzer Generator

• Lex is used to specify lexical analyzers for a variety of languages. This tool is referred to as the Lex compiler.
• Creating a lexical analyzer with Lex:
Lex source program lex.l → Lex compiler → lex.yy.c
lex.yy.c → C compiler → a.out
input stream → a.out → sequence of tokens

Lexical Analyzer Generator

• Lex specifications: a Lex program consists of three parts:

declarations
%%
translation rules
%%
auxiliary procedures

• The declaration section includes declarations of variables, manifest constants, and regular definitions.
• Translation rules are of the form:
p1 {action1}
p2 {action2}
...
where each pi is a regular expression and each actioni is a program fragment describing what action is to be taken when pattern pi matches a lexeme.

Lexical Analyzer Generator

• Design:
• Given a set of specifications, the lexical analyzer should look for lexemes. This is usually implemented using a finite automaton, either an NFA or a DFA.
• The lexical analyzer generator constructs a transition table for a finite automaton from the regular expression patterns in the specification.
• The lexical analyzer itself consists of a finite automaton simulator that uses this transition table to look for the regular expression patterns in the input buffer.
• The transition table for an NFA is considerably smaller than that for a DFA, but the DFA recognizes patterns faster than the NFA.

Lexical Analyzer Generator

• Design:
[Model of the Lex compiler: (a) the Lex specification is fed to the Lex compiler, which produces a transition table; (b) the schematic lexical analyzer couples an FA simulator with that transition table, reading lexemes from the input buffer.]

Syntax Analysis

• Every programming language has rules that prescribe the syntactic structure of well-formed programs.
• The syntax of programming language constructs can be described by context-free grammars or BNF (Backus-Naur Form) notation.
• Grammars offer significant advantages:
• A grammar gives a precise, yet easy-to-understand, syntactic specification of a programming language.
• From certain classes of grammars, we can automatically construct an efficient parser that determines if a source program is syntactically well-formed.
• A properly designed grammar imparts a structure to a programming language that is useful for the translation of source programs.
• New constructs can be added to a language.

Syntax Analysis

• The parser obtains a string of tokens from the lexical analyzer.
• It then verifies that the string can be generated by the grammar for the source language.
• The parser should report syntax errors, if any.
[Diagram: the lexical analyzer supplies tokens on demand to the parser; the parser produces a parse tree for the rest of the front end, which yields an intermediate representation; both consult the symbol table.]

Syntax Analysis

• Three general types of parsers for grammars:
• Universal parsing methods: can parse any grammar, but are too inefficient to use in production compilers.
• Top-down methods: build parse trees from the top (root) to the bottom (leaves).
• Bottom-up methods: start from the leaves and work up to the root.

Context-free grammars

• Consider a conditional statement defined by a rule such as: if S1 and S2 are statements and E is an expression, then "if E then S1 else S2" is a statement:
stmt → if expr then stmt else stmt
• A context-free grammar consists of terminals, non-terminals, a start symbol, and productions.

Context-free grammars

1. Terminals are the basic symbols from which strings are formed. The word "token" is a synonym for "terminal" when we are talking about grammars for programming languages.
2. Non-terminals are syntactic variables that denote sets of strings that help define the language generated by the grammar. They impose a hierarchical structure on the language.
3. In a grammar, one non-terminal is distinguished as the start symbol, and the set of strings it denotes is the language denoted by the grammar.
4. The productions of a grammar specify the manner in which terminals and non-terminals can be combined to form strings. Each production consists of a non-terminal followed by an arrow (→) followed by a string of non-terminals and terminals.

Context-free grammars

• Example: the grammar with the following productions:
expr → expr op expr
expr → (expr)
expr → -expr
expr → id
op → +
op → -
op → *
op → /
In this grammar, the terminal symbols are id, +, -, *, /, (, ); the non-terminal symbols are expr and op; and expr is the start symbol.
• Example: E → EAE | (E) | -E | id and A → + | - | * | /, where E and A are the non-terminals, while id, +, -, *, /, (, ) are the terminals.

Derivation and Parse trees

• Consider the following grammar: E → E+E | E*E | (E) | -E | id
• E ==> -E is read as "E derives -E".
• We can take a single E and repeatedly apply productions in any order to obtain a sequence of replacements. For example:
E ==> -E ==> -(E) ==> -(id)
• We call such a sequence of replacements a derivation of -(id) from E.

Derivation and Parse trees

• Given a grammar G with start symbol S, we can use the ==>+ relation (derives in one or more steps) to define L(G), the language generated by G.
• A string of terminals w is in L(G) if and only if S ==>+ w; the string w is called a sentence of G.
• If S ==>* α, where α may contain non-terminals, then α is a sentential form of G.

Derivation and Parse trees

• Parse trees: a parse tree may be viewed as a graphical representation of a derivation.
• Each interior node of a parse tree is labeled by a non-terminal.
• The leaves, read from left to right, are labeled by non-terminals or terminals.
• For example, the parse tree for -(id+id) corresponds to the derivation
E ==> -E ==> -(E) ==> -(E + E) ==> -(id + E) ==> -(id + id)
[Parse tree: E at the root with children - and E; that E has children (, E, ); the inner E has children E, +, E, whose leaves are id and id.]

Derivation and Parse trees

• Example: two derivations (and two parse trees) for id+id*id:
(a) E ==> E+E ==> id+E ==> id+E*E ==> id+id*E ==> id+id*id
(b) E ==> E*E ==> E+E*E ==> id+E*E ==> id+id*E ==> id+id*id
[Parse tree (a) groups the string as id+(id*id); parse tree (b) groups it as (id+id)*id.]

. • Carefully writing the grammar can eliminate ambiguity.Ambiguity • A grammar that produces more than one parse tree for some sentence is said to be ambiguous • An ambiguous grammar is one that produces more than one leftmost or more than one rightmost derivation for the same sentence.

Elimination of Left Recursion

• Definition: a grammar is left-recursive if it has a non-terminal A such that there is a derivation A ==>+ Aα for some string α.
• Top-down parsing methods cannot handle left-recursive grammars, so a transformation that eliminates left recursion is needed.
• A left-recursive pair of productions A → Aα | β could be replaced by the non-left-recursive productions:
A → βA'
A' → αA' | ε

Elimination of Left Recursion

• Algorithm:
Input: grammar G with no cycles or ε-productions.
Output: an equivalent grammar with no left recursion.
Method: apply the following to G. Note that the resulting non-left-recursive grammar may have ε-productions.

1. Arrange the non-terminals in some order A1, A2, ..., An.
2. for i := 1 to n do begin
       for j := 1 to i-1 do begin
           replace each production of the form Ai → Aj γ
           by the productions Ai → δ1γ | δ2γ | ... | δkγ,
           where Aj → δ1 | δ2 | ... | δk are all current Aj-productions
       end;
       eliminate the immediate left recursion among the Ai-productions
   end

Elimination of Left Recursion

• No matter how many A-productions there are, we can eliminate immediate left recursion from them.
• First, we group the A-productions as
A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn
• Then, we replace the A-productions by
A → β1A' | β2A' | ... | βnA'
A' → α1A' | α2A' | ... | αmA' | ε

Elimination of Left Recursion

• Example: consider the following grammar:
E → E+T | T
T → T*F | F
F → (E) | id
• Eliminating the immediate left recursion from the productions for E and then for T, we obtain (a code sketch of this step follows):
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
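The grouping rule above is mechanical enough to code directly. A minimal Python sketch for the immediate-left-recursion case (the dict format and helper name are my own), reproducing the E-productions of this example:

def eliminate_immediate_left_recursion(head, bodies):
    # Split A -> A alpha | beta into left-recursive and non-recursive bodies.
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: bodies}
    new = head + "'"
    return {
        head: [b + [new] for b in betas],             # A  -> beta A'
        new:  [a + [new] for a in alphas] + [["ε"]],  # A' -> alpha A' | ε
    }

# E -> E + T | T   becomes   E -> T E'  and  E' -> + T E' | ε
grammar = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
for head, bodies in grammar.items():
    print(head, "->", " | ".join(" ".join(b) for b in bodies))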

Left Factoring

• Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive parsing.
• The basic idea is that when it is not clear which of two alternative productions to use to expand a non-terminal A, we may defer the decision by expanding A to αA'.
• For example, if A → αβ1 | αβ2 are two A-productions and the input begins with a non-empty string derived from α, we do not know whether to expand A to αβ1 or to αβ2.
• After seeing the input derived from α, we expand A' to β1 or to β2:
A → αA'
A' → β1 | β2

Left Factoring

• Algorithm:
Input: grammar G.
Output: an equivalent left-factored grammar.
Method: for each non-terminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, i.e., there is a non-trivial common prefix, replace all of the A-productions
A → αβ1 | αβ2 | ... | αβn | γ
(where γ represents all alternatives that do not begin with α) by
A → αA' | γ
A' → β1 | β2 | ... | βn
Here A' is a new non-terminal. Repeatedly apply this transformation until no two alternatives for a non-terminal have a common prefix.

Left Factoring

• Example: consider the following grammar (the dangling-else construct):
S → iEtS | iEtSeS | a
E → b
• Left-factored (a code sketch of this step follows):
S → iEtSS' | a
S' → eS | ε
E → b
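A small Python sketch of one application of this transformation, computing the longest common prefix symbol by symbol (it assumes at most one group of alternatives needs factoring, which is enough to reproduce the example above):

from collections import defaultdict

def common_prefix(seqs):
    prefix = []
    for column in zip(*seqs):
        if all(s == column[0] for s in column):
            prefix.append(column[0])
        else:
            break
    return prefix

def left_factor(head, bodies):
    # Group alternatives by first symbol, then factor out the common prefix.
    groups = defaultdict(list)
    for b in bodies:
        groups[b[0] if b else "ε"].append(b)
    rules = {head: []}
    for first, alts in groups.items():
        if len(alts) < 2:
            rules[head].extend(alts)
            continue
        alpha = common_prefix(alts)
        new = head + "'"
        rules[head].append(alpha + [new])                     # A  -> alpha A'
        rules[new] = [b[len(alpha):] or ["ε"] for b in alts]  # A' -> beta_i | ε
    return rules

# S -> iEtS | iEtSeS | a   becomes   S -> iEtSS' | a,  S' -> ε | eS
g = left_factor("S", [["i","E","t","S"], ["i","E","t","S","e","S"], ["a"]])
for h, bs in g.items():
    print(h, "->", " | ".join(" ".join(b) for b in bs))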

Parsing methods

• The syntax analysis phase of a compiler verifies that the sequence of tokens extracted by the scanner represents a valid sentence in the grammar of the programming language.
• There are two major parsing approaches: top-down and bottom-up.
• In top-down parsing, you start with the start symbol and apply the productions until you arrive at the desired string.
• In bottom-up parsing, you start with the string and reduce it to the start symbol by applying the productions backwards.

Parsing methods

• Consider the following grammar:
S → AB
A → aA | ε
B → b | bB
• Here is a top-down parse of aaab:
S
AB      (S → AB)
aAB     (A → aA)
aaAB    (A → aA)
aaaAB   (A → aA)
aaaεB   (A → ε)
aaab    (B → b)
• The top-down parse produces a leftmost derivation of the sentence.

Parsing methods

• Consider the following grammar:
S → AB
A → aA | ε
B → b | bB
• A bottom-up parse works in reverse:
aaab
aaaεb   (insert ε)
aaaAb   (A → ε)
aaAb    (A → aA)
aAb     (A → aA)
Ab      (A → aA)
AB      (B → b)
S       (S → AB)
• The bottom-up parse prints out a rightmost derivation of the sentence.

Top-Down Parsing

• Top-down parsing is an attempt to find a leftmost derivation for an input string.
• Recursive-descent parsing:
• In recursive-descent parsing we execute a set of recursive procedures to process the input; a procedure is associated with each non-terminal of the grammar.
• As we parse the input string, we call the procedures that correspond to the left-side non-terminal of the productions.
• Consider the following grammar: S → cAd, A → ab | a, and the input string w = cad.

Top-Down Parsing

• Constructing the parse tree for w = cad:
• Initially, the input pointer points to c. We use the first production for S to expand the tree: S with children c, A, d (Fig. a).
• The leftmost leaf matches the first input symbol, so we advance the input pointer to a and expand A using its first alternative, ab (Fig. b).
• There is a match for a, so we advance the pointer to the third symbol, d. Since b does not match d, we report failure, go back to A, and reset the pointer to position 2.
• We expand A with its other alternative, a (Fig. c), and check. Since this matches and the remaining input matches d, we have produced a parse tree for w; we halt and announce successful completion of parsing.
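The walkthrough above, including the failure and retry on A, is captured by this minimal backtracking recursive-descent parser, a Python sketch with one procedure per non-terminal:

# Backtracking recursive descent for S -> cAd, A -> ab | a.
pos = 0
inp = ""

def match(ch):
    global pos
    if pos < len(inp) and inp[pos] == ch:
        pos += 1
        return True
    return False

def A():
    global pos
    saved = pos
    if match('a') and match('b'):   # try A -> ab first
        return True
    pos = saved                     # backtrack: reset the input pointer
    return match('a')               # then try A -> a

def S():
    return match('c') and A() and match('d')

def parse(w):
    global pos, inp
    pos, inp = 0, w
    return S() and pos == len(inp)

print(parse("cad"))    # True: uses A -> a after backtracking from A -> ab
print(parse("cabd"))   # True: A -> ab
print(parse("cab"))    # False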

Top-Down Parsing

• Predictive parser: a recursive-descent parser that needs no backtracking.
• To construct one, first eliminate left recursion and then left-factor the grammar.
• Transition diagrams for predictive parsers: to construct the transition diagrams, for each non-terminal A do the following:
1. Create an initial and a final state.
2. For each production A → X1X2...Xn, create a path from the initial to the final state with edges labeled X1X2...Xn.
• A predictive parser based on transition diagrams attempts to match terminal symbols against the input, and makes a potentially recursive procedure call whenever it has to follow an edge labeled by a non-terminal.

Top-Down Parsing

• Nonrecursive predictive parsing:
• It is possible to build a nonrecursive predictive parser by maintaining a stack explicitly, rather than implicitly via recursive calls.
[Model of a nonrecursive predictive parser: an input buffer (a + b $), a stack (X Y Z $), the predictive parsing program, a parsing table M, and the output.]

Top-Down Parsing: Nonrecursive Predictive Parsing

• Input buffer: contains the string to be parsed, followed by $, a symbol used to indicate the end of the input string.
• Stack: contains a sequence of grammar symbols with $ on the bottom. Initially, it contains the start symbol of the grammar on top of $.
• Parsing table: a 2-D array M[A, a], where A is a non-terminal and a is a terminal or the symbol $.
• The parser is controlled by a program that behaves as follows.

Top-Down Parsing: Nonrecursive Predictive Parsing

• The program considers X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine the actions. There are three possibilities:
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
3. If X is a non-terminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (with U on top). If M[X, a] = error, the parser calls an error recovery routine.

Top-Down Parsing: Nonrecursive Predictive Parsing

• Input: a string w and a parsing table M for grammar G.
• Output: if w is in L(G), a leftmost derivation of w; otherwise, an error indication.
• Method: initially, the parser has $S on the stack with S on top, and w$ in the input buffer.

set ip to point to the first symbol of w$;
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then pop X from the stack and advance ip
        else error()
    else   /* X is a non-terminal */
        if M[X, a] = X → Y1Y2...Yk then begin
            pop X from the stack;
            push Yk, Yk-1, ..., Y1 onto the stack, with Y1 on top;
            output the production X → Y1Y2...Yk
        end
        else error()
until X = $

Top-Down Parsing: Nonrecursive Predictive Parsing

Parsing table M for the grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id:

NON-       INPUT SYMBOL
TERMINAL   id        +           *           (         )        $
E          E → TE'                           E → TE'
E'                   E' → +TE'                         E' → ε   E' → ε
T          T → FT'                           T → FT'
T'                   T' → ε      T' → *FT'             T' → ε   T' → ε
F          F → id                            F → (E)
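A Python sketch of the table-driven algorithm above, hard-coding the table M just shown (ε is represented as an empty body):

M = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
TERMINALS = {"id", "+", "*", "(", ")", "$"}

def predictive_parse(tokens):
    stack = ["$", "E"]                 # start symbol on top of $
    ip = iter(tokens + ["$"])
    a = next(ip)
    while True:
        X = stack.pop()
        if X in TERMINALS:
            if X != a:
                raise SyntaxError(f"expected {X}, got {a}")
            if X == "$":
                return                 # successful completion
            a = next(ip)
        else:
            body = M.get((X, a))
            if body is None:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            print(X, "->", " ".join(body) or "ε")   # leftmost derivation step
            stack.extend(reversed(body))

predictive_parse(["id", "+", "id", "*", "id"])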

Top-Down Parsing: Nonrecursive Predictive Parsing

• The construction of a predictive parser is aided by two functions associated with a grammar G: FIRST and FOLLOW.
• If α is any string of grammar symbols, FIRST(α) is the set of terminals that begin the strings derived from α. If α ==>* ε, then ε is also in FIRST(α).
• FOLLOW(A), for a non-terminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form; that is, the set of terminals a such that there exists a derivation of the form S ==>* αAaβ for some α and β. If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A).

Top-Down Parsing: Nonrecursive Predictive Parsing

• To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any FIRST set:
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal and X → Y1Y2...Yk is a production, then place a in FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1), that is, Y1...Yi-1 ==>* ε. Everything in FIRST(Y1) is surely in FIRST(X); if Y1 does not derive ε, then we add nothing more to FIRST(X), but if Y1 ==>* ε, then we add FIRST(Y2), and so on. If ε is in FIRST(Yj) for all j = 1, 2, ..., k, then add ε to FIRST(X).

Top-Down Parsing: Nonrecursive Predictive Parsing

• To compute FOLLOW(A) for all non-terminals A, apply the following rules until nothing can be added to any FOLLOW set:
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
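Both definitions are fixed-point computations, which translate almost line for line into code. A Python sketch applied to the expression grammar used above (ε is written EPS; the representation is my own):

EPS = "ε"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
NONTERMS = set(GRAMMAR)

def first_of(X, first):
    return first[X] if X in NONTERMS else {X}

def compute_first():
    first = {A: set() for A in NONTERMS}
    changed = True
    while changed:                            # iterate to a fixed point
        changed = False
        for A, bodies in GRAMMAR.items():
            for body in bodies:
                for X in body:
                    fx = first_of(X, first) - {EPS}
                    if not fx <= first[A]:
                        first[A] |= fx
                        changed = True
                    if EPS not in first_of(X, first):
                        break
                else:                         # every Yi derives ε
                    if EPS not in first[A]:
                        first[A].add(EPS)
                        changed = True
    return first

def compute_follow(first, start="E"):
    follow = {A: set() for A in NONTERMS}
    follow[start].add("$")                    # rule 1
    changed = True
    while changed:
        changed = False
        for A, bodies in GRAMMAR.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in NONTERMS:
                        continue
                    f, nullable = set(), True
                    for X in body[i + 1:]:
                        f |= first_of(X, first) - {EPS}   # rule 2
                        if EPS not in first_of(X, first):
                            nullable = False
                            break
                    add = f | (follow[A] if nullable else set())  # rule 3
                    if not add <= follow[B]:
                        follow[B] |= add
                        changed = True
    return follow

first = compute_first()
follow = compute_follow(first)
print(first["E"])     # {'(', 'id'}
print(follow["E'"])   # {')', '$'}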

Top-Down Parsing: Nonrecursive Predictive Parsing

• Algorithm: construction of a predictive parsing table.
• Input: grammar G. Output: parsing table M.
• Method:
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M an error.
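Continuing the previous sketch (this fragment reuses GRAMMAR, EPS, first_of, first, and follow from the FIRST/FOLLOW code above), the table-construction rules become a short function:

def build_table(first, follow):
    M = {}
    for A, bodies in GRAMMAR.items():
        for body in bodies:
            f, nullable = set(), True          # FIRST of the whole body
            for X in body:
                if X == EPS:
                    continue
                f |= first_of(X, first) - {EPS}
                if EPS not in first_of(X, first):
                    nullable = False
                    break
            for a in f:                        # rule 2
                M[(A, a)] = body
            if nullable:                       # rule 3 ($ is already in FOLLOW)
                for b in follow[A]:
                    M[(A, b)] = body
    return M

M = build_table(first, follow)
print(M[("E", "(")])    # ['T', "E'"]
print(M[("T'", "$")])   # ['ε']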

Top-Down Parsing: LL(1) grammars

• A grammar whose parsing table has no multiply-defined entries is said to be LL(1).
• The first "L" stands for scanning the input from left to right, the second "L" for producing a leftmost derivation, and the 1 for using one input symbol of lookahead at each step to make parsing action decisions.
• Properties:
• No ambiguous or left-recursive grammar can be LL(1).
• A grammar is LL(1) iff whenever A → α | β are two distinct productions of G, the following conditions hold:
1. For no terminal a do both α and β derive strings beginning with a.
2. At most one of α and β can derive the empty string.
3. If β ==>* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).

Top-Down Parsing: LL(1) grammars

• Disadvantages:
• The main difficulty in using predictive parsing is in writing a grammar for the source language.
• Although left-recursion elimination and left factoring are easy to do, they make the resulting grammar hard to read and difficult to use for translation purposes.
• To alleviate some of this difficulty, a common organization for a parser in a compiler is to use a predictive parser for control constructs and to use operator precedence for expressions.

Bottom-Up Parsing

• Bottom-up parsing attempts to construct a parse tree for an input string beginning at the leaves and working towards the root.
• This is a process of reducing a string to the start symbol of the grammar.
• At each reduction step, a particular substring matching the right side of a production is replaced by the symbol on the left of that production.
• Consider the grammar:
S → aABe
A → Abc | b
B → d
• The string w = abbcde reduces as follows:
abbcde
aAbcde
aAde
aABe
S
• These reductions trace out a rightmost derivation in reverse.

Bottom-Up Parsing: Shift-Reduce parser

• Consider the following grammar: E → E+E | E*E | (E) | id, and the input string id1+id2*id3.

RIGHT-SENTENTIAL FORM   HANDLE   REDUCING PRODUCTION
id1+id2*id3             id1      E → id
E+id2*id3               id2      E → id
E+E*id3                 id3      E → id
E+E*E                   E*E      E → E*E
E+E                     E+E      E → E+E
E

Reductions made by a shift-reduce parser.

Bottom-Up Parsing

• A handle of a string is a substring that matches the right side of a production, and whose reduction to the non-terminal on the left side of the production represents one step of the reduction process.
• Stack implementation of shift-reduce parsing:
• A stack is used to hold grammar symbols, and an input buffer holds the string w to be parsed.
• Initially, the stack is empty and the string w is the input:
Stack: $        Input: w$
• The parser operates by shifting zero or more input symbols onto the stack until a handle β is on top of the stack. The parser then reduces β to the left side of the production.
• The parser repeats this cycle until the stack contains the start symbol and the input is empty:
Stack: $S       Input: $

Bottom-Up Parsing

• Stack implementation of shift-reduce parsing:
• There are four possible actions a shift-reduce parser can make:
• Shift: the next input symbol is shifted onto the top of the stack.
• Reduce: the parser knows the right end of the handle is at the top of the stack. It must then locate the left end of the handle within the stack and decide with what non-terminal to replace the handle.
• Accept: the parser announces successful completion of parsing.
• Error: the parser discovers that a syntax error has occurred and calls an error recovery routine.

Bottom-Up Parsing

STACK        INPUT           ACTION
$            id1+id2*id3$    shift
$id1         +id2*id3$       reduce by E → id
$E           +id2*id3$       shift
$E+          id2*id3$        shift
$E+id2       *id3$           reduce by E → id
$E+E         *id3$           shift
$E+E*        id3$            shift
$E+E*id3     $               reduce by E → id
$E+E*E       $               reduce by E → E*E
$E+E         $               reduce by E → E+E
$E           $               accept

Bottom-Up Parsing: Operator-Precedence parsing

• Operator grammar: these grammars have the property that no production right side is ε or has two adjacent non-terminals.
• Consider the following grammar for expressions:
E → EAE | (E) | -E | id
A → + | - | * | / | ↑
This is not an operator grammar, because EAE has two adjacent non-terminals.
• If we substitute for A each of its alternatives, we obtain an operator grammar:
E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id

Bottom-Up Parsing: Operator-Precedence parsing

• We define three disjoint precedence relations, <., =., and .>, between certain pairs of terminals:

RELATION   MEANING
a <. b     a "yields precedence to" b
a =. b     a "has the same precedence as" b
a .> b     a "takes precedence over" b

• Operator-precedence relations for id, +, *, and $:

       id    +     *     $
id           .>    .>    .>
+      <.    .>    <.    .>
*      <.    .>    .>    .>
$      <.    <.    <.

Bottom-Up Parsing: Operator-Precedence parsing

• Consider the string id+id*id. Inserting the precedence relations (with $ marking each end) gives:
$ <. id .> + <. id .> * <. id .> $
• The handle can then be found by the following process:
1. Scan the string from the left end until the first .> is encountered.
2. Then scan backwards until a <. is encountered.
3. The handle contains everything to the left of the first .> and to the right of the <. that was encountered.
• After each id is reduced to E, the remaining terminals give $ <. + <. * .> $ (sentential form E+E*E), so E*E is reduced next; then $ <. + .> $ (sentential form E+E) reduces E+E to E.

Bottom-Up Parsing: Operator-Precedence parsing

• Algorithm:

set ip to point to the first symbol of w$;
repeat forever
    if $ is on top of the stack and ip points to $ then return
    else begin
        let a be the topmost terminal symbol on the stack
        and let b be the symbol pointed to by ip;
        if a <. b or a =. b then begin    /* shift */
            push b onto the stack;
            advance ip to the next input symbol
        end
        else if a .> b then               /* reduce */
            repeat
                pop the stack
            until the top stack terminal is related by <.
            to the terminal most recently popped
        else error()
    end
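A runnable Python sketch of this algorithm over the id/+/*/$ relation table shown earlier; for simplicity the stack holds terminals only, and each reduction is reported rather than building a tree:

# Operator-precedence relations for expressions over id, +, *, $
PREC = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}

def op_precedence_parse(tokens):
    stack = ["$"]
    input_ = tokens + ["$"]
    i = 0
    while True:
        a, b = stack[-1], input_[i]
        if a == "$" and b == "$":
            print("accept")
            return
        rel = PREC.get((a, b))
        if rel in ("<", "="):            # shift
            stack.append(b)
            i += 1
        elif rel == ">":                 # reduce: pop back to the nearest <.
            popped = stack.pop()
            while PREC.get((stack[-1], popped)) != "<":
                popped = stack.pop()
            print("reduce handle ending in", popped)
        else:
            raise SyntaxError(f"no relation between {a} and {b}")

op_precedence_parse(["id", "+", "id", "*", "id"])
# Reduces the three ids first, then *, then +, matching the trace above.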

Bottom-Up Parsing: Operator-Precedence parsing

• How the relations are constructed:
• If operator θ1 has higher precedence than operator θ2, make θ1 .> θ2 and θ2 <. θ1. For example, if * has higher precedence than +, make *.> + and + <. *.
• If θ1 and θ2 are operators of equal precedence, make θ1 .> θ2 and θ2 .> θ1 if the operators are left-associative, or make θ1 <. θ2 and θ2 <. θ1 if the operators are right-associative. For example, if + and - are left-associative, then make + .> +, + .> -, - .> -, and - .> +.
• Make θ <. id, id .> θ, θ <. (, ( <. θ, ) .> θ, θ .> ), θ .> $, and $ <. θ for all operators θ. Also let ( =. ), $ <. (, $ <. id, ( <. (, ( <. id, ) .> $, ) .> ), and id .> $.

Bottom-Up Parsing: Operator-Precedence parsing

• Consider the following grammar:
E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id
• Assume that:
• ↑ is of highest precedence and right-associative,
• * and / are of next highest precedence and left-associative, and
• + and - are of lowest precedence and left-associative.
• The resulting operator-precedence relations:

      +    -    *    /    ↑    id   (    )    $
+     .>   .>   <.   <.   <.   <.   <.   .>   .>
-     .>   .>   <.   <.   <.   <.   <.   .>   .>
*     .>   .>   .>   .>   <.   <.   <.   .>   .>
/     .>   .>   .>   .>   <.   <.   <.   .>   .>
↑     .>   .>   .>   .>   <.   <.   <.   .>   .>
id    .>   .>   .>   .>   .>             .>   .>
(     <.   <.   <.   <.   <.   <.   <.   =.
)     .>   .>   .>   .>   .>             .>   .>
$     <.   <.   <.   <.   <.   <.   <.

Bottom-Up Parsing: LR Parsers

• LR parsing is used to parse a large class of context-free grammars.
• LR(k) parsing: L is for left-to-right scanning of the input, R for constructing a rightmost derivation in reverse, and k for the number of lookahead input symbols.
• Characteristics:
• LR parsers can be constructed to recognize all programming language constructs.
• It is a general nonbacktracking shift-reduce parsing method.
• It can detect a syntactic error as soon as it is possible.
• Drawback: it is a lot of work to construct an LR parser by hand, so a specialized tool, an LR parser generator, is required.

Bottom-Up Parsing: LR Parsers

• There are three techniques for constructing an LR parsing table for a grammar:
• Simple LR (SLR): the easiest to implement, but the least powerful.
• Canonical LR: the most powerful and the most expensive.
• Lookahead LR (LALR): intermediate in power and cost; it will work on most programming language grammars.

Bottom-Up Parsing: LR Parsers

• LR parsing algorithm:
• An LR parser consists of an input, an output, a stack, a driver program, and a parsing table that has two parts (action and goto).
• The program uses a stack to store a string of the form s0X1s1X2...Xmsm, where sm is on top. Each Xi is a grammar symbol and each si is a symbol called a state.
• Each state symbol summarizes the information contained in the stack below it, and the combination of the state symbol on top of the stack and the current input symbol is used to index the parsing table and determine the shift-reduce parsing decision.
• The parsing program reads characters from an input buffer one at a time.

Bottom-Up Parsing: LR Parsers

• The parsing table consists of two parts: a parsing action function action and a goto function goto.
• The function goto takes a state and a grammar symbol and produces a state.
• The program behaves as follows:
• It determines sm, the state currently on top of the stack, and ai, the current input symbol.
• It consults action[sm, ai], the parsing action table entry for state sm and input ai, which can have one of four values:
1. shift s, where s is a state
2. reduce by a grammar production A → β
3. accept
4. error

Bottom-Up Parsing: LR Parsers

• The configurations resulting after each of the four types of move are as follows:
1. If action[sm, ai] = shift s, the parser executes a shift move: it shifts both the current input symbol ai and the next state s onto the stack; ai+1 becomes the current input symbol.
2. If action[sm, ai] = reduce A → β, the parser executes a reduce move: it pops 2r symbols off the stack (where r is the length of β), then pushes both A and s onto the stack, where s is the entry goto[sm-r, A]. The current input symbol is not changed in a reduce move.
3. If action[sm, ai] = accept, parsing is completed.
4. If action[sm, ai] = error, the parser has discovered an error and calls an error recovery routine.

Bottom-Up Parsing: LR Parsing algorithm

• Input: an input string w and an LR parsing table with functions action and goto for G.
• Output: if w is in L(G), a bottom-up parse for w; otherwise, an error indication.
• Method: initially, the parser has s0 (the initial state) on its stack and w$ in the input buffer.
• Example grammar:
(1) E → E+T
(2) E → T
(3) T → T*F
(4) T → F
(5) F → (E)
(6) F → id

Bottom-Up Parsing: LR Parsers

Parsing table for the expression grammar:

STATE   action                                          goto
        id      +       *       (       )       $       E    T    F
0       s5                      s4                      1    2    3
1               s6                              acc
2               r2      s7              r2      r2
3               r4      r4              r4      r4
4       s5                      s4                      8    2    3
5               r6      r6              r6      r6
6       s5                      s4                           9    3
7       s5                      s4                                10
8               s6                      s11
9               r1      s7              r1      r1
10              r3      r3              r3      r3
11              r5      r5              r5      r5

si means shift and stack state i; rj means reduce by the production numbered j; acc means accept; a blank entry means error.

Bottom-Up Parsing: LR Parsers

Moves of the LR parser on id*id+id:

     STACK             INPUT        ACTION
(1)  0                 id*id+id$    shift
(2)  0 id 5            *id+id$      reduce by F → id
(3)  0 F 3             *id+id$      reduce by T → F
(4)  0 T 2             *id+id$      shift
(5)  0 T 2 * 7         id+id$       shift
(6)  0 T 2 * 7 id 5    +id$         reduce by F → id
(7)  0 T 2 * 7 F 10    +id$         reduce by T → T*F
(8)  0 T 2             +id$         reduce by E → T
(9)  0 E 1             +id$         shift
(10) 0 E 1 + 6         id$          shift
(11) 0 E 1 + 6 id 5    $            reduce by F → id
(12) 0 E 1 + 6 F 3     $            reduce by T → F
(13) 0 E 1 + 6 T 9     $            reduce by E → E+T
(14) 0 E 1             $            accept
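The driver itself is short. A Python sketch hard-coding the action/goto table above ("s5" means shift and stack state 5, "r3" means reduce by production 3; for brevity the stack holds states only):

# Productions: (1) E->E+T (2) E->T (3) T->T*F (4) T->F (5) F->(E) (6) F->id
PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    (0, "id"): "s5", (0, "("): "s4",
    (1, "+"): "s6", (1, "$"): "acc",
    (2, "+"): "r2", (2, "*"): "s7", (2, ")"): "r2", (2, "$"): "r2",
    (3, "+"): "r4", (3, "*"): "r4", (3, ")"): "r4", (3, "$"): "r4",
    (4, "id"): "s5", (4, "("): "s4",
    (5, "+"): "r6", (5, "*"): "r6", (5, ")"): "r6", (5, "$"): "r6",
    (6, "id"): "s5", (6, "("): "s4",
    (7, "id"): "s5", (7, "("): "s4",
    (8, "+"): "s6", (8, ")"): "s11",
    (9, "+"): "r1", (9, "*"): "s7", (9, ")"): "r1", (9, "$"): "r1",
    (10, "+"): "r3", (10, "*"): "r3", (10, ")"): "r3", (10, "$"): "r3",
    (11, "+"): "r5", (11, "*"): "r5", (11, ")"): "r5", (11, "$"): "r5",
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def lr_parse(tokens):
    stack = [0]                        # state stack
    tokens = tokens + ["$"]
    i = 0
    while True:
        s, a = stack[-1], tokens[i]
        act = ACTION.get((s, a))
        if act is None:
            raise SyntaxError(f"error in state {s} on {a}")
        if act == "acc":
            print("accept")
            return
        if act[0] == "s":              # shift: stack the new state
            stack.append(int(act[1:]))
            i += 1
        else:                          # reduce by production act[1:]
            head, length = PRODS[int(act[1:])]
            del stack[-length:]        # pop one state per body symbol
            stack.append(GOTO[(stack[-1], head)])
            print("reduce ->", head)

lr_parse(["id", "*", "id", "+", "id"])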

Code Optimization

• Code optimization aims at improving the execution efficiency of a program.
• This is achieved in two ways:
• Redundancies in a program are eliminated.
• Computations in a program are rearranged or rewritten to make it execute efficiently.
[Diagram: source program → front end → intermediate representation (IR) → optimization phase → back end → target program.]

Code Optimization techniques

• Compile-time evaluation:
• Certain actions specified in the program are performed during compilation itself, thereby reducing the execution time of the program.
• When all operands in an operation are constants, the operation can be performed at compilation time. This is known as constant folding.
• For example, the assignment a = 3.14/2 can be replaced by a = 1.57, thereby eliminating the division operation.
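A toy Python sketch of constant folding over a small expression tree (the tuple-based AST is my own notation for illustration):

import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def fold(node):
    # A node is a number, a variable name, or (op, left, right).
    if not isinstance(node, tuple):
        return node
    op, l, r = node
    l, r = fold(l), fold(r)
    if isinstance(l, (int, float)) and isinstance(r, (int, float)):
        return OPS[op](l, r)          # both operands constant: evaluate now
    return (op, l, r)

# a = 3.14 / 2  folds to  a = 1.57
print(fold(("/", 3.14, 2)))           # 1.57
# b = x * (4 - 1)  folds to  b = x * 3
print(fold(("*", "x", ("-", 4, 1))))  # ('*', 'x', 3)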

Code Optimization techniques

• Elimination of common subexpressions:
• Common subexpressions are occurrences of expressions yielding the same value.
• We can avoid recomputing the expression if we can use the previously computed value.
• Example:

a = b+c            t = b+c
...                a = t
x = b+c+5.2        ...
                   x = t+5.2

Code Optimization techniques

• Dead code elimination:
• Code which can be omitted from a program without affecting its results is called dead code.
• Dead code is detected by checking whether the value assigned in an assignment statement is used anywhere in the program.
• Frequency reduction:
• Execution time of a program can be reduced by moving code from a part of a program which is executed very frequently to another part of the program which is executed fewer times. Here the loop-invariant assignment x = 25*a is moved out of the loop:

for i=1 to 100 do          x = 25*a;
begin                      for i=1 to 100 do
  z = i;                   begin
  x = 25*a;                  z = i;
  y = x+z;                   y = x+z;
end                        end

Code Optimization techniques

• Strength reduction:
• Replaces the occurrence of a time-consuming operation by an occurrence of a faster operation, e.g., replacement of a multiplication by an addition:

for i=1 to 10 do           itemp = 5;
begin                      for i=1 to 10 do
  ---                      begin
  k = i*5;                   ---
  ---                        k = itemp;
end                          ---
                             itemp = itemp+5;
                           end

Code Optimization techniques

• Local and global optimization:
• Local optimization: optimizing transformations are applied over small segments of a program consisting of a few statements.
• Global optimization: optimizing transformations are applied over a program unit, i.e., over a function or a procedure.

YACC

• YACC is a parser generator; it stands for "yet another compiler-compiler".
• It is used to generate LALR parsers, using the YACC parser generator provided on Unix.
• Creating an input/output translator with Yacc:
Yacc specification translate.y → Yacc compiler → y.tab.c
y.tab.c → C compiler → a.out
input → a.out → output

YACC

• A YACC program has three parts:

declarations
%%
translation rules
%%
supporting C functions

• Declarations part: there are two optional sections.
• In the first section, we put ordinary C declarations, delimited by %{ and %}.
• It also contains the declarations of grammar tokens, e.g., the statement
%token DIGIT

YACC

• The translation rules part:
• It is enclosed between %% and %%. Each rule consists of a grammar production and the associated semantic action. A set of productions
<left side> → <alt 1> | <alt 2> | ... | <alt n>
is written in YACC as
<left side> : <alt 1>   {semantic action 1}
            | <alt 2>   {semantic action 2}
            ...
            | <alt n>   {semantic action n}
            ;
• In a YACC production, a quoted single character is taken to be a terminal symbol, and unquoted strings of letters and digits not declared to be tokens are taken to be non-terminals.

YACC

• A YACC semantic action is a sequence of C statements. $$ refers to the attribute value associated with the non-terminal on the left; $i refers to the value associated with the ith grammar symbol on the right.
• The semantic action is performed whenever we reduce by the associated production.
• For example, for the productions E → E+T | T:

expr : expr '+' term   {$$ = $1 + $3;}
     | term
     ;

• Supporting C-routines part:
• A lexical analyzer by the name yylex() must be provided.
• Error recovery routines may be added.

Syntax Directed Translation

• There are two notations for associating semantic rules with productions: syntax-directed definitions and translation schemes.

input string → parse tree → dependency graph → evaluation order for semantic rules

• Conceptually, with both syntax-directed definitions and translation schemes, we parse the input token stream, build the parse tree, and then traverse the tree as needed to evaluate the semantic rules at the parse tree nodes.

Syntax Directed Definition

• A syntax-directed definition is a generalization of a CFG in which each grammar symbol has an associated set of attributes, partitioned into two subsets called the synthesized and inherited attributes of that grammar symbol.
• The value of an attribute at a parse tree node is defined by a semantic rule associated with the production used at that node.
• The value of a synthesized attribute at a node is computed from the values of attributes at the children of that node.
• The value of an inherited attribute at a node is computed from the values of attributes at the siblings and parent of that node.

Syntax Directed Definition

• Semantic rules set up dependencies between attributes that will be represented by a graph.
• From the dependency graph, we derive an evaluation order for the semantic rules.
• Evaluation of the semantic rules defines the values of the attributes at the nodes in the parse tree for the input string.
• A parse tree showing the values of attributes at each node is called an annotated parse tree; the process of computing the attribute values at the nodes is called annotating or decorating the parse tree.

Syntax Directed Definition

• Form of a syntax-directed definition:
• In a syntax-directed definition, each grammar production A → α has associated with it a set of semantic rules of the form b := f(c1, c2, ..., ck), where f is a function and either
1. b is a synthesized attribute of A and c1, c2, ..., ck are attributes belonging to the grammar symbols of the production, or
2. b is an inherited attribute of one of the grammar symbols on the right side of the production, and c1, c2, ..., ck are attributes belonging to the grammar symbols of the production.
• In either case, we say that the attribute b depends on the attributes c1, c2, ..., ck.
• An attribute grammar is a syntax-directed definition in which the functions in semantic rules cannot have side effects.

Syntax Directed Definition

• A syntax-directed definition that uses synthesized attributes exclusively is said to be an S-attributed definition.
• Example (a desk calculator):

PRODUCTION    SEMANTIC RULES
L → E n       print(E.val)
E → E1 + T    E.val = E1.val + T.val
E → T         E.val = T.val
T → T1 * F    T.val = T1.val * F.val
T → F         T.val = F.val
F → (E)       F.val = E.val
F → digit     F.val = digit.lexval

Syntax Directed Definition

• Synthesized attributes: a parse tree for an S-attributed definition is annotated by evaluating the semantic rules for the attributes at each node bottom-up, from the leaves to the root.
• Annotated parse tree for 3*5+4n:
[At the leaves, digit.lexval is 3, 5, and 4. Going up: F.val=3 gives T.val=3; F.val=5 gives T.val=3*5=15; E.val=15; F.val=T.val=4; the root E.val=15+4=19, which L prints.]
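The bottom-up evaluation is easy to mimic in code. A tiny Python sketch that computes the synthesized attribute val for 3*5+4, mirroring the semantic rules of the desk-calculator definition (the node helpers are my own):

def digit(lexval):
    return {"val": lexval}                 # F -> digit : F.val = digit.lexval

def mul(t1, f):
    return {"val": t1["val"] * f["val"]}   # T -> T1 * F : T.val = T1.val * F.val

def add(e1, t):
    return {"val": e1["val"] + t["val"]}   # E -> E1 + T : E.val = E1.val + T.val

# Annotate the parse tree for 3*5+4 from the leaves upward:
T = mul(digit(3), digit(5))                # T.val = 15
E = add(T, digit(4))                       # E.val = 19
print(E["val"])                            # L -> E n : print(E.val) -> 19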

Syntax Directed Definition

• Inherited attributes:
• They are useful for expressing the dependence of a programming language construct on the context in which it appears.
• For example, an inherited attribute can keep track of whether an identifier appears on the left or right side of an assignment, in order to decide whether the address or the value of the identifier is needed.
• Example (declarations):

PRODUCTION    SEMANTIC RULES
D → T L       L.in = T.type
T → int       T.type = integer
T → real      T.type = real
L → L1 , id   L1.in = L.in; addtype(id.entry, L.in)
L → id        addtype(id.entry, L.in)

Syntax Directed Definition

• Inherited attributes: parse tree for the sentence real id1, id2, id3:
[D has children T (T.type = real) and L (L.in = real); each L node passes in = real down to its child L1 and attaches the type to id3, id2, and id1 in turn.]

Syntax trees

• An (abstract) syntax tree is a condensed form of a parse tree useful for representing language constructs.
• In the syntax tree, operators and keywords do not appear as leaves, but rather are associated with the interior node that would be the parent of those leaves in the parse tree.
• Example, 3*5+4:
[Syntax tree: + at the root with children * and 4; * has children 3 and 5.]
