
Unit - II

Chapter 4
Syntax Analysis
Top-Down Parsing

Partha Sarathi Chakraborty


Assistant Professor
Department of Computer Science and Engineering
SRM University, Delhi – NCR Campus

Outline
• Role of the parser
• Top-Down parsing:
  – Predictive Parsing
    • Recursive, and
    • Nonrecursive

(C) 2014, Prepared by Partha Sarathi Chakraborty

Introduction
• The syntax of programming-language constructs can be described by context-free grammars or BNF (Backus-Naur Form).
• Grammars offer significant advantages to both language designers and compiler writers.
• A grammar gives a precise, yet easy-to-understand, syntactic specification of a programming language.
• From certain classes of grammars we can automatically construct an efficient parser that determines whether a source program is syntactically well formed.
• A properly designed grammar imparts a structure to a programming language that is useful for the translation of source programs into correct object code and for the detection of errors.
• Languages evolve over time, acquiring new constructs and performing additional tasks.

Backus-Naur Form (BNF)


• Backus-Naur form (BNF) is a formal notation for encoding grammars, intended for human consumption.
• Many programming languages, protocols, or formats have a BNF description in their specification.
• Every rule in Backus-Naur form has the following structure:
      name ::= expansion
• The symbol ‘::=’ means "may expand into" or "may be replaced with."
• A name is also called a non-terminal symbol.

Backus-Naur Form (BNF)


• Every name in Backus-Naur form is surrounded by angle brackets, < >, whether it appears on the left- or right-hand side of a rule.
• An expansion is an expression containing terminal symbols and non-terminal symbols, joined together by sequencing and choice.
• A terminal symbol is a literal (like "+" or "function") or a class of literals (like integer).
• Simply juxtaposing expressions indicates sequencing.
• A vertical bar ‘|’ indicates choice.

Backus-Naur Form (BNF)


• For example, in BNF, the classic expression grammar is:
      <expr>   ::= <term> "+" <expr>
                 | <term>
      <term>   ::= <factor> "*" <term>
                 | <factor>
      <factor> ::= "(" <expr> ")"
                 | <const>
      <const>  ::= integer

Backus-Naur Form (BNF)


• Naturally, we can define a grammar for rules in BNF:
      rule      → name ::= expansion
      name      → < identifier >
      expansion → expansion expansion
      expansion → expansion | expansion
      expansion → name
      expansion → terminal

Position of a Parser in the Compiler Model

[Figure] Position of a parser in the compiler model: the Source Program enters the Lexical Analyzer, which supplies (token, tokenval) pairs to the Parser and the rest of the front end on each “get next token” request; the front end produces an Intermediate Representation. Lexical errors, syntax errors, and semantic errors are reported, and both phases consult the Symbol Table.

The Parser
• The task of the parser is to check syntax.
• The syntax-directed translation stage in the compiler’s front end checks static semantics and produces an intermediate representation (IR) of the source program:
  – Abstract syntax trees (ASTs)
  – Control-flow graphs (CFGs) with triples, three-address code, or register transfer lists
  – WHIRL (SGI Pro64 compiler) has 5 IR levels!

Error Handling
• A good compiler should assist in identifying and locating errors:
  – Lexical errors: important; the compiler can easily recover and continue
    • Example: a misspelled identifier, keyword, or operator
  – Syntax errors: most important for the compiler; it can almost always recover
    • Example: an arithmetic expression with unbalanced parentheses
  – Static semantic errors: important; the compiler can sometimes recover
    • Example: an operator applied to an incompatible operand
  – Dynamic semantic errors: hard or impossible to detect at compile time; runtime checks are required
  – Logical errors: hard or impossible to detect
    • Example: an infinitely recursive call

Error Handling
• The error handler in a parser has simple-to-state goals:
  – It should report the presence of errors clearly and accurately.
  – It should recover from each error quickly enough to be able to detect subsequent errors.
  – It should not significantly slow down the processing of correct programs.

Viable-Prefix Property
• The viable-prefix property of LL/LR parsers allows early detection of syntax errors
  – Goal: detection of an error as soon as possible without consuming unnecessary input
  – How: detect an error as soon as the prefix of the input does not match a prefix of any string in the language

[Figure] Examples: in “DO 10 I = 1;0” and “for (;)”, the error is detected at the first symbol that makes the scanned prefix no longer a viable prefix of the language.

Error Recovery Strategies


• Panic mode
  – Discard input until a token in a set of designated synchronizing tokens is found
• Phrase-level recovery
  – Perform local correction on the input to repair the error
• Error productions
  – Augment the grammar with productions for erroneous constructs
• Global correction
  – Choose a minimal sequence of changes to obtain a global least-cost correction

Grammars
• A context-free grammar is a 4-tuple G = (N, T, P, S) where
  – T is a finite set of tokens (terminal symbols)
  – N is a finite set of nonterminals
  – P is a finite set of productions of the form α → β
    where α ∈ (N ∪ T)* N (N ∪ T)* and β ∈ (N ∪ T)*
  – S ∈ N is a designated start symbol

Notational Conventions Used


• Terminals
      a, b, c, … ∈ T
      specific terminals: 0, 1, id, +
• Nonterminals
      A, B, C, … ∈ N
      specific nonterminals: expr, term, stmt
• Grammar symbols
      X, Y, Z ∈ (N ∪ T)
• Strings of terminals
      u, v, w, x, y, z ∈ T*
• Strings of grammar symbols
      α, β, γ ∈ (N ∪ T)*

Derivations
• The one-step derivation is defined by
      α A β ⇒ α γ β
  where A → γ is a production in the grammar
• In addition, we define
  – The derivation is leftmost (⇒lm) if α does not contain a nonterminal
  – The derivation is rightmost (⇒rm) if β does not contain a nonterminal
  – Transitive closure ⇒* (zero or more steps)
  – Positive closure ⇒+ (one or more steps)
• The language generated by G is defined by
      L(G) = { w ∈ T* | S ⇒+ w }

Derivation (Example)
EE+E
EE*E
E(E)
E-E
E  id
(C) 2014, Prepared by Partha Sarathi Chakraborty

E  - E  - id
E rm E + E rm E + id rm id + id
E * E
E + id * id + id
18

Chomsky Hierarchy: Language Classification


• A grammar G is said to be
  – Regular if it is right linear, where each production is of the form
        A → wB  or  A → w
    or left linear, where each production is of the form
        A → Bw  or  A → w
  – Context free if each production is of the form
        A → α
    where A ∈ N and α ∈ (N ∪ T)*
  – Context sensitive if each production is of the form
        α A β → α γ β
    where A ∈ N, α, β, γ ∈ (N ∪ T)*, |γ| > 0
  – Unrestricted otherwise

Chomsky Hierarchy

L(regular) ⊆ L(context free) ⊆ L(context sensitive) ⊆ L(unrestricted)

where L(T) = { L(G) | G is of type T },
that is, the set of all languages generated by grammars G of type T.

Examples:
  Every finite language is regular
  L1 = { a^n b^n | n ≥ 1 } is context free
  L2 = { a^n b^n c^n | n ≥ 1 } is context sensitive

Ambiguity
• A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
• Example: id + id * id
• Two distinct leftmost derivations:

  E ⇒ E + E                E ⇒ E * E
    ⇒ id + E                 ⇒ E + E * E
    ⇒ id + E * E             ⇒ id + E * E
    ⇒ id + id * E            ⇒ id + id * E
    ⇒ id + id * id           ⇒ id + id * id
Ambiguity: Two Parse Trees

[Figure] The two distinct parse trees for id + id * id are not reproduced here.

Eliminating Ambiguity
• “Dangling-else” grammar:
      stmt → if expr then stmt
           | if expr then stmt else stmt
           | other
• The grammar is ambiguous since the string
      if E1 then if E2 then S1 else S2
  has two parse trees.

Eliminating Ambiguity
• The general rule is: “Match each else with the closest previous unmatched then.”
• The idea is that a statement appearing between a then and an else must be “matched”; that is, the interior statement must not end with an unmatched or open then. A matched statement is either an if-then-else statement containing no open statements, or it is any other kind of unconditional statement.
• Now, the grammar, rewritten:
      stmt         → matched_stmt | open_stmt
      matched_stmt → if expr then matched_stmt else matched_stmt
                   | other
      open_stmt    → if expr then stmt
                   | if expr then matched_stmt else open_stmt

Left Recursion
• Productions of the form
      A → A α | β
  are left recursive.
• Top-down parsing methods cannot handle left-recursive grammars, so a transformation that eliminates left recursion is needed.

Immediate Left-Recursion Elimination

Rewrite every left-recursive production
      A → A α | β
into a right-recursive production:
      A  → β A’
      A’ → α A’ | ε
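The transformation above is mechanical, so it is easy to automate. The following Python function is an illustrative sketch (not code from the slides): a grammar is represented as a mapping from a nonterminal to a list of right-hand sides, each a list of symbols, with the empty list standing for ε.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn  as
         A  -> b1 A' | ... | bn A'
         A' -> a1 A' | ... | am A' | epsilon
    Each alternative is a list of symbols; [] denotes epsilon."""
    recursive = [rhs[1:] for rhs in alternatives if rhs[:1] == [nonterminal]]
    others = [rhs for rhs in alternatives if rhs[:1] != [nonterminal]]
    if not recursive:                      # no immediate left recursion
        return {nonterminal: alternatives}
    fresh = nonterminal + "'"              # fresh nonterminal A'
    return {
        nonterminal: [beta + [fresh] for beta in others],
        fresh: [alpha + [fresh] for alpha in recursive] + [[]],
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | epsilon
result = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
```

Applied to each nonterminal of the expression grammar on the next slide, this reproduces the E’/T’ form shown there.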

Example
• Consider the grammar
      E → E + T | T
      T → T * F | F
      F → ( E ) | id
• Eliminate immediate left recursion (nonterminals E and T have productions of the form A → A α):
      E  → T E’
      E’ → + T E’ | ε
      T  → F T’
      T’ → * F T’ | ε
      F  → ( E ) | id

Another Example
• Consider the grammar below; it is left recursive, but not immediately left recursive:
      S → A a | b
      A → A c | S d | ε
• Use the general left-recursion elimination algorithm:
• Substitute the S-productions into A → S d to obtain the following A-productions:
      A → A c | A a d | b d | ε
• Now, eliminate the immediate left recursion among the A-productions:
      S  → A a | b
      A  → b d A’ | A’
      A’ → c A’ | a d A’ | ε

General Left Recursion Elimination

Arrange the nonterminals in some order A1, A2, …, An
for i = 1, …, n do
    for j = 1, …, i-1 do
        replace each
            Ai → Aj γ
        with
            Ai → δ1 γ | δ2 γ | … | δk γ
        where
            Aj → δ1 | δ2 | … | δk
    enddo
    eliminate the immediate left recursion in the Ai-productions
enddo

Example Left Rec. Elimination


ABC|a
BCA|Ab Choose arrangement: A, B, C
CAB|CC|a

i = 1: nothing to do
i = 2, j = 1: BCA|Ab
(C) 2014, Prepared by Partha Sarathi Chakraborty

 BCA|BCb|ab
(imm) B  C A BR | a b BR
BR  C b BR | 
i = 3, j = 1: CAB|CC|a
 CBCB|aB|CC|a
i = 3, j = 2: CBCB|aB|CC|a
 C  C A BR C B | a b BR C B | a B | C C | a
(imm) C  a b BR C B CR | a B CR | a CR
CR  A BR C B CR | C CR | 
30

Left Factoring
• When a nonterminal has two or more productions whose right-hand sides start with the same grammar symbols, the grammar is not LL(1) and cannot be used for predictive parsing.
• If A → α β1 | α β2 | γ are the productions, then after left factoring:
      A  → α A’ | γ
      A’ → β1 | β2
• In general, replace the productions
      A → α β1 | α β2 | … | α βn | γ
with
      A  → α A’ | γ
      A’ → β1 | β2 | … | βn
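One pass of this rewrite can be sketched in Python (an illustrative sketch, not code from the slides): alternatives are grouped by their leading symbol, and each group sharing a prefix is replaced by a factored alternative plus a fresh primed nonterminal. The single fresh name assumes at most one group needs factoring per pass.

```python
def left_factor(nonterminal, alternatives):
    """One pass of left factoring. Each alternative is a list of
    symbols; [] denotes epsilon."""
    groups = {}
    for rhs in alternatives:               # group by leading symbol
        groups.setdefault(rhs[0] if rhs else "", []).append(rhs)
    new_rules, fresh = {nonterminal: []}, nonterminal + "'"
    for first, group in groups.items():
        if len(group) == 1:                # no shared prefix: keep as is
            new_rules[nonterminal].append(group[0])
            continue
        alpha = []                         # longest prefix shared by the group
        for column in zip(*group):
            if len(set(column)) == 1:
                alpha.append(column[0])
            else:
                break
        new_rules[nonterminal].append(alpha + [fresh])
        new_rules[fresh] = [rhs[len(alpha):] for rhs in group]  # [] = epsilon
    return new_rules

# S -> iEtS | iEtSeS | a   becomes   S -> iEtS S' | a,  S' -> epsilon | eS
rules = left_factor("S", [["i","E","t","S"], ["i","E","t","S","e","S"], ["a"]])
```

This reproduces the dangling-else factoring shown in the example slide that follows.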

Example
• Consider the grammar
      S → i E t S | i E t S e S | a
      E → b
• Left factored, this grammar becomes:
      S  → i E t S S’ | a
      S’ → e S | ε
      E  → b

Parsing
• Universal (any context-free grammar)
  – Cocke-Younger-Kasami
  – Earley
• Top-down (context-free grammar with restrictions)
  – Recursive descent (predictive parsing)
  – LL (Left-to-right, Leftmost derivation) methods
• Bottom-up (context-free grammar with restrictions)
  – Operator precedence parsing
  – LR (Left-to-right, Rightmost derivation) methods
    • SLR, canonical LR, LALR

Top-Down Parsing
• LL methods (Left-to-right, Leftmost
derivation) and recursive-descent parsing
Grammar:              Leftmost derivation:
  E → T + T             E ⇒lm T + T
  T → ( E )               ⇒lm id + T
  T → - E                 ⇒lm id + id
  T → id

[Figure] The partial parse trees for this derivation, growing top-down one leftmost expansion at a time, are not reproduced here.

Recursive Descent Parsing


• The grammar must be LL(1).
• Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal’s syntactic category of input tokens.
• When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information.
• It may involve backtracking, i.e., making repeated scans of the input.
• It is implemented as a mutually recursive suite of functions that descend through a parse tree for the string, and such parsers are therefore called “recursive descent parsers”.

Example
• Consider the grammar
      S → c A d
      A → a b | a
• Steps to build the parse tree for the string “cad”.
• Note: A left-recursive grammar can cause a recursive-descent parser, even one with backtracking, to go into an infinite loop; i.e., when trying to expand A, we may eventually find ourselves again trying to expand A without having consumed any input.
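The steps for this grammar can be sketched as a minimal backtracking recursive-descent parser (an illustrative sketch; the slides give no code). Each procedure returns the input position after a successful match, or None on failure; trying A → a b first and falling back to A → a is the backtracking.

```python
# Grammar:  S -> c A d,   A -> a b | a

def parse_S(s, i):
    if i < len(s) and s[i] == "c":        # match 'c'
        j = parse_A(s, i + 1)             # then an A
        if j is not None and j < len(s) and s[j] == "d":
            return j + 1                  # then 'd'
    return None

def parse_A(s, i):
    if s[i:i+2] == "ab":                  # try A -> a b first
        return i + 2
    if s[i:i+1] == "a":                   # backtrack to A -> a
        return i + 1
    return None

def accepts(s):
    return parse_S(s, 0) == len(s)

# On "cad": A -> a b fails, the parser backtracks to A -> a, then 'd' matches.
```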

It is possible for a recursive-descent parser to loop forever

left-recursive production:      right-recursive production:
      A → A α | β                     A → β R
                                      R → α R | ε

Advantages and Limitations of a recursive-descent parser
• Advantages:
  – It is simple to build.
  – It can be constructed with the help of the parse tree.
• Limitations:
  – It is not very efficient compared to other parsing techniques, as there is a chance it may enter an infinite loop for some inputs.
  – It is difficult to parse the string if the required lookahead is arbitrarily long.

Predictive Parsing
• Eliminate left recursion from the grammar
• Left factor the grammar
• Compute FIRST and FOLLOW
• Two variants:
  – Recursive (recursive calls)
  – Non-recursive (table-driven)

Transition Diagrams for Predictive Parsers

Consider the grammar:
      E  → T E’
      E’ → + T E’ | ε
      T  → F T’
      T’ → * F T’ | ε
      F  → ( E ) | id

[Figure] The transition diagrams for E, E’, T, T’, and F are not reproduced here: each nonterminal gets one diagram whose edges are labeled by the symbols of its right-hand sides, ending in an accepting state.
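A recursive predictive parser for this grammar follows the transition diagrams directly: one procedure per nonterminal, branching on one token of lookahead. The sketch below is illustrative (the slides give diagrams, not code); tokens are assumed pre-split, with "id" as a single token and "$" as the endmarker.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + ["$"]
        self.pos = 0

    def look(self):
        return self.tokens[self.pos]

    def match(self, t):
        if self.look() != t:
            raise SyntaxError(f"expected {t}, got {self.look()}")
        self.pos += 1

    def E(self):        # E -> T E'
        self.T(); self.Ep()

    def Ep(self):       # E' -> + T E' | eps  (eps when lookahead is in FOLLOW(E'))
        if self.look() == "+":
            self.match("+"); self.T(); self.Ep()

    def T(self):        # T -> F T'
        self.F(); self.Tp()

    def Tp(self):       # T' -> * F T' | eps
        if self.look() == "*":
            self.match("*"); self.F(); self.Tp()

    def F(self):        # F -> ( E ) | id
        if self.look() == "(":
            self.match("("); self.E(); self.match(")")
        else:
            self.match("id")

def accepts(tokens):
    p = Parser(tokens)
    try:
        p.E()
        return p.look() == "$"
    except SyntaxError:
        return False
```

Because the grammar has no left recursion and is left-factored, no backtracking is needed: one lookahead token selects the branch in each procedure.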

Non-Recursive Predictive Parsing
• Given an LL(1) grammar G = (N, T, P, S), construct a table M[A, a] for A ∈ N, a ∈ T, and use a driver program with a stack.

[Figure] The predictive parsing program (driver) reads the input (e.g., a + b $), keeps grammar symbols X, Y, Z, …, $ on a stack, consults the parsing table M, and produces output.

FIRST and FOLLOW


• FIRST: If α is any string of grammar symbols, let FIRST(α) be the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).
• FOLLOW: It is defined for a nonterminal A, i.e., FOLLOW(A), as the set of terminals a that can appear immediately to the right of A in some sentential form; that is, the set of terminals a such that there exists a derivation of the form S ⇒* αAaβ for some α and β. If A is the start symbol, then $ is in FOLLOW(A).

FIRST
• To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any FIRST set:
  – If X is a terminal, then FIRST(X) is {X}.
  – If X → ε is a production, then add ε to FIRST(X).
  – If X is a nonterminal and X → Y1 Y2 … Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), …, FIRST(Yi-1); that is, Y1 … Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X).
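These rules amount to a fixed-point computation, which can be sketched in Python (an illustrative sketch; the slides state the rules but give no code). A grammar maps each nonterminal to a list of right-hand sides; the empty string stands for ε, and any symbol not in the grammar dict is treated as a terminal.

```python
EPS = ""  # epsilon

def first_sets(grammar):
    first = {A: set() for A in grammar}

    def first_of_seq(symbols):
        """FIRST of a sequence Y1 Y2 ... Yk under the current sets."""
        result = set()
        for Y in symbols:
            f = first[Y] if Y in grammar else {Y}  # terminal: FIRST(a) = {a}
            result |= f - {EPS}
            if EPS not in f:
                return result                      # Yi cannot vanish: stop
        result.add(EPS)                            # every Yi derives epsilon
        return result

    changed = True
    while changed:                                 # iterate to a fixed point
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_seq(rhs) if rhs else {EPS}
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first

grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
```

Running `first_sets(grammar)` reproduces the FIRST sets worked out on the next slide.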

Example
Consider the grammar
      E  → T E’
      E’ → + T E’ | ε
      T  → F T’
      T’ → * F T’ | ε
      F  → ( E ) | id

After applying the FIRST rules to the grammar:
      FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }
      FIRST(E’) = { + , ε }
      FIRST(T’) = { * , ε }

FOLLOW
• To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be added to any FOLLOW set:
  – Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
  – If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
  – If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).
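The FOLLOW rules are likewise a fixed-point computation. In the sketch below (illustrative, not slide code), the FIRST sets are assumed to be already available; the ones derived earlier for the expression grammar are written out by hand rather than recomputed.

```python
EPS = ""  # epsilon

grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}

def first_of_seq(beta):
    """FIRST of a string of grammar symbols beta."""
    result = set()
    for Y in beta:
        f = FIRST[Y] if Y in grammar else {Y}
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result

def follow_sets(grammar, start):
    follow = {A: set() for A in grammar}
    follow[start].add("$")                    # rule 1: $ in FOLLOW(start)
    changed = True
    while changed:                            # iterate to a fixed point
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                for i, B in enumerate(rhs):
                    if B not in grammar:
                        continue              # only nonterminals have FOLLOW
                    f = first_of_seq(rhs[i + 1:])
                    # rule 2: FIRST(beta) - {eps}; rule 3: FOLLOW(A) if beta => eps
                    new = (f - {EPS}) | (follow[A] if EPS in f else set())
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow
```

Running `follow_sets(grammar, "E")` reproduces the FOLLOW sets worked out on the next slide.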

Example
Consider the grammar
      E  → T E’
      E’ → + T E’ | ε
      T  → F T’
      T’ → * F T’ | ε
      F  → ( E ) | id

After applying the FOLLOW rules to the grammar:
      FOLLOW(E) = FOLLOW(E’) = { ) , $ }
      FOLLOW(T) = FOLLOW(T’) = { + , ) , $ }
      FOLLOW(F) = { + , * , ) , $ }

Usefulness of FIRST and FOLLOW


• FIRST and FOLLOW both help in the construction of a predictive parser, by filling in the entries of a predictive parsing table for grammar G whenever possible.
• Sets of tokens yielded by the FOLLOW function can also be used as synchronizing tokens during panic-mode error recovery.
• FIRST and FOLLOW are also useful for LR parsing, i.e., for LR(1) items and the SLR(1) table.

Another Example of FIRST and FOLLOW


• Grammar G
      S → A C B | C b A | B a
      A → d a | B C
      B → g | ε
      C → h | ε

Predictive Parsing Table
[Table] The FIRST/FOLLOW sets and the resulting predictive parsing table for this grammar are not reproduced here.

Construction of Predictive Parsing Table
Algorithm: Construction of a predictive parsing table.
Input: Grammar G.
Output: Parsing table M.
Method:
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.
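The algorithm can be sketched directly in Python (an illustrative sketch; FIRST and FOLLOW for the expression grammar are copied from the earlier slides rather than recomputed, and the grammar is assumed LL(1), so conflicts are not checked).

```python
EPS = ""  # epsilon

grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"},
          "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
          "F": {"+", "*", ")", "$"}}

def first_of_seq(alpha):
    result = set()
    for Y in alpha:
        f = FIRST[Y] if Y in grammar else {Y}
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)                               # whole sequence can vanish
    return result

def build_table(grammar):
    M = {}
    for A, alternatives in grammar.items():       # step 1
        for alpha in alternatives:
            f = first_of_seq(alpha)
            for a in f - {EPS}:                   # step 2
                M[(A, a)] = alpha
            if EPS in f:                          # step 3 ($ is in FOLLOW here)
                for b in FOLLOW[A]:
                    M[(A, b)] = alpha
    return M                                      # step 4: missing entry = error

M = build_table(grammar)
```

Entries such as `M[("E", "id")] == ["T", "E'"]` and `M[("E'", ")")] == []` (the ε-production) match the standard table for this grammar.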

Predictive Parsing Working


• The program considers X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine the parser action.
• There are three possibilities:
  – If X = a = $, the parser halts and announces successful completion of parsing.
  – If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
  – If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (with U on top).
• As output, we shall assume that the parser just prints the production used; any other code could be executed here.
• If M[X, a] = error, the parser calls an error recovery routine.
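The three cases above form a short driver loop. The sketch below is illustrative (not slide code); the table entries are the ones built for the expression grammar, tokens are pre-split and end with "$", and the output is the list of productions used.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
M = {
    ("E", "id"): ["T", "E'"],  ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],  ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],       ("F", "("): ["(", "E", ")"],
}

def parse(tokens, start="E"):
    """Return the list of productions used, or raise SyntaxError."""
    stack = ["$", start]                  # start symbol above the endmarker
    pos, output = 0, []
    while True:
        X, a = stack[-1], tokens[pos]
        if X == a == "$":
            return output                 # case 1: accept
        if X == a:                        # case 2: terminal on top matches
            stack.pop(); pos += 1
        elif X in NONTERMINALS:           # case 3: consult M[X, a]
            if (X, a) not in M:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            rhs = M[(X, a)]
            output.append((X, rhs))       # "print" the production used
            stack.pop()
            stack.extend(reversed(rhs))   # push RHS with leftmost symbol on top
        else:
            raise SyntaxError(f"expected {X}, got {a}")
```

For example, `parse(["id", "+", "id", "*", "id", "$"])` succeeds after applying 11 productions, matching the moves table referenced on the next slide.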
Moves made by the nonrecursive predictive parser
[Table] The stack/input/action trace for parsing id + id * id is not reproduced here.

LL(1) Grammar
• LL(1) means:
  – The first “L”: scanning the input from left to right.
  – The second “L”: producing a leftmost derivation.
  – The “1”: using one input symbol of lookahead at each step to make parsing action decisions.

Ambiguous Grammars Are Not LL(1)

• Consider the grammar
      S  → i E t S S’ | a      FIRST(S) = {i, a}      FOLLOW(S) = { $ , e }
      S’ → e S | ε             FIRST(S’) = {e, ε}     FOLLOW(S’) = { $ , e }
      E  → b                   FIRST(E) = { b }       FOLLOW(E) = { t }
• The parsing table for this grammar has a multiply-defined entry: both S’ → e S and S’ → ε land in M[S’, e].
• A grammar whose parsing table has no multiply-defined entries is said to be LL(1).
• What should be done when a parsing table has multiply-defined entries?

LL(1) Grammar Properties


• No ambiguous or left-recursive grammar can be LL(1).
• A grammar G is LL(1) iff whenever A → α | β are two distinct productions of G, the following conditions hold:
  – For no terminal a do both α and β derive strings beginning with a,
    i.e., FIRST(α) ∩ FIRST(β) = ∅
  – At most one of α and β can derive the empty string,
    i.e., if α ⇒* ε then β does not derive ε, and vice versa
  – If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A),
    i.e., if β ⇒* ε then FIRST(α) ∩ FOLLOW(A) = ∅

In General, LL(1) Grammar Properties

• A grammar G is LL(1) if for each collection of productions
      A → α1 | α2 | … | αn
  for nonterminal A, the following holds:
  1. FIRST(αi) ∩ FIRST(αj) = ∅ for all i ≠ j
  2. if αi ⇒* ε then
     2.a. no αj with j ≠ i derives ε
     2.b. FIRST(αj) ∩ FOLLOW(A) = ∅ for all j ≠ i

Non-LL(1) Examples
Grammar                    Not LL(1) because
S → S a | a                Left recursive
S → a S | a                FIRST(a S) ∩ FIRST(a) ≠ ∅
S → a R | ε                For R: S ⇒* ε and R ⇒* ε
R → S | ε
S → a R a                  For R: FIRST(S) ∩ FOLLOW(R) ≠ ∅
R → S | ε
S → i E t S S’ | a         Parsing table has multiply-defined entries
S’ → e S | ε
E → b

Error Recovery in Predictive Parsing


• There are two conditions under which an error is detected in predictive parsing:
  – When the terminal on top of the stack does not match the next input symbol.
  – When nonterminal A is on top of the stack, a is the next input symbol, and the parsing table entry M[A, a] is empty.
• The following error recovery methods can be used:
  – Panic-mode error recovery
  – Phrase-level error recovery

Error Recovery in Predictive Parsing: Panic-mode
• It is based on the idea of skipping symbols on the input until a token in a selected set of synchronizing tokens appears.
• Its effectiveness depends on the choice of the synchronizing set.
• The set should be chosen so that the parser recovers quickly from errors that are likely to occur in practice.

Error Recovery in Predictive Parsing: Panic-mode
• Rules:
  – If the parser looks up entry M[A, a] = blank, then the input symbol a is skipped.
  – If the entry is synch, then the nonterminal on top of the stack is popped in an attempt to resume parsing, OR input is skipped until a token in FIRST(A) is found.
  – If a token on top of the stack does not match the input symbol, then we pop the token from the stack.
Error Recovery in Predictive Parsing: Panic-mode
• Add synchronizing actions to undefined entries based on FOLLOW.
• synch: pop A and skip input until a synch token, OR skip until a token in FIRST(A) is found.

[Table] The expression-grammar parsing table with synchronizing tokens added (synch entries at FOLLOW positions) is not reproduced here.

Error Recovery in Predictive Parsing: Panic-mode
• Erroneous input: ) id * + id

[Table] The recovery moves for this input are not reproduced here.

Error Recovery in Predictive Parsing: Phrase-Level
• It is implemented by filling in the blank entries of the predictive parsing table with pointers to error routines.
• These routines may change, insert, or delete symbols on the input and issue appropriate error messages.
• They may also pop from the stack.
• In any case, we must be sure that there is no possibility of an infinite loop.
• Checking that any recovery action eventually results in an input symbol being consumed (or the stack being shortened if the end of the input has been reached) is a good way to protect against such loops.

Error Recovery in Predictive Parsing: Phrase-level
• Change the input stream by inserting a missing *
  For example: id id is changed into id * id

Nonterminal | id        | +            | *            | (         | )      | $
E           | E → T E’  |              |              | E → T E’  | synch  | synch
E’          |           | E’ → + T E’  |              |           | E’ → ε | E’ → ε
T           | T → F T’  | synch        |              | T → F T’  | synch  | synch
T’          | insert *  | T’ → ε       | T’ → * F T’  |           | T’ → ε | T’ → ε
F           | F → id    | synch        | synch        | F → ( E ) | synch  | synch

insert *: insert the missing * and redo the production

Error Recovery in Predictive Parsing: Phrase-level Error Productions

E  → T E’                 Add the error production:
E’ → + T E’ | ε               T’ → F T’
T  → F T’                 to ignore a missing *, e.g.: id id
T’ → * F T’ | ε
F  → ( E ) | id

Nonterminal | id         | +            | *            | (         | )      | $
E           | E → T E’   |              |              | E → T E’  | synch  | synch
E’          |            | E’ → + T E’  |              |           | E’ → ε | E’ → ε
T           | T → F T’   | synch        |              | T → F T’  | synch  | synch
T’          | T’ → F T’  | T’ → ε       | T’ → * F T’  |           | T’ → ε | T’ → ε
F           | F → id     | synch        | synch        | F → ( E ) | synch  | synch

Error Recovery in Predictive Parsing: Phrase-level Error Productions
• Erroneous input: id id

[Table] The parsing moves using the error production are not reproduced here.
Unit - II
Chapter 4
Syntax Analysis
Bottom-Up Parsing:
Shift-Reduce Parsing and
Operator Precedence Parsing

Partha Sarathi Chakraborty


Assistant Professor
Department of Computer Science and Engineering
SRM University, Delhi – NCR Campus

Outline
• Bottom-Up Parsing
  – Shift-Reduce Parsing
  – Operator Precedence Parsing
  – LR parsers (next presentation):
    • Simple LR (SLR)
    • Canonical LR
    • Lookahead LR (LALR)

Bottom-Up Parsing
• Start at the leaves and grow toward the root.
• We can think of the process as reducing the input string to the start symbol.
• At each reduction step a particular substring matching the right side of a production is replaced by the symbol on the left side of that production.
• Bottom-up parsers handle a large class of grammars.

Bottom-Up Parsing
• A general style of bottom-up syntax analysis is known as shift-reduce parsing.
• The main actions are shift and reduce.
• At each shift action, the current symbol in the input string is pushed onto a stack.
• At each reduce action, the symbols at the top of the stack (this symbol sequence is the right side of a production) are replaced by the nonterminal on the left side of that production.
• There are also two more actions: accept and error.

Shift – Reduce Parsing


• “Shift-Reduce” parsing reduces a string to the start symbol of the grammar.
• At every step a particular substring is matched (in left-to-right fashion) to the right side of some production and is then substituted by the nonterminal on the left-hand side of that production.

Grammar:              Reductions (reverse order):
  S → a A B e           a b b c d e
  A → A b c | b         a A b c d e
  B → d                 a A d e
                        a A B e
                        S

Rightmost derivation:
  S ⇒ a A B e ⇒ a A d e ⇒ a A b c d e ⇒ a b b c d e

Shift-Reduce Parsing

Grammar:            Reducing a sentence:    Shift-reduce corresponds
  S → a A B e         a b b c d e           to a rightmost derivation:
  A → A b c | b       a A b c d e             S ⇒rm a A B e
  B → d               a A d e                   ⇒rm a A d e
                      a A B e                   ⇒rm a A b c d e
                      S                         ⇒rm a b b c d e

The reduced substrings match the productions’ right-hand sides.
[Figure] The corresponding sequence of partial parse trees for a b b c d e is not reproduced here.

Handles
A handle is a substring of grammar symbols in a right-sentential form that matches the right-hand side of a production.

Grammar:          Reductions:
  S → a A B e       a b b c d e
  A → A b c | b     a A b c d e    ← handle: A b c
  B → d             a A d e
                    a A B e
                    S

But reducing the second b of a b b c d e instead:
  a b b c d e
  a A b c d e
  a A A e        ← that b was NOT a handle, because further
  …?               reductions will fail (the result is not a sentential form)

Handles
• A handle of a right-sentential form γ (= αβw) is a production rule A → β and a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ:
      S ⇒*rm α A w ⇒rm α β w
  i.e., A → β at the position immediately after α is a handle of αβw.
• If the grammar is unambiguous, then every right-sentential form of the grammar has exactly one handle.
• w is a string of terminals.

Handle Pruning
• The process of discovering a handle and reducing it to the appropriate left-hand side is called handle pruning. Handle pruning forms the basis for a bottom-up parsing method.
• To construct a rightmost derivation
      S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm … ⇒rm γn-1 ⇒rm γn = w (the input string)
  the parser finds the handle in γn and reduces it to obtain γn-1, and so on back to S.

[Table] The sequence of reductions made by a shift-reduce parser is not reproduced here.

Shift – Reduce Parser


• There are four possible actions of a shift-reduce parser:
  – Shift: the next input symbol is shifted onto the top of the stack.
  – Reduce: replace the handle on the top of the stack by the corresponding nonterminal.
  – Accept: successful completion of parsing.
  – Error: the parser discovers a syntax error and calls an error recovery routine.

Stack Implementation of Shift – Reduce Parser

• Initial state
      STACK    INPUT
      $        w$
• Final state
      STACK    INPUT
      $S       $

Stack Implementation of
Shift-Reduce Parsing

Grammar:
  E → E + E
  E → E * E
  E → ( E )
  E → id

Stack      Input        Action
$          id+id*id$    shift
$id        +id*id$      reduce E → id
$E         +id*id$      shift
$E+        id*id$       shift
$E+id      *id$         reduce E → id
$E+E       *id$         shift (or reduce?)   ← how to resolve conflicts?
$E+E*      id$          shift
$E+E*id    $            reduce E → id
$E+E*E     $            reduce E → E * E
$E+E       $            reduce E → E + E
$E         $            accept

The parser must find handles to reduce.
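The trace above can be sketched as a small shift-reduce loop (an illustrative sketch, not slide code). The shift/reduce conflict at "$E+E with lookahead *" is resolved the usual way: shift when the lookahead operator binds tighter than the one already on the stack, otherwise reduce.

```python
PREC = {"+": 1, "*": 2}  # * binds tighter than +

def parse(tokens):
    """Grammar: E -> E + E | E * E | ( E ) | id. Returns the action list."""
    stack, i, actions = ["$"], 0, []
    tokens = tokens + ["$"]
    while True:
        a = tokens[i]
        if stack[-1] == "id":                       # reduce E -> id on top
            stack[-1] = "E"; actions.append("reduce E -> id"); continue
        if stack[-3:] == ["(", "E", ")"]:           # reduce E -> ( E )
            stack[-3:] = ["E"]; actions.append("reduce E -> ( E )"); continue
        # reduce E op E when the lookahead does not bind tighter
        if (len(stack) >= 4 and stack[-1] == "E" and stack[-3] == "E"
                and stack[-2] in PREC and PREC.get(a, 0) <= PREC[stack[-2]]):
            op = stack[-2]
            stack[-3:] = ["E"]; actions.append(f"reduce E -> E {op} E"); continue
        if a == "$":
            if stack == ["$", "E"]:
                actions.append("accept"); return actions
            raise SyntaxError("cannot reduce to E")
        stack.append(a); i += 1; actions.append(f"shift {a}")
```

On `["id", "+", "id", "*", "id"]` the action sequence matches the table: the parser shifts past `*`, reduces E → E * E first, then E → E + E, then accepts.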

Justifying the use of a stack in shift-reduce parsing
• The handle will always appear on top of the stack, never inside it.
• Consider the possible forms of two successive steps in any rightmost derivation.
• These steps can be of the form:
  1. S ⇒*rm α A z ⇒rm α β B y z ⇒rm α β γ y z
  2. S ⇒*rm α B x A z ⇒rm α B x y z ⇒rm α γ x y z

Justifying the use of a stack in shift-reduce parsing
• Consider case (1) in reverse, where a shift-reduce parser has just reached this configuration:

      STACK     INPUT
      $αβγ      yz$      — handle γ identified; reduce by B → γ
      $αβB      yz$      — shift y
      $αβBy     z$       — handle βBy identified; reduce by A → βBy
      $αA       z$

Justifying the use of a stack in shift-reduce parsing (contd.)
• Case (2) configuration:

      STACK     INPUT
      $αγ       xyz$     — handle γ identified; reduce by B → γ
      $αB       xyz$     — shift x
      $αBx      yz$      — shift y
      $αBxy     z$       — handle y identified; reduce by A → y
      $αBxA     z$

Note: The parser never had to go into the stack to find the handle. It is this aspect of handle pruning that makes a stack a particularly convenient data structure for implementing a shift-reduce parser.

Conflicts
• Shift-reduce and reduce-reduce conflicts are caused by
  – the limitations of the LR parsing method (even when the grammar is unambiguous)
  – ambiguity of the grammar

Shift-Reduce Parsing:
Shift-Reduce Conflicts

Stack                Input        Action
$…                   …$           …
$… if E then S       else …$      shift or reduce?

Ambiguous grammar:
  S → if E then S
    | if E then S else S
    | other

Resolve in favor of shift, so each else matches the closest if.

Shift-Reduce Parsing:
Reduce-Reduce Conflicts

Stack    Input    Action
$        aa$      shift
$a       a$       reduce A → a or B → a ?

Grammar:
  C → A B
  A → a
  B → a

Resolve in favor of reduce A → a; otherwise we’re stuck!

Operator – Precedence Parsing


• Operator grammars form a small but important class of grammars.
• We can easily construct an efficient operator-precedence parser (a shift-reduce parser) for an operator grammar.
• Properties: in an operator grammar, no production rule can have:
  – ε on the right side, or
  – two adjacent nonterminals on the right side.
• Example:
      E → E + E | E * E | ( E ) | -E | id

Operator – Precedence Parsing


• Disadvantages:
  – It is hard to handle tokens like the minus sign, which has two different precedences (depending on whether it is unary or binary).
  – Worse, since the relationship between a grammar for the language being parsed and the operator-precedence parser itself is tenuous, one cannot always be sure the parser accepts exactly the desired language.
  – Only a small class of grammars can be parsed.

Operator – Precedence Parsing


• In operator-precedence parsing, we define three disjoint precedence relations between certain pairs of terminals:

      Relation    Meaning
      a ⋖ b       b has higher precedence than a
      a ≐ b       b has the same precedence as a
      a ⋗ b       b has lower precedence than a

• There are two common ways of determining what precedence relations should hold between a pair of terminals.

Operator – Precedence Parsing

• Two common ways:
  – The first method is intuitive and is based on the traditional
    notions of associativity and precedence of operators.
    (Unary minus causes a problem.)
    • Example: if * has higher precedence than +, then the
      relationship is shown as + ⋖ * and * ⋗ +.
  – The second method of selecting operator-precedence
    relations is first to construct an unambiguous grammar
    for the language, a grammar that reflects the correct
    associativity and precedence in its parse trees.
    • Example: the dangling-else grammar.

Using Operator – Precedence Relations

• The intention of the precedence relations is to
  find the handle of a right-sentential form, with
  ⋖ marking the left end,
  ≐ appearing in the interior of the handle, and
  ⋗ marking the right end.
• In an input string $ a1 a2 … an $, we insert the
  appropriate precedence relation between each pair of
  adjacent terminals (the relation that holds between
  the terminals in that pair).

Using Operator – Precedence Relations

• Consider the grammar
  E → E + E | E * E | id

• Operator-precedence relations:

        id   +   *   $
  id         ⋗   ⋗   ⋗
  +     ⋖    ⋗   ⋖   ⋗
  *     ⋖    ⋗   ⋗   ⋗
  $     ⋖    ⋖   ⋖

• Then the input string id + id * id with the precedence
  relations inserted is:
  $ ⋖ id ⋗ + ⋖ id ⋗ * ⋖ id ⋗ $

Using Operator – Precedence Relations: To Find the Handle

• The handle can be found by the following process:
  1. Scan the string from the left end until the first ⋗ is encountered.
  2. Then scan backwards (to the left) over any ≐ until a ⋖ is
     encountered.
  3. The handle contains everything to the left of the first ⋗ and to
     the right of the ⋖ encountered in step (2), including any
     intervening or surrounding nonterminals.

  $ ⋖ id ⋗ + ⋖ id ⋗ * ⋖ id ⋗ $    reduce E → id       $ id + id * id $
  $ ⋖ + ⋖ id ⋗ * ⋖ id ⋗ $         reduce E → id       $ E + id * id $
  $ ⋖ + ⋖ * ⋖ id ⋗ $              reduce E → id       $ E + E * id $
  $ ⋖ + ⋖ * ⋗ $                   reduce E → E * E    $ E + E * E $
  $ ⋖ + ⋗ $                       reduce E → E + E    $ E + E $
  $ $                             accept              $ E $
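The three-step scan above can be sketched in Python. This is an illustrative helper, not from the slides: the `find_handle` name and the `rel` dictionary (mapping an ordered pair of adjacent terminals to one of `'<'`, `'='`, `'>'`) are our own encoding of the relations.

```python
def find_handle(terminals, rel):
    """Return (left, right) slice bounds of the handle in `terminals`."""
    # Step 1: scan left to right until the first .> relation.
    right = None
    for i in range(len(terminals) - 1):
        if rel[(terminals[i], terminals[i + 1])] == '>':
            right = i
            break
    if right is None:
        return None                      # no handle found
    # Step 2: scan backwards over any =. until a <. is encountered.
    left = right
    while left > 0 and rel[(terminals[left - 1], terminals[left])] == '=':
        left -= 1
    # Step 3: the handle lies between the <. and the first .>
    return (left, right + 1)

# $ <. id .> + <. id .> * <. id .> $  -- the first handle is the leftmost id
rel = {('$', 'id'): '<', ('id', '+'): '>', ('+', 'id'): '<',
       ('id', '*'): '>', ('*', 'id'): '<', ('id', '$'): '>'}
print(find_handle(['$', 'id', '+', 'id', '*', 'id', '$'], rel))  # → (1, 2)
```

Slicing the string with the returned bounds yields the leftmost handle, `['id']`, matching the first reduction in the trace above.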

Using Operator – Precedence Relations: LEADING and TRAILING

• Consider the grammar
  E → E + T | T
  T → T * F | F
  F → ( E ) | id

• LEADING(A) and TRAILING(A) for each
  nonterminal A are defined by
  – LEADING(A) = { a | A ⇒⁺ γaα, where γ is ε or a single
    nonterminal }
  – TRAILING(A) = { a | A ⇒⁺ αaγ, where γ is ε or a single
    nonterminal }

LEADING and TRAILING Algorithm

• LEADING(A)
  – a is in LEADING(A) if there is a production of the form
    A → γaα, where γ is ε or a single nonterminal.
  – If a is in LEADING(B), and there is a production of the
    form A → Bα, then a is in LEADING(A).
• TRAILING(A)
  – a is in TRAILING(A) if there is a production of the form
    A → αaγ, where γ is ε or a single nonterminal.
  – If a is in TRAILING(B), and there is a production of the
    form A → αB, then a is in TRAILING(A).
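The two rules for each set can be iterated to a fixpoint. The sketch below is our own (function name and data layout are assumptions): productions are (head, body) pairs with bodies given as symbol lists.

```python
def leading_trailing(productions, nonterminals):
    """Compute LEADING and TRAILING for every nonterminal by fixpoint iteration."""
    LEAD = {A: set() for A in nonterminals}
    TRAIL = {A: set() for A in nonterminals}
    changed = True
    while changed:
        changed = False
        for A, body in productions:
            # Rule 1 for LEADING: first terminal, possibly past one leading nonterminal
            for X in body[:2]:
                if X not in nonterminals:
                    if X not in LEAD[A]:
                        LEAD[A].add(X); changed = True
                    break
            # Rule 2 for LEADING: A -> B alpha inherits LEADING(B)
            if body[0] in nonterminals and not LEAD[body[0]] <= LEAD[A]:
                LEAD[A] |= LEAD[body[0]]; changed = True
            # Mirror rules for TRAILING, working from the right end
            for X in body[::-1][:2]:
                if X not in nonterminals:
                    if X not in TRAIL[A]:
                        TRAIL[A].add(X); changed = True
                    break
            if body[-1] in nonterminals and not TRAIL[body[-1]] <= TRAIL[A]:
                TRAIL[A] |= TRAIL[body[-1]]; changed = True
    return LEAD, TRAIL

prods = [('E', ['E', '+', 'T']), ('E', ['T']),
         ('T', ['T', '*', 'F']), ('T', ['F']),
         ('F', ['(', 'E', ')']), ('F', ['id'])]
LEAD, TRAIL = leading_trailing(prods, {'E', 'T', 'F'})
print(sorted(LEAD['E']))   # → ['(', '*', '+', 'id']
print(sorted(TRAIL['E']))  # → [')', '*', '+', 'id']
```

The printed sets agree with the table for this grammar on the next slide.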

LEADING and TRAILING

• The LEADING and TRAILING terminals for the
  previous grammar:

  NONTERMINAL   LEADING        TRAILING
  E             *, +, (, id    *, +, ), id
  T             *, (, id       *, ), id
  F             (, id          ), id

Computing Operator-Precedence Relations

Input: An operator grammar G.
Output: The relations ⋖, ≐, and ⋗ for G.
Method:
1. Compute LEADING(A) and TRAILING(A) for each
   nonterminal A.
2. Execute the algorithm below, examining each position of the
   right side of each production.
3. Set $ ⋖ a for all a in LEADING(S) and set b ⋗ $ for all b
   in TRAILING(S), where S is the start symbol of G.

for each production A → X1X2…Xn do
  for i := 1 to n − 1 do

Computing Operator-Precedence Relations (Contd…)

  begin
    if Xi and Xi+1 are both terminals then set Xi ≐ Xi+1 ;
    if i ≤ n − 2 and Xi and Xi+2 are terminals
        and Xi+1 is a nonterminal then set Xi ≐ Xi+2 ;
    if Xi is a terminal and Xi+1 is a nonterminal then
      for all a in LEADING(Xi+1) do set Xi ⋖ a ;
    if Xi is a nonterminal and Xi+1 is a terminal then
      for all a in TRAILING(Xi) do set a ⋗ Xi+1 ;
  end
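Steps 2 and 3 of the method can be sketched as a single pass over every production body. This is our own illustrative encoding (relations stored as `'<'`, `'='`, `'>'`); the hard-coded LEADING/TRAILING sets are the ones computed for the grammar E → E + T | T, T → T * F | F, F → ( E ) | id.

```python
def precedence_relations(productions, nonterminals, LEAD, TRAIL, start):
    """Build the <., =., .> relations by examining each position of each body."""
    rel = {}
    for A, body in productions:
        n = len(body)
        for i in range(n - 1):
            Xi, Xj = body[i], body[i + 1]
            if Xi not in nonterminals and Xj not in nonterminals:
                rel[(Xi, Xj)] = '='            # adjacent terminals
            if (i + 2 < n and Xi not in nonterminals and Xj in nonterminals
                    and body[i + 2] not in nonterminals):
                rel[(Xi, body[i + 2])] = '='   # terminals separated by one nonterminal
            if Xi not in nonterminals and Xj in nonterminals:
                for a in LEAD[Xj]:             # terminal before nonterminal
                    rel[(Xi, a)] = '<'
            if Xi in nonterminals and Xj not in nonterminals:
                for a in TRAIL[Xi]:            # nonterminal before terminal
                    rel[(a, Xj)] = '>'
    for a in LEAD[start]:                      # step 3: the $ endmarkers
        rel[('$', a)] = '<'
    for b in TRAIL[start]:
        rel[(b, '$')] = '>'
    return rel

LEAD = {'E': {'+', '*', '(', 'id'}, 'T': {'*', '(', 'id'}, 'F': {'(', 'id'}}
TRAIL = {'E': {'+', '*', ')', 'id'}, 'T': {'*', ')', 'id'}, 'F': {')', 'id'}}
prods = [('E', ['E', '+', 'T']), ('E', ['T']), ('T', ['T', '*', 'F']),
         ('T', ['F']), ('F', ['(', 'E', ')']), ('F', ['id'])]
rel = precedence_relations(prods, {'E', 'T', 'F'}, LEAD, TRAIL, 'E')
print(rel[('(', ')')], rel[('+', '*')], rel[('*', '+')])  # → = < >
```

A production-quality tool would also report a conflict if the same pair of terminals receives two different relations; the sketch simply overwrites.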

Operator – Precedence Relations

The grammar
E → E + T | T
T → T * F | F
F → ( E ) | id

Precedence relations (row = left terminal, the f() side; column =
right terminal, the g() side; blank = error). The entries follow from
the Xi / Xi+1 / Xi+2 patterns of the algorithm, e.g. from E → E + T,
and from $ ⋖ a for each a in LEADING(E):

        +   *   (   )   id  $
  +     ⋗   ⋖   ⋖   ⋗   ⋖   ⋗
  *     ⋗   ⋗   ⋖   ⋗   ⋖   ⋗
  (     ⋖   ⋖   ⋖   ≐   ⋖
  )     ⋗   ⋗       ⋗       ⋗
  id    ⋗   ⋗       ⋗       ⋗
  $     ⋖   ⋖   ⋖       ⋖

Operator – Precedence Parsing Algorithm

Input: An input string w and a table of precedence relations.

Output: If w is well formed, a skeletal parse tree, with a
placeholder nonterminal E labeling all interior nodes; otherwise,
an error indication.

Method: Initially, the stack contains $ and the input buffer the
string w$.

set ip to point to the first symbol of w$ ;
repeat forever
  if $ is on top of the stack and ip points to $ then
    accept and return
  else

Operator – Precedence Parsing Algorithm (Contd…)

  begin
    let a be the topmost terminal symbol on the stack
    and let b be the current symbol pointed to by ip ;
    if a ⋖ b or a ≐ b then /* shift */
      begin
        shift/push b onto the stack ;
        advance ip to the next input symbol ;
      end
    else if a ⋗ b then /* reduce */
      repeat
        pop the stack
      until the top stack terminal is related by ⋖
        to the terminal most recently popped
    else call the error-recovery routine error()
  end
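The driver above can be sketched with a terminals-only stack. This is a skeletal recognizer under our own naming, not the slides' implementation: reductions just pop and no parse tree is built. The relation table is the one for E → E + E | E * E | id.

```python
def op_precedence_parse(tokens, rel):
    """Operator-precedence driver; returns True on accept, False on error."""
    stack = ['$']                       # terminals only; nonterminals are implicit
    tokens = tokens + ['$']
    ip = 0
    while True:
        a, b = stack[-1], tokens[ip]    # topmost terminal, current input symbol
        if a == '$' and b == '$':
            return True                 # accept
        r = rel.get((a, b))
        if r in ('<', '='):             # shift
            stack.append(b)
            ip += 1
        elif r == '>':                  # reduce: pop until top terminal <. popped one
            while True:
                popped = stack.pop()
                if rel.get((stack[-1], popped)) == '<':
                    break
        else:
            return False                # no relation holds: error

rel = {('id', '+'): '>', ('id', '*'): '>', ('id', '$'): '>',
       ('+', 'id'): '<', ('+', '+'): '>', ('+', '*'): '<', ('+', '$'): '>',
       ('*', 'id'): '<', ('*', '+'): '>', ('*', '*'): '>', ('*', '$'): '>',
       ('$', 'id'): '<', ('$', '+'): '<', ('$', '*'): '<'}
print(op_precedence_parse(['id', '+', 'id', '*', 'id'], rel))  # → True
```

On id + id * id the stack passes through exactly the terminal configurations shown in the trace on the next slides ($, $ id, $ +, $ + id, …).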

Moves Made by Operator – Precedence Parsing

Precedence relations used:

        id   +   *   $
  id         ⋗   ⋗   ⋗
  +     ⋖    ⋗   ⋖   ⋗
  *     ⋖    ⋗   ⋗   ⋗
  $     ⋖    ⋖   ⋖

STACK      Precedence   INPUT            ACTION
$          ⋖            id + id * id $   shift
$ id       ⋗            + id * id $      reduce E → id
$          ⋖            + id * id $      shift
$ +        ⋖            id * id $        shift
$ + id     ⋗            * id $           reduce E → id
$ +        ⋖            * id $           shift
$ + *      ⋖            id $             shift
$ + * id   ⋗            $                reduce E → id
$ + *      ⋗            $                reduce E → E * E
$ +        ⋗            $                reduce E → E + E
$          $                             accept

Actions of Operator – Precedence Parsing

Figure: step-by-step actions on the input string id + id. Each
reduction replaces a handle by a placeholder node ● in the skeletal
parse tree; the final tree has a root ● whose children are the ●
node for the first id, the token +, and the ● node for the second id.

Operator-Precedence Relations from Associativity and Precedence

Consider the grammar for arithmetic expressions
E → E + E | E − E | E * E | E / E | E ↑ E | ( E ) | −E | id
Note: the grammar is ambiguous, and right-sentential
forms could have many handles.

1. If operator θ1 has higher precedence than operator θ2,
   make
   θ1 ⋗ θ2 and θ2 ⋖ θ1
   Example: * and +, input: E + E * E + E
Operator-Precedence Relations from Associativity and Precedence (Contd…)

2. If operators θ1 and θ2 have equal precedence:
   if they are left-associative, make θ1 ⋗ θ2 and θ2 ⋗ θ1;
   if they are right-associative, make θ1 ⋖ θ2 and θ2 ⋖ θ1.
   Example: left-associative, input E − E + E:
   + ⋗ + , + ⋗ − , − ⋗ − , and − ⋗ +
   right-associative: ↑ ⋖ ↑
3. For all operators θ,
   θ ⋖ id , id ⋗ θ , θ ⋖ ( , ( ⋖ θ ,
   ) ⋗ θ , θ ⋗ ) , θ ⋗ $ , and $ ⋖ θ
   Also, let
   ( ≐ )     $ ⋖ (     $ ⋖ id
   ( ⋖ (     id ⋗ $    ) ⋗ $
   ( ⋖ id    id ⋗ )    ) ⋗ )

Operator-Precedence Relations from Associativity and Precedence (Contd…)

Consider the grammar
E → E + E | E − E | E * E | E / E | E ↑ E | ( E ) | −E | id
Assuming
• ↑ has highest precedence and is right-associative,
• * and / have next highest precedence and are left-associative, and
• + and − have lowest precedence and are left-associative.
Input: id * ( id ↑ id ) − id / id
Try the input string with the table on the next slide.

Operator-Precedence Relations from Associativity and Precedence (Contd…)

        +   −   *   /   ↑   id  (   )   $
  +     ⋗   ⋗   ⋖   ⋖   ⋖   ⋖   ⋖   ⋗   ⋗
  −     ⋗   ⋗   ⋖   ⋖   ⋖   ⋖   ⋖   ⋗   ⋗
  *     ⋗   ⋗   ⋗   ⋗   ⋖   ⋖   ⋖   ⋗   ⋗
  /     ⋗   ⋗   ⋗   ⋗   ⋖   ⋖   ⋖   ⋗   ⋗
  ↑     ⋗   ⋗   ⋗   ⋗   ⋖   ⋖   ⋖   ⋗   ⋗
  id    ⋗   ⋗   ⋗   ⋗   ⋗           ⋗   ⋗
  (     ⋖   ⋖   ⋖   ⋖   ⋖   ⋖   ⋖   ≐
  )     ⋗   ⋗   ⋗   ⋗   ⋗           ⋗   ⋗
  $     ⋖   ⋖   ⋖   ⋖   ⋖   ⋖   ⋖

(blank = error)

Handling Unary Operators

• Operator-precedence parsing cannot handle the unary minus
  when we also have the binary minus in our grammar.
• The best approach to this problem is to let the lexical
  analyzer handle it:
  – The lexical analyzer returns two different tokens for the unary
    minus and the binary minus.
  – The lexical analyzer needs lookahead to distinguish the binary
    minus from the unary minus.
• Then, we make
  θ ⋖ unary-minus for any operator θ
  unary-minus ⋗ θ if unary-minus has higher precedence than θ
  unary-minus ⋖ θ if unary-minus has lower (or equal)
  precedence than θ

Precedence Functions

• Compilers using operator-precedence parsers do not need to
  store the table of precedence relations.
• The table can be encoded by two precedence functions f and g
  that map terminal symbols to integers: for symbols a and b,
  f(a) < g(b) whenever a ⋖ b
  f(a) = g(b) whenever a ≐ b
  f(a) > g(b) whenever a ⋗ b
• The precedence relation between a and b can then be determined
  by a numerical comparison between f(a) and g(b).
• Note: error entries in the precedence matrix are obscured,
  since one of the three relations holds no matter what f(a) and
  g(b) are.

Precedence Functions

Consider the grammar
E → E + E | E − E | E * E | E / E | E ↑ E | ( E ) | −E | id

      +   −   *   /   ↑   (   )   id  $
  f   2   2   4   4   4   0   6   6   0
  g   1   1   3   3   5   5   0   5   0

For example:
* ⋖ id, and f(*) < g(id).
Note: f(id) > g(id) suggests that id ⋗ id;
in fact no precedence relation holds between id and id.

Constructing Precedence Functions

Input: An operator-precedence matrix.
Output: Precedence functions representing the input matrix,
or an indication that none exist.

Method:
1. Create symbols fa and ga for each a that is a terminal or $.
2. Partition the created symbols into as many groups as
   possible, in such a way that if a ≐ b, then fa and gb are in the
   same group.

Constructing Precedence Functions

3. Create a directed graph whose nodes are the groups
   found in (2). For any a and b,
   • if a ⋖ b, place an edge from the group of gb to the group of fa;
   • if a ⋗ b, place an edge from the group of fa to that of gb.
4. If the graph constructed in (3) has a cycle, then no
   precedence functions exist. If there are no cycles, let
   f(a) be the length of the longest path beginning at the
   group of fa, and let g(a) be the length of the longest path
   beginning at the group of ga.
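The four steps can be sketched with a simple union-find for the groups and a depth-first search for the longest paths. The names and data layout below are our own; the relation table is the one for the grammar E → E + E | E * E | id, so the computed values can be checked against the example on the following slides.

```python
def precedence_functions(terminals, rel):
    """Build f and g from a relation table, or raise if a cycle exists."""
    # Steps 1-2: group f_a with g_b whenever a =. b
    parent = {('f', a): ('f', a) for a in terminals}
    parent.update({('g', a): ('g', a) for a in terminals})
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for (a, b), r in rel.items():
        if r == '=':
            parent[find(('f', a))] = find(('g', b))
    # Step 3: a <. b gives edge g_b -> f_a ; a .> b gives edge f_a -> g_b
    edges = {}
    for (a, b), r in rel.items():
        if r == '<':
            edges.setdefault(find(('g', b)), set()).add(find(('f', a)))
        elif r == '>':
            edges.setdefault(find(('f', a)), set()).add(find(('g', b)))
    # Step 4: longest path from each group; a cycle means no functions exist
    memo, on_path = {}, set()
    def longest(u):
        if u in on_path:
            raise ValueError('cycle: no precedence functions exist')
        if u not in memo:
            on_path.add(u)
            memo[u] = max((longest(v) + 1 for v in edges.get(u, ())), default=0)
            on_path.discard(u)
        return memo[u]
    f = {a: longest(find(('f', a))) for a in terminals}
    g = {a: longest(find(('g', a))) for a in terminals}
    return f, g

rel = {('id', '+'): '>', ('id', '*'): '>', ('id', '$'): '>',
       ('+', 'id'): '<', ('+', '+'): '>', ('+', '*'): '<', ('+', '$'): '>',
       ('*', 'id'): '<', ('*', '+'): '>', ('*', '*'): '>', ('*', '$'): '>',
       ('$', 'id'): '<', ('$', '+'): '<', ('$', '*'): '<'}
f, g = precedence_functions(['+', '*', 'id', '$'], rel)
print(f)  # → {'+': 2, '*': 4, 'id': 4, '$': 0}
print(g)  # → {'+': 1, '*': 3, 'id': 5, '$': 0}
```

These values match the f/g table in the example slide for this grammar.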
Example

Figure: graph representing the precedence functions.

      +   *   id  $
  f   2   4   4   0
  g   1   3   5   0

Error Recovery in Operator-Precedence Parsing

• The parser can discover syntactic errors:
  – if no precedence relation holds between the
    terminal on top of the stack and the current
    input symbol;
  – if a handle has been found, but there is no
    production with this handle as a right side.

Handling Errors during Reductions

Consider the grammar
E → E + E | E − E | E * E | E / E | E ↑ E | ( E ) | −E | id
• The error checker for reductions need only check that the proper
  set of nonterminal markers appears among the terminal strings
  being reduced.
• Specifically, the checker does the following:
  – If + , − , * , / , or ↑ is reduced, it checks that nonterminals appear on
    both sides. If not, it issues the diagnostic
      missing operand
  – If id is reduced, it checks that there is no nonterminal to the right or
    left. If there is, it can warn
      missing operator
  – If ( ) is reduced, it checks that there is a nonterminal between the
    parentheses. If not, it can say
      no expression between parentheses
Handling Shift/Reduce Errors

Precedence and error-routine table:

        id   (    )    $
  id    e3   e3   ⋗    ⋗
  (     ⋖    ⋖    ≐    e4
  )     e3   e3   ⋗    ⋗
  $     ⋖    ⋖    e2   e1

Error-handling routines:
• e1: /* called when the whole expression is missing */
  – insert id onto the input
  – issue diagnostic: “missing operand”
• e2: /* called when an expression begins with a right parenthesis */
  – delete ) from the input
  – issue diagnostic: “unbalanced right parenthesis”
• e3: /* called when id or ) is followed by id or ( */
  – insert + onto the input
  – issue diagnostic: “missing operator”
• e4: /* called when an expression ends with a left parenthesis */
  – pop ( from the stack
  – issue diagnostic: “missing right parenthesis”
Error Handling Mechanism

Erroneous input: id id ) ( $        Original string: ( id + id )

STACK    Precedence   INPUT          ACTION
$        ⋖            id id ) ( $    shift
$ id     blank        id ) ( $       error e3: missing operator; insert ‘+’ into INPUT
$ id     ⋗            + id ) ( $     reduce
$        ⋖            + id ) ( $     shift
$ +      ⋖            id ) ( $       shift
$ + id   ⋗            ) ( $          reduce
$ +      ⋗            ) ( $          reduce
$        blank        ) ( $          error e2: unbalanced right parenthesis; delete ‘)’ from INPUT
$        ⋖            ( $            shift
$ (      blank        $              error e4: missing right parenthesis; pop ‘(’ from STACK
$        $                           accept
Unit - II
Chapter 4
Syntax Analysis
Bottom-Up Parsing: LR Parsers

Partha Sarathi Chakraborty
Assistant Professor
Department of Computer Science and Engineering
SRM University, Delhi – NCR Campus
Outline
• Bottom-Up Parsing
  – LR parsers:
    • Simple LR (SLR)
    • Canonical LR
    • Lookahead LR (LALR)

LR Parsers

• An efficient bottom-up syntax-analysis technique
  that can be used to parse a large class of CFGs.
• The technique is called LR(k) parsing; the ‘L’ is
  for left-to-right scanning of the input, the ‘R’ for
  constructing a rightmost derivation in reverse,
  and the k for the number of input symbols of
  lookahead that are used in making parsing
  decisions.
• When (k) is omitted, k is assumed to be 1.

LR Parsers: Attractive

• LR parsing is attractive for a variety of reasons.
  – LR parsers can be constructed to recognize virtually all
    programming-language constructs for which context-free
    grammars can be written.
  – The LR-parsing method is the most general nonbacktracking
    shift-reduce parsing method known, yet it can be
    implemented as efficiently as other, more primitive
    shift-reduce methods.
  – An LR parser can detect a syntactic error as soon as it is
    possible to do so on a left-to-right scan of the input.
  – The class of grammars that can be parsed using LR methods
    is a proper superset of the class of grammars that can be
    parsed with predictive or LL methods.

LR Parsers: Drawback

• The principal drawback of the LR method is that it is
  too much work to construct an LR parser by hand for a
  typical programming-language grammar.
• A specialized tool, an LR parser generator, is needed.
• YACC: such a generator takes a context-free grammar
  and automatically produces a parser for that grammar.
  – If the grammar contains ambiguities or other constructs that
    are difficult to parse in a left-to-right scan of the input, then
    the parser generator locates these constructs and provides
    detailed diagnostic messages.

Three Techniques for Creating an LR Parsing Table

• Simple LR (SLR) – easiest to implement, but least
  powerful; it may fail to produce a parsing table for
  certain grammars.
• Canonical LR – most powerful and most expensive.
• Lookahead LR (LALR) – intermediate in power and
  cost between the other two. It works on most
  programming-language grammars and, with some
  effort, can be implemented efficiently.
• Power: Canonical LR > LALR > SLR

LR(0) Items of a Grammar

• An LR(0) item of a grammar G is a production of G
  with a dot (•) at some position of the right-hand side.
• Thus, the production
  A → XYZ
  has four items:
  [A → • X Y Z]
  [A → X • Y Z]
  [A → X Y • Z]
  [A → X Y Z •]
• Note that the production A → ε has one item [A → •].

Constructing the Set of LR(0) Items of a Grammar

• If G is a grammar with start symbol S, then G′, the
  augmented grammar for G, is G with a new start
  symbol S′ and production S′ → S.
  The purpose of the new starting production is to indicate to the parser when it
  should stop parsing and announce acceptance of the input, i.e., acceptance occurs
  when and only when the parser is about to reduce by S′ → S.
• The Closure Operation
Example

• Consider the grammar G:
  E → E + T | T
  T → T * F | F
  F → ( E ) | id

  FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }
  FOLLOW(E) = { $ , ) , + }
  FOLLOW(T) = { * , $ , ) , + }
  FOLLOW(F) = { * , $ , ) , + }

• Create the augmented expression grammar G′ by adding
  the production E′ → E.
• If I is the set of the one item {[E′ → •E]}, then closure(I)
  contains the items shown on the next slide.
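The closure (and the companion goto) operation can be sketched in Python. This is our own illustrative representation, not the slides' notation: an item is a (head, body, dot-position) triple, with bodies stored as tuples.

```python
def closure(items, productions):
    """LR(0) closure: repeatedly add [B -> .gamma] for each B after a dot."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body):
                B = body[dot]           # symbol right after the dot
                for h, b in productions:
                    if h == B and (h, b, 0) not in items:
                        items.add((h, b, 0))
                        changed = True
    return frozenset(items)

def goto(items, X, productions):
    """Advance the dot over X in every applicable item, then take the closure."""
    moved = [(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == X]
    return closure(moved, productions)

# The augmented expression grammar
prods = [("E'", ('E',)), ('E', ('E', '+', 'T')), ('E', ('T',)),
         ('T', ('T', '*', 'F')), ('T', ('F',)),
         ('F', ('(', 'E', ')')), ('F', ('id',))]
I0 = closure({("E'", ('E',), 0)}, prods)
print(len(I0))  # → 7: the kernel item plus all E-, T-, F-productions with the dot at the left
```

`goto(I0, 'E', prods)` then yields the two-item set {[E′ → E•], [E → E•+T]}, which is the state I1 in the collection that follows.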

The Closure Operation (Example)

Figure: the closure of {[E′ → •E]}, giving the final closure I0.

The Goto Operation for LR(0) Items

Figure: the goto operation applied to the sets of items.

The Sets-of-Items Construction

Figure: the sets-of-items construction algorithm; kernel and
nonkernel items.
LR(0) Collection for the Grammar

Goto transitions between the sets of items:

I0 --E--> I1    I0 --T--> I2    I0 --F--> I3    I0 --(--> I4    I0 --id--> I5
I1 --+--> I6
I2 --*--> I7
I4 --E--> I8    I4 --T--> I2    I4 --F--> I3    I4 --(--> I4    I4 --id--> I5
I6 --T--> I9    I6 --F--> I3    I6 --(--> I4    I6 --id--> I5
I7 --F--> I10   I7 --(--> I4    I7 --id--> I5
I8 --)--> I11   I8 --+--> I6
I9 --*--> I7
Transition Diagram

Figure: transition diagram for the grammar G representing the
goto operation; the transition diagram of the DFA D for viable
prefixes.

Constructing the SLR Parser Table

Figure: the SLR(1) parsing table for the expression grammar G.

Parse Table: Action and Goto Functions

• The parsing table shows the parsing action and goto functions
  for the grammar G, repeated here with the productions numbered.

Model of an LR Parser

Figure: model of an LR parser (input buffer, stack, and parsing
table with action and goto parts).

LR Parsing Algorithm

Figure: the LR parsing algorithm.

Example of LR Parsing

Figure: moves of the LR parser on input id * id + id.

Another Example – SLR Parsing

• Consider the following grammar G:
  (1) S → S ( S )
  (2) S → ε
• Augmented grammar G′:
  S′ → S
  S → S ( S )
  S → ε
• FIRST(S) = { ( , ε }
• FOLLOW(S) = { $ , ( , ) }

Figure: construction of the LR(0) items (closure sets I).

SLR(1) Parsing Table and Parsing

Figure: the SLR(1) parsing table and the moves of the LR parser
on the input string ( ) ( ).

LL vs. LR Grammars

• For a grammar to be LR(k), we must be able to
  recognize the occurrence of the right side of a
  production in a right-sentential form, with k input
  symbols of lookahead.
• This requirement is far less stringent than that for
  LL(k) grammars, where we must be able to recognize
  the use of a production seeing only the first k symbols
  of what its right side derives.
• Thus, it should not be surprising that LR grammars
  can describe more languages than LL grammars.

Unambiguous Grammars That Are Not SLR(1)

• Every SLR(1) grammar is unambiguous, but
  there are many unambiguous grammars that
  are not SLR(1).
• Consider the grammar X with productions:
  (1) S → L = R
  (2) S → R
  (3) L → * R
  (4) L → id
  (5) R → L

  FIRST(S) = FIRST(L) = FIRST(R) = { * , id }
  FOLLOW(S) = { $ } , FOLLOW(L) = { = , $ }
  FOLLOW(R) = { = , $ }

Canonical LR(0) Collection for Grammar X

Goto transitions between the sets of items:

I0 --S--> I1    I0 --L--> I2    I0 --R--> I3    I0 --*--> I4    I0 --id--> I5
I2 --=--> I6
I4 --R--> I7    I4 --L--> I8    I4 --*--> I4    I4 --id--> I5
I6 --R--> I9    I6 --L--> I8    I6 --*--> I4    I6 --id--> I5
SLR(1) Parsing Table for Grammar X: Shift/Reduce Conflict

Figure: the SLR(1) parsing table for grammar X shows a
shift/reduce (S-R) conflict.

Grammar X:
(1) S → L = R
(2) S → R
(3) L → * R
(4) L → id
(5) R → L

Grammar X is not ambiguous. The shift/reduce conflict arises
because the SLR parser construction method is not powerful
enough to remember enough left context to decide what action the
parser should take on input =, having seen a string reducible to L.

LR(1) Grammars

• SLR is too simple.
• LR(1) parsing uses lookahead to avoid
  unnecessary conflicts in the parsing table.
• LR(1) item = LR(0) item + lookahead

  LR(0) item:    LR(1) item:
  [A → α•β]      [A → α•β, a]

SLR Versus LR(1)

• Split the SLR states by adding LR(1) lookahead.
• Unambiguous grammar:
  S → L = R | R
  L → * R | id
  R → L

LR(1) Items

• An LR(1) item
  [A → α•β, a]
  contains a lookahead terminal a, meaning α is already
  on top of the stack and we expect to see βa.
• For items of the form
  [A → α•, a]
  the lookahead a is used: reduce by A → α only if the
  next input is a.
• For items of the form
  [A → α•β, a]
  with β ≠ ε, the lookahead has no effect.
Constructing LR(1) Sets of Items

Figure: the algorithm for constructing the LR(1) sets of items.

Constructing LR(1) Sets of Items

• Unambiguous LR(1) grammar:
  (1) S → L = R
  (2) S → R
  (3) L → * R
  (4) L → id
  (5) R → L

• Augment with S′ → S.
• LR(1) items (next slide). In an item

  [A → α•Bβ, a]

  the production with its dot is the core (first component) and a is
  the lookahead (second component); closure adds an item
  [B → •γ, b] for each b in FIRST(βa).

Construct Closure I0

Figure: the closure I0, showing each item's core and its lookahead
computed as FIRST(βa).

Constructing LR(1) Sets of Items for Grammar X

Figure: the LR(1) sets of items for grammar X.

Construction of Canonical-LR Parsing Tables

Figure: the canonical LR(1) parsing table for grammar X.

LALR(1) Grammars

• LR(1) parsing tables have many states.
• LALR(1) parsing (Look-Ahead LR) combines LR(1)
  states to reduce the table size.
• Less powerful than LR(1):
  – it will not introduce shift/reduce conflicts, because shifts do
    not use lookaheads;
  – it may introduce reduce/reduce conflicts, but these seldom
    arise for grammars of programming languages.

Merging of states with a common core may produce a
reduce/reduce conflict, but not a shift/reduce conflict.

Constructing LALR(1) Parsing Tables

1. Construct the sets of LR(1) items.
2. Combine LR(1) sets whose items share the same first
   component (core), e.g. I4 and I11:

   I4:  [L → *•R, =]        I11: [L → *•R, $]
        [R → •L, =]              [R → •L, $]
        [L → •*R, =]             [L → •*R, $]
        [L → •id, =]             [L → •id, $]

   merged, writing =/$ as shorthand for two items in the same set:

   I4,11: [L → *•R, =/$]
          [R → •L, =/$]
          [L → •*R, =/$]
          [L → •id, =/$]

Similarly, the following closures share the same core (first component):
I5 = I12 , I7 = I13 , I8 = I10
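The core-merging step can be sketched in a few lines. The representation is our own: an LR(1) item is a (head, body, dot, lookahead) tuple, and the two sample states below mimic I4 and I11 from grammar X (same core, lookaheads = versus $).

```python
def lalr_merge(lr1_states):
    """Merge LR(1) states that share the same core (their LR(0) projections)."""
    merged = {}
    for state in lr1_states:
        # The core drops the lookahead from every item.
        core = frozenset((h, b, d) for h, b, d, la in state)
        merged.setdefault(core, set()).update(state)
    return [frozenset(s) for s in merged.values()]

I4 = frozenset({('L', ('*', 'R'), 1, '='), ('R', ('L',), 0, '='),
                ('L', ('*', 'R'), 0, '='), ('L', ('id',), 0, '=')})
I11 = frozenset({('L', ('*', 'R'), 1, '$'), ('R', ('L',), 0, '$'),
                 ('L', ('*', 'R'), 0, '$'), ('L', ('id',), 0, '$')})
merged = lalr_merge([I4, I11])
print(len(merged), len(merged[0]))  # → 1 8
```

The two four-item states collapse into a single eight-item state, i.e. each core item now carries both lookaheads, which is exactly the =/$ shorthand used above.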
Figure: LALR parsing table construction; the LALR(1) parsing
table for grammar X.

Analysis

• The LR and LALR parsers will mimic one
  another on correct inputs.
• When presented with erroneous input, the
  LALR parser may proceed to do some
  reductions after the LR parser has declared an
  error. However, the LALR parser will never
  shift another symbol after the LR parser
  declares an error.

Another Example – Canonical LR and LALR

• Consider the following grammar G:
  (1) S → S ( S )
  (2) S → ε
• Augmented grammar G′:
  S′ → S
  S → S ( S )
  S → ε
• LR(1) items (next slide)
Figure: the LR(1) items for grammar G and the canonical LR
parsing table.
LALR(1) Parsing Table

• The following closures share the same core (first
  component) of their LR(1) items: I4 and I7, I3 and I6,
  I2 and I5.
• Therefore, taking the union of these states gives
  I47 = I4 ∪ I7 , I36 = I3 ∪ I6 , I25 = I2 ∪ I5
Figure: parsing with the canonical LR table and with the LALR
table.

Another Example – Canonical LR and LALR

• Consider the following grammar G:
  (1) S → C C
  (2) C → c C        FIRST(S) = FIRST(C) = { c , d }
  (3) C → d          FOLLOW(S) = { $ }
                     FOLLOW(C) = { $ , c , d }
• Augmented grammar G′:
  S′ → S
  S → C C
  C → c C
  C → d
• LR(1) items (next slide)
Figure: the LR(1) items for grammar G and the canonical LR
parsing table.

LALR(1) Parsing Table

• The following closures share the same core (first component) of
  their LR(1) items: I3 and I6, I4 and I7, I8 and I9.
• Therefore, taking the union of these states gives
  I36 = I3 ∪ I6 , I47 = I4 ∪ I7 , I89 = I8 ∪ I9

  I36: C → c•C , c / d / $
       C → •cC , c / d / $
       C → •d , c / d / $

  I47: C → d• , c / d / $

  I89: C → cC• , c / d / $
Figure: the LALR(1) parsing table.

LL, SLR, LR, LALR Summary

• LL parse tables are computed using FIRST/FOLLOW:
  – nonterminals × terminals → productions.
• LR parsing tables are computed using closure/goto:
  – LR states × terminals → shift/reduce actions;
  – LR states × nonterminals → goto state transitions.
• A grammar is
  – LL(1) if its LL(1) parse table has no conflicts,
  – SLR if its SLR parse table has no conflicts,
  – LALR(1) if its LALR(1) parse table has no conflicts,
  – LR(1) if its LR(1) parse table has no conflicts.
LL, SLR, LR, LALR Grammars

Figure: containment of the LL, SLR, LALR, and LR grammar
classes.

Assignment

Consider the following grammar G:
E → E + T | T
T → T F | F
F → F * | a | b

i.   Construct the sets of LR(0) items.
ii.  Construct the SLR(1) parsing table for this grammar.
iii. Construct the sets of LR(1) items.
iv.  Construct the canonical LR(1) parsing table.
v.   Construct the LALR parsing table.