
Unit - II

Chapter 4
Syntax Analysis
Top-Down Parsing

Partha Sarathi Chakraborty


Assistant Professor
Department of Computer Science and Engineering
SRM University, Delhi – NCR Campus

Outline
• Role of the parser
• Top-Down parsing:
  – Predictive Parsing
    • Recursive, and
    • Nonrecursive

(C) 2014, Prepared by Partha Sarathi Chakraborty

Introduction
• The syntax of programming-language constructs can be described by context-free grammars or BNF (Backus-Naur Form).
• Grammars offer significant advantages to both language designers and compiler writers.
• A grammar gives a precise, yet easy-to-understand, syntactic specification of a programming language.
• From certain classes of grammars we can automatically construct an efficient parser that determines whether a source program is syntactically well formed.
• A properly designed grammar imparts a structure to a programming language that is useful for the translation of source programs into correct object code and for the detection of errors.
• Languages evolve over time, acquiring new constructs and performing additional tasks.

Backus-Naur Form (BNF)


• Backus-Naur form (BNF) is a formal notation for encoding grammars, intended for human consumption.
• Many programming languages, protocols, or formats have a BNF description in their specification.
• Every rule in Backus-Naur form has the following structure:
      name ::= expansion
• The symbol ‘::=’ means "may expand into" or "may be replaced with."
• A name is also called a non-terminal symbol.

Backus-Naur Form (BNF)


• Every name in Backus-Naur form is surrounded by angle brackets, < >, whether it appears on the left- or right-hand side of a rule.
• An expansion is an expression containing terminal symbols and non-terminal symbols, joined together by sequencing and choice.
• A terminal symbol is a literal (like "+" or "function") or a class of literals (like integer).
• Simply juxtaposing expressions indicates sequencing.
• A vertical bar ‘|’ indicates choice.

Backus-Naur Form (BNF)


• For example, in BNF, the classic expression grammar is:
      <expr>   ::= <term> "+" <expr>
                 | <term>
      <term>   ::= <factor> "*" <term>
                 | <factor>
      <factor> ::= "(" <expr> ")"
                 | <const>
      <const>  ::= integer

Backus-Naur Form (BNF)


• Naturally, we can define a grammar for rules in BNF:
      rule      → name ::= expansion
      name      → < identifier >
      expansion → expansion expansion
      expansion → expansion | expansion
      expansion → name
      expansion → terminal

Position of a Parser in the Compiler Model

[Figure] Position of a parser in the compiler model: the Source Program enters the Lexical Analyzer, which supplies (token, tokenval) pairs to the Parser and the rest of the front end on each “get next token” request; the front end produces an Intermediate Representation. Lexical errors, syntax errors, and semantic errors are reported, and both phases consult the Symbol Table.

The Parser
• The task of the parser is to check syntax.
• The syntax-directed translation stage in the compiler’s front end checks static semantics and produces an intermediate representation (IR) of the source program:
  – Abstract syntax trees (ASTs)
  – Control-flow graphs (CFGs) with triples, three-address code, or register transfer lists
  – WHIRL (SGI Pro64 compiler) has 5 IR levels!

Error Handling
• A good compiler should assist in identifying and locating errors:
  – Lexical errors: important; the compiler can easily recover and continue
    • Example: a misspelled identifier, keyword, or operator
  – Syntax errors: most important for the compiler; it can almost always recover
    • Example: an arithmetic expression with unbalanced parentheses
  – Static semantic errors: important; the compiler can sometimes recover
    • Example: an operator applied to an incompatible operand
  – Dynamic semantic errors: hard or impossible to detect at compile time; runtime checks are required
  – Logical errors: hard or impossible to detect
    • Example: an infinitely recursive call

Error Handling
• The error handler in a parser has simple-to-state goals:
  – It should report the presence of errors clearly and accurately.
  – It should recover from each error quickly enough to be able to detect subsequent errors.
  – It should not significantly slow down the processing of correct programs.

Viable-Prefix Property
• The viable-prefix property of LL/LR parsers allows early detection of syntax errors
  – Goal: detection of an error as soon as possible without consuming unnecessary input
  – How: detect an error as soon as the prefix of the input does not match a prefix of any string in the language

[Figure] Examples: in “DO 10 I = 1;0” and “for (;)”, the error is detected at the first symbol that makes the scanned prefix no longer a viable prefix of the language.

Error Recovery Strategies


• Panic mode
  – Discard input until a token in a set of designated synchronizing tokens is found
• Phrase-level recovery
  – Perform local correction on the input to repair the error
• Error productions
  – Augment the grammar with productions for erroneous constructs
• Global correction
  – Choose a minimal sequence of changes to obtain a global least-cost correction

Grammars
• A context-free grammar is a 4-tuple G = (N, T, P, S) where
  – T is a finite set of tokens (terminal symbols)
  – N is a finite set of nonterminals
  – P is a finite set of productions of the form α → β
    where α ∈ (N ∪ T)* N (N ∪ T)* and β ∈ (N ∪ T)*
  – S ∈ N is a designated start symbol

Notational Conventions Used


• Terminals
      a, b, c, … ∈ T
      specific terminals: 0, 1, id, +
• Nonterminals
      A, B, C, … ∈ N
      specific nonterminals: expr, term, stmt
• Grammar symbols
      X, Y, Z ∈ (N ∪ T)
• Strings of terminals
      u, v, w, x, y, z ∈ T*
• Strings of grammar symbols
      α, β, γ ∈ (N ∪ T)*

Derivations
• The one-step derivation is defined by
      α A β ⇒ α γ β
  where A → γ is a production in the grammar
• In addition, we define
  – The derivation is leftmost (⇒lm) if α does not contain a nonterminal
  – The derivation is rightmost (⇒rm) if β does not contain a nonterminal
  – Transitive closure ⇒* (zero or more steps)
  – Positive closure ⇒+ (one or more steps)
• The language generated by G is defined by
      L(G) = { w ∈ T* | S ⇒+ w }

Derivation (Example)
EE+E
EE*E
E(E)
E-E
E  id
(C) 2014, Prepared by Partha Sarathi Chakraborty

E  - E  - id
E rm E + E rm E + id rm id + id
E * E
E + id * id + id
18

Chomsky Hierarchy: Language Classification


• A grammar G is said to be
  – Regular if it is right linear, where each production is of the form
        A → wB  or  A → w
    or left linear, where each production is of the form
        A → Bw  or  A → w
  – Context free if each production is of the form
        A → α
    where A ∈ N and α ∈ (N ∪ T)*
  – Context sensitive if each production is of the form
        α A β → α γ β
    where A ∈ N, α, β, γ ∈ (N ∪ T)*, |γ| > 0
  – Unrestricted otherwise

Chomsky Hierarchy

L(regular) ⊆ L(context free) ⊆ L(context sensitive) ⊆ L(unrestricted)

where L(T) = { L(G) | G is of type T },
that is, the set of all languages generated by grammars G of type T.

Examples:
  Every finite language is regular
  L1 = { a^n b^n | n ≥ 1 } is context free
  L2 = { a^n b^n c^n | n ≥ 1 } is context sensitive

Ambiguity
• A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
• Example: id + id * id
• Two distinct leftmost derivations:

  E ⇒ E + E                E ⇒ E * E
    ⇒ id + E                 ⇒ E + E * E
    ⇒ id + E * E             ⇒ id + E * E
    ⇒ id + id * E            ⇒ id + id * E
    ⇒ id + id * id           ⇒ id + id * id
Ambiguity: Two Parse Trees

[Figure] The two distinct parse trees for id + id * id are not reproduced here.

Eliminating Ambiguity
• “Dangling-else” grammar:
      stmt → if expr then stmt
           | if expr then stmt else stmt
           | other
• The grammar is ambiguous since the string
      if E1 then if E2 then S1 else S2
  has two parse trees.

Eliminating Ambiguity
• The general rule is: “Match each else with the closest previous unmatched then.”
• The idea is that a statement appearing between a then and an else must be “matched”; that is, the interior statement must not end with an unmatched or open then. A matched statement is either an if-then-else statement containing no open statements, or it is any other kind of unconditional statement.
• Now, the grammar, rewritten:
      stmt         → matched_stmt | open_stmt
      matched_stmt → if expr then matched_stmt else matched_stmt
                   | other
      open_stmt    → if expr then stmt
                   | if expr then matched_stmt else open_stmt

Left Recursion
• Productions of the form
      A → A α | β
  are left recursive.
• Top-down parsing methods cannot handle left-recursive grammars, so a transformation that eliminates left recursion is needed.

Immediate Left-Recursion Elimination

Rewrite every left-recursive production
      A → A α | β
into a right-recursive production:
      A  → β A’
      A’ → α A’ | ε
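The transformation above is mechanical, so it is easy to automate. The following Python function is an illustrative sketch (not code from the slides): a grammar is represented as a mapping from a nonterminal to a list of right-hand sides, each a list of symbols, with the empty list standing for ε.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn  as
         A  -> b1 A' | ... | bn A'
         A' -> a1 A' | ... | am A' | epsilon
    Each alternative is a list of symbols; [] denotes epsilon."""
    recursive = [rhs[1:] for rhs in alternatives if rhs[:1] == [nonterminal]]
    others = [rhs for rhs in alternatives if rhs[:1] != [nonterminal]]
    if not recursive:                      # no immediate left recursion
        return {nonterminal: alternatives}
    fresh = nonterminal + "'"              # fresh nonterminal A'
    return {
        nonterminal: [beta + [fresh] for beta in others],
        fresh: [alpha + [fresh] for alpha in recursive] + [[]],
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | epsilon
result = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
```

Applied to each nonterminal of the expression grammar on the next slide, this reproduces the E’/T’ form shown there.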

Example
• Consider the grammar
      E → E + T | T
      T → T * F | F
      F → ( E ) | id
• Eliminate immediate left recursion (nonterminals E and T have productions of the form A → A α):
      E  → T E’
      E’ → + T E’ | ε
      T  → F T’
      T’ → * F T’ | ε
      F  → ( E ) | id

Another Example
• Consider the grammar below; it is left recursive, but not immediately left recursive:
      S → A a | b
      A → A c | S d | ε
• Use the general left-recursion elimination algorithm:
• Substitute the S-productions into A → S d to obtain the following A-productions:
      A → A c | A a d | b d | ε
• Now, eliminate the immediate left recursion among the A-productions:
      S  → A a | b
      A  → b d A’ | A’
      A’ → c A’ | a d A’ | ε

General Left Recursion Elimination

Arrange the nonterminals in some order A1, A2, …, An
for i = 1, …, n do
    for j = 1, …, i-1 do
        replace each
            Ai → Aj γ
        with
            Ai → δ1 γ | δ2 γ | … | δk γ
        where
            Aj → δ1 | δ2 | … | δk
    enddo
    eliminate the immediate left recursion in the Ai-productions
enddo

Example Left Rec. Elimination


ABC|a
BCA|Ab Choose arrangement: A, B, C
CAB|CC|a

i = 1: nothing to do
i = 2, j = 1: BCA|Ab
(C) 2014, Prepared by Partha Sarathi Chakraborty

 BCA|BCb|ab
(imm) B  C A BR | a b BR
BR  C b BR | 
i = 3, j = 1: CAB|CC|a
 CBCB|aB|CC|a
i = 3, j = 2: CBCB|aB|CC|a
 C  C A BR C B | a b BR C B | a B | C C | a
(imm) C  a b BR C B CR | a B CR | a CR
CR  A BR C B CR | C CR | 
30

Left Factoring
• When a nonterminal has two or more productions whose right-hand sides start with the same grammar symbols, the grammar is not LL(1) and cannot be used for predictive parsing.
• If A → α β1 | α β2 | γ are the productions, then after left factoring:
      A  → α A’ | γ
      A’ → β1 | β2
• In general, replace the productions
      A → α β1 | α β2 | … | α βn | γ
with
      A  → α A’ | γ
      A’ → β1 | β2 | … | βn
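One pass of this rewrite can be sketched in Python (an illustrative sketch, not code from the slides): alternatives are grouped by their leading symbol, and each group sharing a prefix is replaced by a factored alternative plus a fresh primed nonterminal. The single fresh name assumes at most one group needs factoring per pass.

```python
def left_factor(nonterminal, alternatives):
    """One pass of left factoring. Each alternative is a list of
    symbols; [] denotes epsilon."""
    groups = {}
    for rhs in alternatives:               # group by leading symbol
        groups.setdefault(rhs[0] if rhs else "", []).append(rhs)
    new_rules, fresh = {nonterminal: []}, nonterminal + "'"
    for first, group in groups.items():
        if len(group) == 1:                # no shared prefix: keep as is
            new_rules[nonterminal].append(group[0])
            continue
        alpha = []                         # longest prefix shared by the group
        for column in zip(*group):
            if len(set(column)) == 1:
                alpha.append(column[0])
            else:
                break
        new_rules[nonterminal].append(alpha + [fresh])
        new_rules[fresh] = [rhs[len(alpha):] for rhs in group]  # [] = epsilon
    return new_rules

# S -> iEtS | iEtSeS | a   becomes   S -> iEtS S' | a,  S' -> epsilon | eS
rules = left_factor("S", [["i","E","t","S"], ["i","E","t","S","e","S"], ["a"]])
```

This reproduces the dangling-else factoring shown in the example slide that follows.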

Example
• Consider the grammar
      S → i E t S | i E t S e S | a
      E → b
• Left factored, this grammar becomes:
      S  → i E t S S’ | a
      S’ → e S | ε
      E  → b

Parsing
• Universal (any context-free grammar)
  – Cocke-Younger-Kasami
  – Earley
• Top-down (context-free grammar with restrictions)
  – Recursive descent (predictive parsing)
  – LL (Left-to-right, Leftmost derivation) methods
• Bottom-up (context-free grammar with restrictions)
  – Operator precedence parsing
  – LR (Left-to-right, Rightmost derivation) methods
    • SLR, canonical LR, LALR

Top-Down Parsing
• LL methods (Left-to-right, Leftmost
derivation) and recursive-descent parsing
Grammar:              Leftmost derivation:
  E → T + T             E ⇒lm T + T
  T → ( E )               ⇒lm id + T
  T → - E                 ⇒lm id + id
  T → id

[Figure] The partial parse trees for this derivation, growing top-down one leftmost expansion at a time, are not reproduced here.

Recursive Descent Parsing


• The grammar must be LL(1).
• Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal’s syntactic category of input tokens.
• When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information.
• It may involve backtracking, i.e., making repeated scans of the input.
• It is implemented as a mutually recursive suite of functions that descend through a parse tree for the string, and such parsers are therefore called “recursive descent parsers”.

Example
• Consider the grammar
      S → c A d
      A → a b | a
• Steps to build the parse tree for the string “cad”.
• Note: A left-recursive grammar can cause a recursive-descent parser, even one with backtracking, to go into an infinite loop; i.e., when trying to expand A, we may eventually find ourselves again trying to expand A without having consumed any input.
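The steps for this grammar can be sketched as a minimal backtracking recursive-descent parser (an illustrative sketch; the slides give no code). Each procedure returns the input position after a successful match, or None on failure; trying A → a b first and falling back to A → a is the backtracking.

```python
# Grammar:  S -> c A d,   A -> a b | a

def parse_S(s, i):
    if i < len(s) and s[i] == "c":        # match 'c'
        j = parse_A(s, i + 1)             # then an A
        if j is not None and j < len(s) and s[j] == "d":
            return j + 1                  # then 'd'
    return None

def parse_A(s, i):
    if s[i:i+2] == "ab":                  # try A -> a b first
        return i + 2
    if s[i:i+1] == "a":                   # backtrack to A -> a
        return i + 1
    return None

def accepts(s):
    return parse_S(s, 0) == len(s)

# On "cad": A -> a b fails, the parser backtracks to A -> a, then 'd' matches.
```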

It is possible for a recursive-descent parser to loop forever

left-recursive production:      right-recursive production:
      A → A α | β                     A → β R
                                      R → α R | ε

Advantages and Limitations of a recursive-descent parser
• Advantages:
  – It is simple to build.
  – It can be constructed with the help of the parse tree.
• Limitations:
  – It is not very efficient compared to other parsing techniques, as there is a chance it may enter an infinite loop for some inputs.
  – It is difficult to parse the string if the required lookahead is arbitrarily long.

Predictive Parsing
• Eliminate left recursion from the grammar
• Left factor the grammar
• Compute FIRST and FOLLOW
• Two variants:
  – Recursive (recursive calls)
  – Non-recursive (table-driven)

Transition Diagrams for Predictive Parsers

Consider the grammar:
      E  → T E’
      E’ → + T E’ | ε
      T  → F T’
      T’ → * F T’ | ε
      F  → ( E ) | id

[Figure] The transition diagrams for E, E’, T, T’, and F are not reproduced here: each nonterminal gets one diagram whose edges are labeled by the symbols of its right-hand sides, ending in an accepting state.
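A recursive predictive parser for this grammar follows the transition diagrams directly: one procedure per nonterminal, branching on one token of lookahead. The sketch below is illustrative (the slides give diagrams, not code); tokens are assumed pre-split, with "id" as a single token and "$" as the endmarker.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + ["$"]
        self.pos = 0

    def look(self):
        return self.tokens[self.pos]

    def match(self, t):
        if self.look() != t:
            raise SyntaxError(f"expected {t}, got {self.look()}")
        self.pos += 1

    def E(self):        # E -> T E'
        self.T(); self.Ep()

    def Ep(self):       # E' -> + T E' | eps  (eps when lookahead is in FOLLOW(E'))
        if self.look() == "+":
            self.match("+"); self.T(); self.Ep()

    def T(self):        # T -> F T'
        self.F(); self.Tp()

    def Tp(self):       # T' -> * F T' | eps
        if self.look() == "*":
            self.match("*"); self.F(); self.Tp()

    def F(self):        # F -> ( E ) | id
        if self.look() == "(":
            self.match("("); self.E(); self.match(")")
        else:
            self.match("id")

def accepts(tokens):
    p = Parser(tokens)
    try:
        p.E()
        return p.look() == "$"
    except SyntaxError:
        return False
```

Because the grammar has no left recursion and is left-factored, no backtracking is needed: one lookahead token selects the branch in each procedure.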

Non-Recursive Predictive Parsing
• Given an LL(1) grammar G = (N, T, P, S), construct a table M[A, a] for A ∈ N, a ∈ T, and use a driver program with a stack.

[Figure] The predictive parsing program (driver) reads the input (e.g., a + b $), keeps grammar symbols X, Y, Z, …, $ on a stack, consults the parsing table M, and produces output.

FIRST and FOLLOW


• FIRST: If α is any string of grammar symbols, let FIRST(α) be the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).
• FOLLOW: It is defined for a nonterminal A, i.e., FOLLOW(A), as the set of terminals a that can appear immediately to the right of A in some sentential form; that is, the set of terminals a such that there exists a derivation of the form S ⇒* αAaβ for some α and β. If A is the start symbol, then $ is in FOLLOW(A).

FIRST
• To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any FIRST set:
  – If X is a terminal, then FIRST(X) is {X}.
  – If X → ε is a production, then add ε to FIRST(X).
  – If X is a nonterminal and X → Y1 Y2 … Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), …, FIRST(Yi-1); that is, Y1 … Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X).
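These rules amount to a fixed-point computation, which can be sketched in Python (an illustrative sketch; the slides state the rules but give no code). A grammar maps each nonterminal to a list of right-hand sides; the empty string stands for ε, and any symbol not in the grammar dict is treated as a terminal.

```python
EPS = ""  # epsilon

def first_sets(grammar):
    first = {A: set() for A in grammar}

    def first_of_seq(symbols):
        """FIRST of a sequence Y1 Y2 ... Yk under the current sets."""
        result = set()
        for Y in symbols:
            f = first[Y] if Y in grammar else {Y}  # terminal: FIRST(a) = {a}
            result |= f - {EPS}
            if EPS not in f:
                return result                      # Yi cannot vanish: stop
        result.add(EPS)                            # every Yi derives epsilon
        return result

    changed = True
    while changed:                                 # iterate to a fixed point
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_seq(rhs) if rhs else {EPS}
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first

grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
```

Running `first_sets(grammar)` reproduces the FIRST sets worked out on the next slide.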

Example
Consider the grammar
      E  → T E’
      E’ → + T E’ | ε
      T  → F T’
      T’ → * F T’ | ε
      F  → ( E ) | id

After applying the FIRST rules to the grammar:
      FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }
      FIRST(E’) = { + , ε }
      FIRST(T’) = { * , ε }

FOLLOW
• To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be added to any FOLLOW set:
  – Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
  – If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
  – If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).
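The FOLLOW rules are likewise a fixed-point computation. In the sketch below (illustrative, not slide code), the FIRST sets are assumed to be already available; the ones derived earlier for the expression grammar are written out by hand rather than recomputed.

```python
EPS = ""  # epsilon

grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}

def first_of_seq(beta):
    """FIRST of a string of grammar symbols beta."""
    result = set()
    for Y in beta:
        f = FIRST[Y] if Y in grammar else {Y}
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result

def follow_sets(grammar, start):
    follow = {A: set() for A in grammar}
    follow[start].add("$")                    # rule 1: $ in FOLLOW(start)
    changed = True
    while changed:                            # iterate to a fixed point
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                for i, B in enumerate(rhs):
                    if B not in grammar:
                        continue              # only nonterminals have FOLLOW
                    f = first_of_seq(rhs[i + 1:])
                    # rule 2: FIRST(beta) - {eps}; rule 3: FOLLOW(A) if beta => eps
                    new = (f - {EPS}) | (follow[A] if EPS in f else set())
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow
```

Running `follow_sets(grammar, "E")` reproduces the FOLLOW sets worked out on the next slide.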

Example
Consider the grammar
      E  → T E’
      E’ → + T E’ | ε
      T  → F T’
      T’ → * F T’ | ε
      F  → ( E ) | id

After applying the FOLLOW rules to the grammar:
      FOLLOW(E) = FOLLOW(E’) = { ) , $ }
      FOLLOW(T) = FOLLOW(T’) = { + , ) , $ }
      FOLLOW(F) = { + , * , ) , $ }

Usefulness of FIRST and FOLLOW


• FIRST and FOLLOW both help in the construction of a predictive parser, by filling in the entries of a predictive parsing table for grammar G whenever possible.
• Sets of tokens yielded by the FOLLOW function can also be used as synchronizing tokens during panic-mode error recovery.
• FIRST and FOLLOW are also useful for LR parsing, i.e., for LR(1) items and the SLR(1) table.

Another Example of FIRST and FOLLOW


• Grammar G
      S → A C B | C b A | B a
      A → d a | B C
      B → g | ε
      C → h | ε

Predictive Parsing Table
[Table] The FIRST/FOLLOW sets and the resulting predictive parsing table for this grammar are not reproduced here.

Construction of Predictive Parsing Table
Algorithm: Construction of a predictive parsing table.
Input: Grammar G.
Output: Parsing table M.
Method:
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.
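The algorithm can be sketched directly in Python (an illustrative sketch; FIRST and FOLLOW for the expression grammar are copied from the earlier slides rather than recomputed, and the grammar is assumed LL(1), so conflicts are not checked).

```python
EPS = ""  # epsilon

grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"},
          "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
          "F": {"+", "*", ")", "$"}}

def first_of_seq(alpha):
    result = set()
    for Y in alpha:
        f = FIRST[Y] if Y in grammar else {Y}
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)                               # whole sequence can vanish
    return result

def build_table(grammar):
    M = {}
    for A, alternatives in grammar.items():       # step 1
        for alpha in alternatives:
            f = first_of_seq(alpha)
            for a in f - {EPS}:                   # step 2
                M[(A, a)] = alpha
            if EPS in f:                          # step 3 ($ is in FOLLOW here)
                for b in FOLLOW[A]:
                    M[(A, b)] = alpha
    return M                                      # step 4: missing entry = error

M = build_table(grammar)
```

Entries such as `M[("E", "id")] == ["T", "E'"]` and `M[("E'", ")")] == []` (the ε-production) match the standard table for this grammar.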

Predictive Parsing Working


• The program considers X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine the parser action.
• There are three possibilities:
  – If X = a = $, the parser halts and announces successful completion of parsing.
  – If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
  – If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (with U on top).
• As output, we shall assume that the parser just prints the production used; any other code could be executed here.
• If M[X, a] = error, the parser calls an error recovery routine.
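The three cases above form a short driver loop. The sketch below is illustrative (not slide code); the table entries are the ones built for the expression grammar, tokens are pre-split and end with "$", and the output is the list of productions used.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
M = {
    ("E", "id"): ["T", "E'"],  ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],  ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],       ("F", "("): ["(", "E", ")"],
}

def parse(tokens, start="E"):
    """Return the list of productions used, or raise SyntaxError."""
    stack = ["$", start]                  # start symbol above the endmarker
    pos, output = 0, []
    while True:
        X, a = stack[-1], tokens[pos]
        if X == a == "$":
            return output                 # case 1: accept
        if X == a:                        # case 2: terminal on top matches
            stack.pop(); pos += 1
        elif X in NONTERMINALS:           # case 3: consult M[X, a]
            if (X, a) not in M:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            rhs = M[(X, a)]
            output.append((X, rhs))       # "print" the production used
            stack.pop()
            stack.extend(reversed(rhs))   # push RHS with leftmost symbol on top
        else:
            raise SyntaxError(f"expected {X}, got {a}")
```

For example, `parse(["id", "+", "id", "*", "id", "$"])` succeeds after applying 11 productions, matching the moves table referenced on the next slide.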
Moves made by the nonrecursive predictive parser
[Table] The stack/input/action trace for parsing id + id * id is not reproduced here.

LL(1) Grammar
• LL(1) means:
  – The first “L”: scanning the input from left to right.
  – The second “L”: producing a leftmost derivation.
  – The “1”: using one input symbol of lookahead at each step to make parsing action decisions.

Ambiguous Grammars Are Not LL(1)

• Consider the grammar
      S  → i E t S S’ | a      FIRST(S) = {i, a}      FOLLOW(S) = { $ , e }
      S’ → e S | ε             FIRST(S’) = {e, ε}     FOLLOW(S’) = { $ , e }
      E  → b                   FIRST(E) = { b }       FOLLOW(E) = { t }
• The parsing table for this grammar has a multiply-defined entry: both S’ → e S and S’ → ε land in M[S’, e].
• A grammar whose parsing table has no multiply-defined entries is said to be LL(1).
• What should be done when a parsing table has multiply-defined entries?

LL(1) Grammar Properties


• No ambiguous or left-recursive grammar can be LL(1).
• A grammar G is LL(1) iff whenever A → α | β are two distinct productions of G, the following conditions hold:
  – For no terminal a do both α and β derive strings beginning with a,
    i.e., FIRST(α) ∩ FIRST(β) = ∅
  – At most one of α and β can derive the empty string,
    i.e., if α ⇒* ε then β does not derive ε, and vice versa
  – If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A),
    i.e., if β ⇒* ε then FIRST(α) ∩ FOLLOW(A) = ∅

In General, LL(1) Grammar Properties

• A grammar G is LL(1) if for each collection of productions
      A → α1 | α2 | … | αn
  for nonterminal A, the following holds:
  1. FIRST(αi) ∩ FIRST(αj) = ∅ for all i ≠ j
  2. if αi ⇒* ε then
     2.a. no αj with j ≠ i derives ε
     2.b. FIRST(αj) ∩ FOLLOW(A) = ∅ for all j ≠ i

Non-LL(1) Examples
Grammar                    Not LL(1) because
S → S a | a                Left recursive
S → a S | a                FIRST(a S) ∩ FIRST(a) ≠ ∅
S → a R | ε                For R: S ⇒* ε and R ⇒* ε
R → S | ε
S → a R a                  For R: FIRST(S) ∩ FOLLOW(R) ≠ ∅
R → S | ε
S → i E t S S’ | a         Parsing table has multiply-defined entries
S’ → e S | ε
E → b

Error Recovery in Predictive Parsing


• There are two conditions under which an error is detected in predictive parsing:
  – When the terminal on top of the stack does not match the next input symbol.
  – When nonterminal A is on top of the stack, a is the next input symbol, and the parsing table entry M[A, a] is empty.
• The following error recovery methods can be used:
  – Panic-mode error recovery
  – Phrase-level error recovery

Error Recovery in Predictive Parsing: Panic-mode
• It is based on the idea of skipping symbols on the input until a token in a selected set of synchronizing tokens appears.
• Its effectiveness depends on the choice of the synchronizing set.
• The set should be chosen so that the parser recovers quickly from errors that are likely to occur in practice.

Error Recovery in Predictive Parsing: Panic-mode
• Rules:
  – If the parser looks up entry M[A, a] = blank, then the input symbol a is skipped.
  – If the entry is synch, then the nonterminal on top of the stack is popped in an attempt to resume parsing, OR input is skipped until a token in FIRST(A) is found.
  – If a token on top of the stack does not match the input symbol, then we pop the token from the stack.
Error Recovery in Predictive Parsing: Panic-mode
• Add synchronizing actions to undefined entries based on FOLLOW.
• synch: pop A and skip input until a synch token, OR skip until a token in FIRST(A) is found.

[Table] The expression-grammar parsing table with synchronizing tokens added (synch entries at FOLLOW positions) is not reproduced here.

Error Recovery in Predictive Parsing: Panic-mode
• Erroneous input: ) id * + id

[Table] The recovery moves for this input are not reproduced here.

Error Recovery in Predictive Parsing: Phrase-Level
• It is implemented by filling in the blank entries of the predictive parsing table with pointers to error routines.
• These routines may change, insert, or delete symbols on the input and issue appropriate error messages.
• They may also pop from the stack.
• In any case, we must be sure that there is no possibility of an infinite loop.
• Checking that any recovery action eventually results in an input symbol being consumed (or the stack being shortened if the end of the input has been reached) is a good way to protect against such loops.

Error Recovery in Predictive Parsing: Phrase-level
• Change the input stream by inserting a missing *
  For example: id id is changed into id * id

Nonterminal | id        | +            | *            | (         | )      | $
E           | E → T E’  |              |              | E → T E’  | synch  | synch
E’          |           | E’ → + T E’  |              |           | E’ → ε | E’ → ε
T           | T → F T’  | synch        |              | T → F T’  | synch  | synch
T’          | insert *  | T’ → ε       | T’ → * F T’  |           | T’ → ε | T’ → ε
F           | F → id    | synch        | synch        | F → ( E ) | synch  | synch

insert *: insert the missing * and redo the production

Error Recovery in Predictive Parsing: Phrase-level Error Productions

E  → T E’                 Add the error production:
E’ → + T E’ | ε               T’ → F T’
T  → F T’                 to ignore a missing *, e.g.: id id
T’ → * F T’ | ε
F  → ( E ) | id

Nonterminal | id         | +            | *            | (         | )      | $
E           | E → T E’   |              |              | E → T E’  | synch  | synch
E’          |            | E’ → + T E’  |              |           | E’ → ε | E’ → ε
T           | T → F T’   | synch        |              | T → F T’  | synch  | synch
T’          | T’ → F T’  | T’ → ε       | T’ → * F T’  |           | T’ → ε | T’ → ε
F           | F → id     | synch        | synch        | F → ( E ) | synch  | synch

Error Recovery in Predictive Parsing: Phrase-level Error Productions
• Erroneous input: id id

[Table] The parsing moves using the error production are not reproduced here.
Unit - II
Chapter 4
Syntax Analysis
Bottom-Up Parsing:
Shift-Reduce Parsing and
Operator Precedence Parsing

Partha Sarathi Chakraborty


Assistant Professor
Department of Computer Science and Engineering
SRM University, Delhi – NCR Campus

Outline
• Bottom-Up Parsing
  – Shift-Reduce Parsing
  – Operator Precedence Parsing
  – LR parsers (next presentation):
    • Simple LR (SLR)
    • Canonical LR
    • Lookahead LR (LALR)

Bottom-Up Parsing
• Start at the leaves and grow toward the root.
• We can think of the process as reducing the input string to the start symbol.
• At each reduction step a particular substring matching the right side of a production is replaced by the symbol on the left side of that production.
• Bottom-up parsers handle a large class of grammars.

Bottom-Up Parsing
• A general style of bottom-up syntax analysis is known as shift-reduce parsing.
• The main actions are shift and reduce.
• At each shift action, the current symbol in the input string is pushed onto a stack.
• At each reduce action, the symbols at the top of the stack (this symbol sequence is the right side of a production) are replaced by the nonterminal on the left side of that production.
• There are also two more actions: accept and error.

Shift – Reduce Parsing


• “Shift-Reduce” parsing reduces a string to the start symbol of the grammar.
• At every step a particular substring is matched (in left-to-right fashion) to the right side of some production and is then substituted by the nonterminal on the left-hand side of that production.

Grammar:              Reductions (reverse order):
  S → a A B e           a b b c d e
  A → A b c | b         a A b c d e
  B → d                 a A d e
                        a A B e
                        S

Rightmost derivation:
  S ⇒ a A B e ⇒ a A d e ⇒ a A b c d e ⇒ a b b c d e

Shift-Reduce Parsing

Grammar:            Reducing a sentence:    Shift-reduce corresponds
  S → a A B e         a b b c d e           to a rightmost derivation:
  A → A b c | b       a A b c d e             S ⇒rm a A B e
  B → d               a A d e                   ⇒rm a A d e
                      a A B e                   ⇒rm a A b c d e
                      S                         ⇒rm a b b c d e

The reduced substrings match the productions’ right-hand sides.
[Figure] The corresponding sequence of partial parse trees for a b b c d e is not reproduced here.

Handles
A handle is a substring of grammar symbols in a right-sentential form that matches the right-hand side of a production.

Grammar:          Reductions:
  S → a A B e       a b b c d e
  A → A b c | b     a A b c d e    ← handle: A b c
  B → d             a A d e
                    a A B e
                    S

But reducing the second b of a b b c d e instead:
  a b b c d e
  a A b c d e
  a A A e        ← that b was NOT a handle, because further
  …?               reductions will fail (the result is not a sentential form)

Handles
• A handle of a right-sentential form γ (= αβw) is a production rule A → β and a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ:
      S ⇒*rm α A w ⇒rm α β w
  i.e., A → β at the position immediately after α is a handle of αβw.
• If the grammar is unambiguous, then every right-sentential form of the grammar has exactly one handle.
• w is a string of terminals.

Handle Pruning
• The process of discovering a handle and reducing it to the appropriate left-hand side is called handle pruning. Handle pruning forms the basis for a bottom-up parsing method.
• To construct a rightmost derivation
      S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm … ⇒rm γn-1 ⇒rm γn = w (the input string)
  the parser finds the handle in γn and reduces it to obtain γn-1, and so on back to S.

[Table] The sequence of reductions made by a shift-reduce parser is not reproduced here.

Shift – Reduce Parser


• There are four possible actions of a shift-reduce parser:
  – Shift: the next input symbol is shifted onto the top of the stack.
  – Reduce: replace the handle on the top of the stack by the corresponding nonterminal.
  – Accept: successful completion of parsing.
  – Error: the parser discovers a syntax error and calls an error recovery routine.

Stack Implementation of Shift – Reduce Parser

• Initial state
      STACK    INPUT
      $        w$
• Final state
      STACK    INPUT
      $S       $

Stack Implementation of
Shift-Reduce Parsing

Grammar:
  E → E + E
  E → E * E
  E → ( E )
  E → id

Stack      Input        Action
$          id+id*id$    shift
$id        +id*id$      reduce E → id
$E         +id*id$      shift
$E+        id*id$       shift
$E+id      *id$         reduce E → id
$E+E       *id$         shift (or reduce?)   ← how to resolve conflicts?
$E+E*      id$          shift
$E+E*id    $            reduce E → id
$E+E*E     $            reduce E → E * E
$E+E       $            reduce E → E + E
$E         $            accept

The parser must find handles to reduce.
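The trace above can be sketched as a small shift-reduce loop (an illustrative sketch, not slide code). The shift/reduce conflict at "$E+E with lookahead *" is resolved the usual way: shift when the lookahead operator binds tighter than the one already on the stack, otherwise reduce.

```python
PREC = {"+": 1, "*": 2}  # * binds tighter than +

def parse(tokens):
    """Grammar: E -> E + E | E * E | ( E ) | id. Returns the action list."""
    stack, i, actions = ["$"], 0, []
    tokens = tokens + ["$"]
    while True:
        a = tokens[i]
        if stack[-1] == "id":                       # reduce E -> id on top
            stack[-1] = "E"; actions.append("reduce E -> id"); continue
        if stack[-3:] == ["(", "E", ")"]:           # reduce E -> ( E )
            stack[-3:] = ["E"]; actions.append("reduce E -> ( E )"); continue
        # reduce E op E when the lookahead does not bind tighter
        if (len(stack) >= 4 and stack[-1] == "E" and stack[-3] == "E"
                and stack[-2] in PREC and PREC.get(a, 0) <= PREC[stack[-2]]):
            op = stack[-2]
            stack[-3:] = ["E"]; actions.append(f"reduce E -> E {op} E"); continue
        if a == "$":
            if stack == ["$", "E"]:
                actions.append("accept"); return actions
            raise SyntaxError("cannot reduce to E")
        stack.append(a); i += 1; actions.append(f"shift {a}")
```

On `["id", "+", "id", "*", "id"]` the action sequence matches the table: the parser shifts past `*`, reduces E → E * E first, then E → E + E, then accepts.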

Justifying the use of a stack in shift-reduce parsing
• The handle will always appear on top of the stack, never inside it.
• Consider the possible forms of two successive steps in any rightmost derivation.
• These steps can be of the form:
  1. S ⇒*rm α A z ⇒rm α β B y z ⇒rm α β γ y z
  2. S ⇒*rm α B x A z ⇒rm α B x y z ⇒rm α γ x y z

Justifying the use of a stack in shift-reduce parsing
• Consider case (1) in reverse, where a shift-reduce parser has just reached this configuration:

      STACK     INPUT
      $αβγ      yz$      — handle γ identified; reduce by B → γ
      $αβB      yz$      — shift y
      $αβBy     z$       — handle βBy identified; reduce by A → βBy
      $αA       z$

Justifying the use of a stack in shift-reduce parsing (contd.)
• Case (2) configuration:

      STACK     INPUT
      $αγ       xyz$     — handle γ identified; reduce by B → γ
      $αB       xyz$     — shift x
      $αBx      yz$      — shift y
      $αBxy     z$       — handle y identified; reduce by A → y
      $αBxA     z$

Note: The parser never had to go into the stack to find the handle. It is this aspect of handle pruning that makes a stack a particularly convenient data structure for implementing a shift-reduce parser.

Conflicts
• Shift-reduce and reduce-reduce conflicts are caused by
  – the limitations of the LR parsing method (even when the grammar is unambiguous)
  – ambiguity of the grammar

Shift-Reduce Parsing:
Shift-Reduce Conflicts

Stack                Input        Action
$…                   …$           …
$… if E then S       else …$      shift or reduce?

Ambiguous grammar:
  S → if E then S
    | if E then S else S
    | other

Resolve in favor of shift, so each else matches the closest if.

Shift-Reduce Parsing:
Reduce-Reduce Conflicts

Stack    Input    Action
$        aa$      shift
$a       a$       reduce A → a or B → a ?

Grammar:
  C → A B
  A → a
  B → a

Resolve in favor of reduce A → a; otherwise we’re stuck!

Operator – Precedence Parsing


• Operator grammars form a small but important class of grammars.
• We can easily construct an efficient operator-precedence parser (a shift-reduce parser) for an operator grammar.
• Properties: in an operator grammar, no production rule can have:
  – ε on the right side, or
  – two adjacent nonterminals on the right side.
• Example:
      E → E + E | E * E | ( E ) | -E | id

Operator – Precedence Parsing


• Disadvantages:
  – It is hard to handle tokens like the minus sign, which has two different precedences (depending on whether it is unary or binary).
  – Worse, since the relationship between a grammar for the language being parsed and the operator-precedence parser itself is tenuous, one cannot always be sure the parser accepts exactly the desired language.
  – Only a small class of grammars can be parsed.

Operator – Precedence Parsing


• In operator-precedence parsing, we define three disjoint precedence relations between certain pairs of terminals:

      Relation    Meaning
      a ⋖ b       b has higher precedence than a
      a ≐ b       b has the same precedence as a
      a ⋗ b       b has lower precedence than a

• There are two common ways of determining what precedence relations should hold between a pair of terminals.

Operator – Precedence Parsing

• Two common ways:
  – The first method is intuitive and is based on the traditional
    notions of associativity and precedence of operators.
    (Unary minus causes a problem.)
    • Example: if * has higher precedence than +, then the
      relationship is shown as + ⋖ * and * ⋗ +.
  – The second method of selecting operator-precedence
    relations is first to construct an unambiguous grammar
    for the language, a grammar that reflects the correct
    associativity and precedence in its parse trees.
    • Example: the dangling-else grammar.

Using Operator – Precedence Relations

• The intention of the precedence relations is to
  find the handle of a right-sentential form, with
  ⋖ marking the left end,
  ≐ appearing in the interior of the handle, and
  ⋗ marking the right end.
• In an input string $ a1 a2 … an $, we insert the
  appropriate precedence relation between each pair of
  adjacent terminals (the relation that holds between
  the terminals in that pair).

Using Operator – Precedence Relations

• Consider the grammar
  E → E + E | E * E | id

• Operator-precedence relations:

        id   +   *   $
  id         ⋗   ⋗   ⋗
  +     ⋖    ⋗   ⋖   ⋗
  *     ⋖    ⋗   ⋗   ⋗
  $     ⋖    ⋖   ⋖

• Then the input string id + id * id with the precedence
  relations inserted is:
  $ ⋖ id ⋗ + ⋖ id ⋗ * ⋖ id ⋗ $

Using Operator – Precedence Relations: To Find the Handle

• The handle can be found by the following process:
  1. Scan the string from the left end until the first ⋗ is encountered.
  2. Then scan backwards (to the left) over any ≐ until a ⋖ is
     encountered.
  3. The handle contains everything to the left of the first ⋗ and to
     the right of the ⋖ encountered in step (2), including any
     intervening or surrounding nonterminals.

  $ ⋖ id ⋗ + ⋖ id ⋗ * ⋖ id ⋗ $    reduce E → id       $ id + id * id $
  $ ⋖ + ⋖ id ⋗ * ⋖ id ⋗ $         reduce E → id       $ E + id * id $
  $ ⋖ + ⋖ * ⋖ id ⋗ $              reduce E → id       $ E + E * id $
  $ ⋖ + ⋖ * ⋗ $                   reduce E → E * E    $ E + E * E $
  $ ⋖ + ⋗ $                       reduce E → E + E    $ E + E $
  $ $                             accept              $ E $
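The three-step scan above can be sketched in Python. This is an illustrative helper, not from the slides: the `find_handle` name and the `rel` dictionary (mapping an ordered pair of adjacent terminals to one of `'<'`, `'='`, `'>'`) are our own encoding of the relations.

```python
def find_handle(terminals, rel):
    """Return (left, right) slice bounds of the handle in `terminals`."""
    # Step 1: scan left to right until the first .> relation.
    right = None
    for i in range(len(terminals) - 1):
        if rel[(terminals[i], terminals[i + 1])] == '>':
            right = i
            break
    if right is None:
        return None                      # no handle found
    # Step 2: scan backwards over any =. until a <. is encountered.
    left = right
    while left > 0 and rel[(terminals[left - 1], terminals[left])] == '=':
        left -= 1
    # Step 3: the handle lies between the <. and the first .>
    return (left, right + 1)

# $ <. id .> + <. id .> * <. id .> $  -- the first handle is the leftmost id
rel = {('$', 'id'): '<', ('id', '+'): '>', ('+', 'id'): '<',
       ('id', '*'): '>', ('*', 'id'): '<', ('id', '$'): '>'}
print(find_handle(['$', 'id', '+', 'id', '*', 'id', '$'], rel))  # → (1, 2)
```

Slicing the string with the returned bounds yields the leftmost handle, `['id']`, matching the first reduction in the trace above.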

Using Operator – Precedence Relations: LEADING and TRAILING

• Consider the grammar
  E → E + T | T
  T → T * F | F
  F → ( E ) | id

• LEADING(A) and TRAILING(A) for each
  nonterminal A are defined by
  – LEADING(A) = { a | A ⇒⁺ γaα, where γ is ε or a single
    nonterminal }
  – TRAILING(A) = { a | A ⇒⁺ αaγ, where γ is ε or a single
    nonterminal }

LEADING and TRAILING Algorithm

• LEADING(A)
  – a is in LEADING(A) if there is a production of the form
    A → γaα, where γ is ε or a single nonterminal.
  – If a is in LEADING(B), and there is a production of the
    form A → Bα, then a is in LEADING(A).
• TRAILING(A)
  – a is in TRAILING(A) if there is a production of the form
    A → αaγ, where γ is ε or a single nonterminal.
  – If a is in TRAILING(B), and there is a production of the
    form A → αB, then a is in TRAILING(A).
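The two rules for each set can be iterated to a fixpoint. The sketch below is our own (function name and data layout are assumptions): productions are (head, body) pairs with bodies given as symbol lists.

```python
def leading_trailing(productions, nonterminals):
    """Compute LEADING and TRAILING for every nonterminal by fixpoint iteration."""
    LEAD = {A: set() for A in nonterminals}
    TRAIL = {A: set() for A in nonterminals}
    changed = True
    while changed:
        changed = False
        for A, body in productions:
            # Rule 1 for LEADING: first terminal, possibly past one leading nonterminal
            for X in body[:2]:
                if X not in nonterminals:
                    if X not in LEAD[A]:
                        LEAD[A].add(X); changed = True
                    break
            # Rule 2 for LEADING: A -> B alpha inherits LEADING(B)
            if body[0] in nonterminals and not LEAD[body[0]] <= LEAD[A]:
                LEAD[A] |= LEAD[body[0]]; changed = True
            # Mirror rules for TRAILING, working from the right end
            for X in body[::-1][:2]:
                if X not in nonterminals:
                    if X not in TRAIL[A]:
                        TRAIL[A].add(X); changed = True
                    break
            if body[-1] in nonterminals and not TRAIL[body[-1]] <= TRAIL[A]:
                TRAIL[A] |= TRAIL[body[-1]]; changed = True
    return LEAD, TRAIL

prods = [('E', ['E', '+', 'T']), ('E', ['T']),
         ('T', ['T', '*', 'F']), ('T', ['F']),
         ('F', ['(', 'E', ')']), ('F', ['id'])]
LEAD, TRAIL = leading_trailing(prods, {'E', 'T', 'F'})
print(sorted(LEAD['E']))   # → ['(', '*', '+', 'id']
print(sorted(TRAIL['E']))  # → [')', '*', '+', 'id']
```

The printed sets agree with the table for this grammar on the next slide.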

LEADING and TRAILING

• The LEADING and TRAILING terminals for the
  previous grammar:

  NONTERMINAL   LEADING        TRAILING
  E             *, +, (, id    *, +, ), id
  T             *, (, id       *, ), id
  F             (, id          ), id

Computing Operator-Precedence Relations

Input: An operator grammar G.
Output: The relations ⋖, ≐, and ⋗ for G.
Method:
1. Compute LEADING(A) and TRAILING(A) for each
   nonterminal A.
2. Execute the algorithm below, examining each position of the
   right side of each production.
3. Set $ ⋖ a for all a in LEADING(S) and set b ⋗ $ for all b
   in TRAILING(S), where S is the start symbol of G.

for each production A → X1X2…Xn do
  for i := 1 to n − 1 do

Computing Operator-Precedence Relations (Contd…)

  begin
    if Xi and Xi+1 are both terminals then set Xi ≐ Xi+1 ;
    if i ≤ n − 2 and Xi and Xi+2 are terminals
        and Xi+1 is a nonterminal then set Xi ≐ Xi+2 ;
    if Xi is a terminal and Xi+1 is a nonterminal then
      for all a in LEADING(Xi+1) do set Xi ⋖ a ;
    if Xi is a nonterminal and Xi+1 is a terminal then
      for all a in TRAILING(Xi) do set a ⋗ Xi+1 ;
  end
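Steps 2 and 3 of the method can be sketched as a single pass over every production body. This is our own illustrative encoding (relations stored as `'<'`, `'='`, `'>'`); the hard-coded LEADING/TRAILING sets are the ones computed for the grammar E → E + T | T, T → T * F | F, F → ( E ) | id.

```python
def precedence_relations(productions, nonterminals, LEAD, TRAIL, start):
    """Build the <., =., .> relations by examining each position of each body."""
    rel = {}
    for A, body in productions:
        n = len(body)
        for i in range(n - 1):
            Xi, Xj = body[i], body[i + 1]
            if Xi not in nonterminals and Xj not in nonterminals:
                rel[(Xi, Xj)] = '='            # adjacent terminals
            if (i + 2 < n and Xi not in nonterminals and Xj in nonterminals
                    and body[i + 2] not in nonterminals):
                rel[(Xi, body[i + 2])] = '='   # terminals separated by one nonterminal
            if Xi not in nonterminals and Xj in nonterminals:
                for a in LEAD[Xj]:             # terminal before nonterminal
                    rel[(Xi, a)] = '<'
            if Xi in nonterminals and Xj not in nonterminals:
                for a in TRAIL[Xi]:            # nonterminal before terminal
                    rel[(a, Xj)] = '>'
    for a in LEAD[start]:                      # step 3: the $ endmarkers
        rel[('$', a)] = '<'
    for b in TRAIL[start]:
        rel[(b, '$')] = '>'
    return rel

LEAD = {'E': {'+', '*', '(', 'id'}, 'T': {'*', '(', 'id'}, 'F': {'(', 'id'}}
TRAIL = {'E': {'+', '*', ')', 'id'}, 'T': {'*', ')', 'id'}, 'F': {')', 'id'}}
prods = [('E', ['E', '+', 'T']), ('E', ['T']), ('T', ['T', '*', 'F']),
         ('T', ['F']), ('F', ['(', 'E', ')']), ('F', ['id'])]
rel = precedence_relations(prods, {'E', 'T', 'F'}, LEAD, TRAIL, 'E')
print(rel[('(', ')')], rel[('+', '*')], rel[('*', '+')])  # → = < >
```

A production-quality tool would also report a conflict if the same pair of terminals receives two different relations; the sketch simply overwrites.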

Operator – Precedence Relations

The grammar
E → E + T | T
T → T * F | F
F → ( E ) | id

Precedence relations (row = left terminal, the f() side; column =
right terminal, the g() side; blank = error). The entries follow from
the Xi / Xi+1 / Xi+2 patterns of the algorithm, e.g. from E → E + T,
and from $ ⋖ a for each a in LEADING(E):

        +   *   (   )   id  $
  +     ⋗   ⋖   ⋖   ⋗   ⋖   ⋗
  *     ⋗   ⋗   ⋖   ⋗   ⋖   ⋗
  (     ⋖   ⋖   ⋖   ≐   ⋖
  )     ⋗   ⋗       ⋗       ⋗
  id    ⋗   ⋗       ⋗       ⋗
  $     ⋖   ⋖   ⋖       ⋖

Operator – Precedence Parsing Algorithm

Input: An input string w and a table of precedence relations.

Output: If w is well formed, a skeletal parse tree, with a
placeholder nonterminal E labeling all interior nodes; otherwise,
an error indication.

Method: Initially, the stack contains $ and the input buffer the
string w$.

set ip to point to the first symbol of w$ ;
repeat forever
  if $ is on top of the stack and ip points to $ then
    accept and return
  else

Operator – Precedence Parsing Algorithm (Contd…)

  begin
    let a be the topmost terminal symbol on the stack
    and let b be the current symbol pointed to by ip ;
    if a ⋖ b or a ≐ b then /* shift */
      begin
        shift/push b onto the stack ;
        advance ip to the next input symbol ;
      end
    else if a ⋗ b then /* reduce */
      repeat
        pop the stack
      until the top stack terminal is related by ⋖
        to the terminal most recently popped
    else call the error-recovery routine error()
  end
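The driver above can be sketched with a terminals-only stack. This is a skeletal recognizer under our own naming, not the slides' implementation: reductions just pop and no parse tree is built. The relation table is the one for E → E + E | E * E | id.

```python
def op_precedence_parse(tokens, rel):
    """Operator-precedence driver; returns True on accept, False on error."""
    stack = ['$']                       # terminals only; nonterminals are implicit
    tokens = tokens + ['$']
    ip = 0
    while True:
        a, b = stack[-1], tokens[ip]    # topmost terminal, current input symbol
        if a == '$' and b == '$':
            return True                 # accept
        r = rel.get((a, b))
        if r in ('<', '='):             # shift
            stack.append(b)
            ip += 1
        elif r == '>':                  # reduce: pop until top terminal <. popped one
            while True:
                popped = stack.pop()
                if rel.get((stack[-1], popped)) == '<':
                    break
        else:
            return False                # no relation holds: error

rel = {('id', '+'): '>', ('id', '*'): '>', ('id', '$'): '>',
       ('+', 'id'): '<', ('+', '+'): '>', ('+', '*'): '<', ('+', '$'): '>',
       ('*', 'id'): '<', ('*', '+'): '>', ('*', '*'): '>', ('*', '$'): '>',
       ('$', 'id'): '<', ('$', '+'): '<', ('$', '*'): '<'}
print(op_precedence_parse(['id', '+', 'id', '*', 'id'], rel))  # → True
```

On id + id * id the stack passes through exactly the terminal configurations shown in the trace on the next slides ($, $ id, $ +, $ + id, …).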

Moves Made by Operator – Precedence Parsing

Precedence relations used:

        id   +   *   $
  id         ⋗   ⋗   ⋗
  +     ⋖    ⋗   ⋖   ⋗
  *     ⋖    ⋗   ⋗   ⋗
  $     ⋖    ⋖   ⋖

STACK      Precedence   INPUT            ACTION
$          ⋖            id + id * id $   shift
$ id       ⋗            + id * id $      reduce E → id
$          ⋖            + id * id $      shift
$ +        ⋖            id * id $        shift
$ + id     ⋗            * id $           reduce E → id
$ +        ⋖            * id $           shift
$ + *      ⋖            id $             shift
$ + * id   ⋗            $                reduce E → id
$ + *      ⋗            $                reduce E → E * E
$ +        ⋗            $                reduce E → E + E
$          $                             accept

Actions of Operator – Precedence Parsing

Figure: step-by-step actions on the input string id + id. Each
reduction replaces a handle by a placeholder node ● in the skeletal
parse tree; the final tree has a root ● whose children are the ●
node for the first id, the token +, and the ● node for the second id.

Operator-Precedence Relations from Associativity and Precedence

Consider the grammar for arithmetic expressions
E → E + E | E − E | E * E | E / E | E ↑ E | ( E ) | −E | id
Note: the grammar is ambiguous, and right-sentential
forms could have many handles.

1. If operator θ1 has higher precedence than operator θ2,
   make
   θ1 ⋗ θ2 and θ2 ⋖ θ1
   Example: * and +, input: E + E * E + E
Operator-Precedence Relations from Associativity and Precedence (Contd…)

2. If operators θ1 and θ2 have equal precedence:
   if they are left-associative, make θ1 ⋗ θ2 and θ2 ⋗ θ1;
   if they are right-associative, make θ1 ⋖ θ2 and θ2 ⋖ θ1.
   Example: left-associative, input E − E + E:
   + ⋗ + , + ⋗ − , − ⋗ − , and − ⋗ +
   right-associative: ↑ ⋖ ↑
3. For all operators θ,
   θ ⋖ id , id ⋗ θ , θ ⋖ ( , ( ⋖ θ ,
   ) ⋗ θ , θ ⋗ ) , θ ⋗ $ , and $ ⋖ θ
   Also, let
   ( ≐ )     $ ⋖ (     $ ⋖ id
   ( ⋖ (     id ⋗ $    ) ⋗ $
   ( ⋖ id    id ⋗ )    ) ⋗ )

Operator-Precedence Relations from Associativity and Precedence (Contd…)

Consider the grammar
E → E + E | E − E | E * E | E / E | E ↑ E | ( E ) | −E | id
Assuming
• ↑ has highest precedence and is right-associative,
• * and / have next highest precedence and are left-associative, and
• + and − have lowest precedence and are left-associative.
Input: id * ( id ↑ id ) − id / id
Try the input string with the table on the next slide.

Operator-Precedence Relations from Associativity and Precedence (Contd…)

        +   −   *   /   ↑   id  (   )   $
  +     ⋗   ⋗   ⋖   ⋖   ⋖   ⋖   ⋖   ⋗   ⋗
  −     ⋗   ⋗   ⋖   ⋖   ⋖   ⋖   ⋖   ⋗   ⋗
  *     ⋗   ⋗   ⋗   ⋗   ⋖   ⋖   ⋖   ⋗   ⋗
  /     ⋗   ⋗   ⋗   ⋗   ⋖   ⋖   ⋖   ⋗   ⋗
  ↑     ⋗   ⋗   ⋗   ⋗   ⋖   ⋖   ⋖   ⋗   ⋗
  id    ⋗   ⋗   ⋗   ⋗   ⋗           ⋗   ⋗
  (     ⋖   ⋖   ⋖   ⋖   ⋖   ⋖   ⋖   ≐
  )     ⋗   ⋗   ⋗   ⋗   ⋗           ⋗   ⋗
  $     ⋖   ⋖   ⋖   ⋖   ⋖   ⋖   ⋖

(blank = error)

Handling Unary Operators

• Operator-precedence parsing cannot handle the unary minus
  when we also have the binary minus in our grammar.
• The best approach to this problem is to let the lexical
  analyzer handle it:
  – The lexical analyzer returns two different tokens for the unary
    minus and the binary minus.
  – The lexical analyzer needs lookahead to distinguish the binary
    minus from the unary minus.
• Then, we make
  θ ⋖ unary-minus for any operator θ
  unary-minus ⋗ θ if unary-minus has higher precedence than θ
  unary-minus ⋖ θ if unary-minus has lower (or equal)
  precedence than θ

Precedence Functions

• Compilers using operator-precedence parsers do not need to
  store the table of precedence relations.
• The table can be encoded by two precedence functions f and g
  that map terminal symbols to integers: for symbols a and b,
  f(a) < g(b) whenever a ⋖ b
  f(a) = g(b) whenever a ≐ b
  f(a) > g(b) whenever a ⋗ b
• The precedence relation between a and b can then be determined
  by a numerical comparison between f(a) and g(b).
• Note: error entries in the precedence matrix are obscured,
  since one of the three relations holds no matter what f(a) and
  g(b) are.

Precedence Functions

Consider the grammar
E → E + E | E − E | E * E | E / E | E ↑ E | ( E ) | −E | id

      +   −   *   /   ↑   (   )   id  $
  f   2   2   4   4   4   0   6   6   0
  g   1   1   3   3   5   5   0   5   0

For example:
* ⋖ id, and f(*) < g(id).
Note: f(id) > g(id) suggests that id ⋗ id;
in fact no precedence relation holds between id and id.

Constructing Precedence Functions

Input: An operator-precedence matrix.
Output: Precedence functions representing the input matrix,
or an indication that none exist.

Method:
1. Create symbols fa and ga for each a that is a terminal or $.
2. Partition the created symbols into as many groups as
   possible, in such a way that if a ≐ b, then fa and gb are in the
   same group.

Constructing Precedence Functions

3. Create a directed graph whose nodes are the groups
   found in (2). For any a and b,
   • if a ⋖ b, place an edge from the group of gb to the group of fa;
   • if a ⋗ b, place an edge from the group of fa to that of gb.
4. If the graph constructed in (3) has a cycle, then no
   precedence functions exist. If there are no cycles, let
   f(a) be the length of the longest path beginning at the
   group of fa, and let g(a) be the length of the longest path
   beginning at the group of ga.
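The four steps can be sketched with a simple union-find for the groups and a depth-first search for the longest paths. The names and data layout below are our own; the relation table is the one for the grammar E → E + E | E * E | id, so the computed values can be checked against the example on the following slides.

```python
def precedence_functions(terminals, rel):
    """Build f and g from a relation table, or raise if a cycle exists."""
    # Steps 1-2: group f_a with g_b whenever a =. b
    parent = {('f', a): ('f', a) for a in terminals}
    parent.update({('g', a): ('g', a) for a in terminals})
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for (a, b), r in rel.items():
        if r == '=':
            parent[find(('f', a))] = find(('g', b))
    # Step 3: a <. b gives edge g_b -> f_a ; a .> b gives edge f_a -> g_b
    edges = {}
    for (a, b), r in rel.items():
        if r == '<':
            edges.setdefault(find(('g', b)), set()).add(find(('f', a)))
        elif r == '>':
            edges.setdefault(find(('f', a)), set()).add(find(('g', b)))
    # Step 4: longest path from each group; a cycle means no functions exist
    memo, on_path = {}, set()
    def longest(u):
        if u in on_path:
            raise ValueError('cycle: no precedence functions exist')
        if u not in memo:
            on_path.add(u)
            memo[u] = max((longest(v) + 1 for v in edges.get(u, ())), default=0)
            on_path.discard(u)
        return memo[u]
    f = {a: longest(find(('f', a))) for a in terminals}
    g = {a: longest(find(('g', a))) for a in terminals}
    return f, g

rel = {('id', '+'): '>', ('id', '*'): '>', ('id', '$'): '>',
       ('+', 'id'): '<', ('+', '+'): '>', ('+', '*'): '<', ('+', '$'): '>',
       ('*', 'id'): '<', ('*', '+'): '>', ('*', '*'): '>', ('*', '$'): '>',
       ('$', 'id'): '<', ('$', '+'): '<', ('$', '*'): '<'}
f, g = precedence_functions(['+', '*', 'id', '$'], rel)
print(f)  # → {'+': 2, '*': 4, 'id': 4, '$': 0}
print(g)  # → {'+': 1, '*': 3, 'id': 5, '$': 0}
```

These values match the f/g table in the example slide for this grammar.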
Example

Figure: graph representing the precedence functions.

      +   *   id  $
  f   2   4   4   0
  g   1   3   5   0

Error Recovery in Operator-Precedence Parsing

• The parser can discover syntactic errors:
  – if no precedence relation holds between the
    terminal on top of the stack and the current
    input symbol;
  – if a handle has been found, but there is no
    production with this handle as a right side.

Handling Errors during Reductions

Consider the grammar
E → E + E | E − E | E * E | E / E | E ↑ E | ( E ) | −E | id
• The error checker for reductions need only check that the proper
  set of nonterminal markers appears among the terminal strings
  being reduced.
• Specifically, the checker does the following:
  – If + , − , * , / , or ↑ is reduced, it checks that nonterminals appear on
    both sides. If not, it issues the diagnostic
      missing operand
  – If id is reduced, it checks that there is no nonterminal to the right or
    left. If there is, it can warn
      missing operator
  – If ( ) is reduced, it checks that there is a nonterminal between the
    parentheses. If not, it can say
      no expression between parentheses
Handling Shift/Reduce Errors

Precedence and error-routine table:

        id   (    )    $
  id    e3   e3   ⋗    ⋗
  (     ⋖    ⋖    ≐    e4
  )     e3   e3   ⋗    ⋗
  $     ⋖    ⋖    e2   e1

Error-handling routines:
• e1: /* called when the whole expression is missing */
  – insert id onto the input
  – issue diagnostic: “missing operand”
• e2: /* called when an expression begins with a right parenthesis */
  – delete ) from the input
  – issue diagnostic: “unbalanced right parenthesis”
• e3: /* called when id or ) is followed by id or ( */
  – insert + onto the input
  – issue diagnostic: “missing operator”
• e4: /* called when an expression ends with a left parenthesis */
  – pop ( from the stack
  – issue diagnostic: “missing right parenthesis”
Error Handling Mechanism

Erroneous input: id id ) ( $        Original string: ( id + id )

STACK    Precedence   INPUT          ACTION
$        ⋖            id id ) ( $    shift
$ id     blank        id ) ( $       error e3: missing operator; insert ‘+’ into INPUT
$ id     ⋗            + id ) ( $     reduce
$        ⋖            + id ) ( $     shift
$ +      ⋖            id ) ( $       shift
$ + id   ⋗            ) ( $          reduce
$ +      ⋗            ) ( $          reduce
$        blank        ) ( $          error e2: unbalanced right parenthesis; delete ‘)’ from INPUT
$        ⋖            ( $            shift
$ (      blank        $              error e4: missing right parenthesis; pop ‘(’ from STACK
$        $                           accept
Unit - II
Chapter 4
Syntax Analysis
Bottom-Up Parsing: LR Parsers

Partha Sarathi Chakraborty
Assistant Professor
Department of Computer Science and Engineering
SRM University, Delhi – NCR Campus
Outline
• Bottom-Up Parsing
  – LR parsers:
    • Simple LR (SLR)
    • Canonical LR
    • Lookahead LR (LALR)

LR Parsers

• An efficient bottom-up syntax-analysis technique
  that can be used to parse a large class of CFGs.
• The technique is called LR(k) parsing; the ‘L’ is
  for left-to-right scanning of the input, the ‘R’ for
  constructing a rightmost derivation in reverse,
  and the k for the number of input symbols of
  lookahead that are used in making parsing
  decisions.
• When (k) is omitted, k is assumed to be 1.

LR Parsers: Attractive

• LR parsing is attractive for a variety of reasons.
  – LR parsers can be constructed to recognize virtually all
    programming-language constructs for which context-free
    grammars can be written.
  – The LR-parsing method is the most general nonbacktracking
    shift-reduce parsing method known, yet it can be
    implemented as efficiently as other, more primitive
    shift-reduce methods.
  – An LR parser can detect a syntactic error as soon as it is
    possible to do so on a left-to-right scan of the input.
  – The class of grammars that can be parsed using LR methods
    is a proper superset of the class of grammars that can be
    parsed with predictive or LL methods.

LR Parsers: Drawback

• The principal drawback of the LR method is that it is
  too much work to construct an LR parser by hand for a
  typical programming-language grammar.
• A specialized tool, an LR parser generator, is needed.
• YACC: such a generator takes a context-free grammar
  and automatically produces a parser for that grammar.
  – If the grammar contains ambiguities or other constructs that
    are difficult to parse in a left-to-right scan of the input, then
    the parser generator locates these constructs and provides
    detailed diagnostic messages.

Three Techniques for Creating an LR Parsing Table

• Simple LR (SLR) – easiest to implement, but least
  powerful; it may fail to produce a parsing table for
  certain grammars.
• Canonical LR – most powerful and most expensive.
• Lookahead LR (LALR) – intermediate in power and
  cost between the other two. It works on most
  programming-language grammars and, with some
  effort, can be implemented efficiently.
• Power: Canonical LR > LALR > SLR

LR(0) Items of a Grammar

• An LR(0) item of a grammar G is a production of G
  with a dot (•) at some position of the right-hand side.
• Thus, the production
  A → XYZ
  has four items:
  [A → • X Y Z]
  [A → X • Y Z]
  [A → X Y • Z]
  [A → X Y Z •]
• Note that the production A → ε has one item [A → •].

Constructing the Set of LR(0) Items of a Grammar

• If G is a grammar with start symbol S, then G′, the
  augmented grammar for G, is G with a new start
  symbol S′ and production S′ → S.
  The purpose of the new starting production is to indicate to the parser when it
  should stop parsing and announce acceptance of the input, i.e., acceptance occurs
  when and only when the parser is about to reduce by S′ → S.
• The Closure Operation
Example

• Consider the grammar G:
  E → E + T | T
  T → T * F | F
  F → ( E ) | id

  FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }
  FOLLOW(E) = { $ , ) , + }
  FOLLOW(T) = { * , $ , ) , + }
  FOLLOW(F) = { * , $ , ) , + }

• Create the augmented expression grammar G′ by adding
  the production E′ → E.
• If I is the set of the one item {[E′ → •E]}, then closure(I)
  contains the items shown on the next slide.
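The closure (and the companion goto) operation can be sketched in Python. This is our own illustrative representation, not the slides' notation: an item is a (head, body, dot-position) triple, with bodies stored as tuples.

```python
def closure(items, productions):
    """LR(0) closure: repeatedly add [B -> .gamma] for each B after a dot."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body):
                B = body[dot]           # symbol right after the dot
                for h, b in productions:
                    if h == B and (h, b, 0) not in items:
                        items.add((h, b, 0))
                        changed = True
    return frozenset(items)

def goto(items, X, productions):
    """Advance the dot over X in every applicable item, then take the closure."""
    moved = [(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == X]
    return closure(moved, productions)

# The augmented expression grammar
prods = [("E'", ('E',)), ('E', ('E', '+', 'T')), ('E', ('T',)),
         ('T', ('T', '*', 'F')), ('T', ('F',)),
         ('F', ('(', 'E', ')')), ('F', ('id',))]
I0 = closure({("E'", ('E',), 0)}, prods)
print(len(I0))  # → 7: the kernel item plus all E-, T-, F-productions with the dot at the left
```

`goto(I0, 'E', prods)` then yields the two-item set {[E′ → E•], [E → E•+T]}, which is the state I1 in the collection that follows.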

The Closure Operation (Example)

Figure: the closure of {[E′ → •E]}, giving the final closure I0.

The Goto Operation for LR(0) Items

Figure: the goto operation applied to the sets of items.

The Sets-of-Items Construction

Figure: the sets-of-items construction algorithm; kernel and
nonkernel items.
LR(0) Collection for the Grammar

Goto transitions between the sets of items:

I0 --E--> I1    I0 --T--> I2    I0 --F--> I3    I0 --(--> I4    I0 --id--> I5
I1 --+--> I6
I2 --*--> I7
I4 --E--> I8    I4 --T--> I2    I4 --F--> I3    I4 --(--> I4    I4 --id--> I5
I6 --T--> I9    I6 --F--> I3    I6 --(--> I4    I6 --id--> I5
I7 --F--> I10   I7 --(--> I4    I7 --id--> I5
I8 --)--> I11   I8 --+--> I6
I9 --*--> I7
Transition Diagram

Figure: transition diagram for the grammar G representing the
goto operation; the transition diagram of the DFA D for viable
prefixes.

Constructing the SLR Parser Table

Figure: the SLR(1) parsing table for the expression grammar G.

Parse Table: Action and Goto Functions

• The parsing table shows the parsing action and goto functions
  for the grammar G, repeated here with the productions numbered.

Model of an LR Parser

Figure: model of an LR parser (input buffer, stack, and parsing
table with action and goto parts).

LR Parsing Algorithm

Figure: the LR parsing algorithm.

Example of LR Parsing

Figure: moves of the LR parser on input id * id + id.

Another Example – SLR Parsing

• Consider the following grammar G:
  (1) S → S ( S )
  (2) S → ε
• Augmented grammar G′:
  S′ → S
  S → S ( S )
  S → ε
• FIRST(S) = { ( , ε }
• FOLLOW(S) = { $ , ( , ) }

Figure: construction of the LR(0) items (closure sets I).

SLR(1) Parsing Table and Parsing

Figure: the SLR(1) parsing table and the moves of the LR parser
on the input string ( ) ( ).

LL vs. LR Grammars

• For a grammar to be LR(k), we must be able to
  recognize the occurrence of the right side of a
  production in a right-sentential form, with k input
  symbols of lookahead.
• This requirement is far less stringent than that for
  LL(k) grammars, where we must be able to recognize
  the use of a production seeing only the first k symbols
  of what its right side derives.
• Thus, it should not be surprising that LR grammars
  can describe more languages than LL grammars.

Unambiguous Grammars That Are Not SLR(1)

• Every SLR(1) grammar is unambiguous, but
  there are many unambiguous grammars that
  are not SLR(1).
• Consider the grammar X with productions:
  (1) S → L = R
  (2) S → R
  (3) L → * R
  (4) L → id
  (5) R → L

  FIRST(S) = FIRST(L) = FIRST(R) = { * , id }
  FOLLOW(S) = { $ } , FOLLOW(L) = { = , $ }
  FOLLOW(R) = { = , $ }

Canonical LR(0) Collection for Grammar X

Goto transitions between the sets of items:

I0 --S--> I1    I0 --L--> I2    I0 --R--> I3    I0 --*--> I4    I0 --id--> I5
I2 --=--> I6
I4 --R--> I7    I4 --L--> I8    I4 --*--> I4    I4 --id--> I5
I6 --R--> I9    I6 --L--> I8    I6 --*--> I4    I6 --id--> I5
SLR(1) Parsing Table for Grammar X: Shift/Reduce Conflict

Figure: the SLR(1) parsing table for grammar X shows a
shift/reduce (S-R) conflict.

Grammar X:
(1) S → L = R
(2) S → R
(3) L → * R
(4) L → id
(5) R → L

Grammar X is not ambiguous. The shift/reduce conflict arises
because the SLR parser construction method is not powerful
enough to remember enough left context to decide what action the
parser should take on input =, having seen a string reducible to L.

LR(1) Grammars

• SLR is too simple.
• LR(1) parsing uses lookahead to avoid
  unnecessary conflicts in the parsing table.
• LR(1) item = LR(0) item + lookahead

  LR(0) item:    LR(1) item:
  [A → α•β]      [A → α•β, a]

SLR Versus LR(1)

• Split the SLR states by adding LR(1) lookahead.
• Unambiguous grammar:
  S → L = R | R
  L → * R | id
  R → L

LR(1) Items

• An LR(1) item
  [A → α•β, a]
  contains a lookahead terminal a, meaning α is already
  on top of the stack and we expect to see βa.
• For items of the form
  [A → α•, a]
  the lookahead a is used: reduce by A → α only if the
  next input is a.
• For items of the form
  [A → α•β, a]
  with β ≠ ε, the lookahead has no effect.
Constructing LR(1) Sets of Items

Figure: the algorithm for constructing the LR(1) sets of items.

Constructing LR(1) Sets of Items

• Unambiguous LR(1) grammar:
  (1) S → L = R
  (2) S → R
  (3) L → * R
  (4) L → id
  (5) R → L

• Augment with S′ → S.
• LR(1) items (next slide). In an item

  [A → α•Bβ, a]

  the production with its dot is the core (first component) and a is
  the lookahead (second component); closure adds an item
  [B → •γ, b] for each b in FIRST(βa).

Construct Closure I0

Figure: the closure I0, showing each item's core and its lookahead
computed as FIRST(βa).

Constructing LR(1) Sets of Items for Grammar X

Figure: the LR(1) sets of items for grammar X.

Construction of Canonical-LR Parsing Tables

Figure: the canonical LR(1) parsing table for grammar X.

LALR(1) Grammars

• LR(1) parsing tables have many states.
• LALR(1) parsing (Look-Ahead LR) combines LR(1)
  states to reduce the table size.
• Less powerful than LR(1):
  – it will not introduce shift/reduce conflicts, because shifts do
    not use lookaheads;
  – it may introduce reduce/reduce conflicts, but these seldom
    arise for grammars of programming languages.

Merging of states with a common core may produce a
reduce/reduce conflict, but not a shift/reduce conflict.

Constructing LALR(1) Parsing Tables

1. Construct the sets of LR(1) items.
2. Combine LR(1) sets whose items share the same first
   component (core), e.g. I4 and I11:

   I4:  [L → *•R, =]        I11: [L → *•R, $]
        [R → •L, =]              [R → •L, $]
        [L → •*R, =]             [L → •*R, $]
        [L → •id, =]             [L → •id, $]

   merged, writing =/$ as shorthand for two items in the same set:

   I4,11: [L → *•R, =/$]
          [R → •L, =/$]
          [L → •*R, =/$]
          [L → •id, =/$]

Similarly, the following closures share the same core (first component):
I5 = I12 , I7 = I13 , I8 = I10
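The core-merging step can be sketched in a few lines. The representation is our own: an LR(1) item is a (head, body, dot, lookahead) tuple, and the two sample states below mimic I4 and I11 from grammar X (same core, lookaheads = versus $).

```python
def lalr_merge(lr1_states):
    """Merge LR(1) states that share the same core (their LR(0) projections)."""
    merged = {}
    for state in lr1_states:
        # The core drops the lookahead from every item.
        core = frozenset((h, b, d) for h, b, d, la in state)
        merged.setdefault(core, set()).update(state)
    return [frozenset(s) for s in merged.values()]

I4 = frozenset({('L', ('*', 'R'), 1, '='), ('R', ('L',), 0, '='),
                ('L', ('*', 'R'), 0, '='), ('L', ('id',), 0, '=')})
I11 = frozenset({('L', ('*', 'R'), 1, '$'), ('R', ('L',), 0, '$'),
                 ('L', ('*', 'R'), 0, '$'), ('L', ('id',), 0, '$')})
merged = lalr_merge([I4, I11])
print(len(merged), len(merged[0]))  # → 1 8
```

The two four-item states collapse into a single eight-item state, i.e. each core item now carries both lookaheads, which is exactly the =/$ shorthand used above.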
Figure: LALR parsing table construction; the LALR(1) parsing
table for grammar X.

Analysis

• The LR and LALR parsers will mimic one
  another on correct inputs.
• When presented with erroneous input, the
  LALR parser may proceed to do some
  reductions after the LR parser has declared an
  error. However, the LALR parser will never
  shift another symbol after the LR parser
  declares an error.

Another Example – Canonical LR and LALR

• Consider the following grammar G:
  (1) S → S ( S )
  (2) S → ε
• Augmented grammar G′:
  S′ → S
  S → S ( S )
  S → ε
• LR(1) items (next slide)
Figure: the LR(1) items for grammar G and the canonical LR
parsing table.
LALR(1) Parsing Table

• The following closures share the same core (first
  component) of their LR(1) items: I4 and I7, I3 and I6,
  I2 and I5.
• Therefore, taking the union of these states gives
  I47 = I4 ∪ I7 , I36 = I3 ∪ I6 , I25 = I2 ∪ I5
Figure: parsing with the canonical LR table and with the LALR
table.

Another Example – Canonical LR and LALR

• Consider the following grammar G:
  (1) S → C C
  (2) C → c C        FIRST(S) = FIRST(C) = { c , d }
  (3) C → d          FOLLOW(S) = { $ }
                     FOLLOW(C) = { $ , c , d }
• Augmented grammar G′:
  S′ → S
  S → C C
  C → c C
  C → d
• LR(1) items (next slide)
Figure: the LR(1) items for grammar G and the canonical LR
parsing table.

LALR(1) Parsing Table

• The following closures share the same core (first component) of
  their LR(1) items: I3 and I6, I4 and I7, I8 and I9.
• Therefore, taking the union of these states gives
  I36 = I3 ∪ I6 , I47 = I4 ∪ I7 , I89 = I8 ∪ I9

  I36: C → c•C , c / d / $
       C → •cC , c / d / $
       C → •d , c / d / $

  I47: C → d• , c / d / $

  I89: C → cC• , c / d / $
Figure: the LALR(1) parsing table.

LL, SLR, LR, LALR Summary

• LL parse tables are computed using FIRST/FOLLOW:
  – nonterminals × terminals → productions.
• LR parsing tables are computed using closure/goto:
  – LR states × terminals → shift/reduce actions;
  – LR states × nonterminals → goto state transitions.
• A grammar is
  – LL(1) if its LL(1) parse table has no conflicts,
  – SLR if its SLR parse table has no conflicts,
  – LALR(1) if its LALR(1) parse table has no conflicts,
  – LR(1) if its LR(1) parse table has no conflicts.
LL, SLR, LR, LALR Grammars

Figure: containment of the LL, SLR, LALR, and LR grammar
classes.

Assignment

Consider the following grammar G:
E → E + T | T
T → T F | F
F → F * | a | b

i.   Construct the sets of LR(0) items.
ii.  Construct the SLR(1) parsing table for this grammar.
iii. Construct the sets of LR(1) items.
iv.  Construct the canonical LR(1) parsing table.
v.   Construct the LALR parsing table.