
UNIT 2

CHAPTER 5
BASIC PARSING TECHNIQUES
5.1 Parser
A parser for a grammar G is a program that takes an input string w and produces a parse tree as
output if w is a sentence of G; otherwise it reports an error.
Types of parser:

 Bottom up parser Build parse trees from bottom to top ( leaves to root )
( Shift Reduce parsing, Operator Precedence Parsing)

 Top down parser Build parse trees from top to bottom

(Predictive parsing, Recursive descent parsing)


In both cases, the input is scanned from left to right, one symbol at a time.

Shift Reduce Parsing:


Shift reduce parsing is a bottom up parsing method. It shifts the input symbols onto the stack
until the right side of a production appears on the top. The right side may then be replaced by the
symbol on the left side of the production, and the process is repeated.
 Operator precedence parser and LR parser are examples of shift reduce parsing.
 Operator precedence parsing is suitable for parsing expressions (uses information about
the precedence and associativity of operators)
Recursive descent parsing:
Recursive descent parsing is a top down parsing method. It uses a collection of recursive
routines to perform parsing.
• A predictive parser is an example of recursive descent parsing
• An LL parser is a type of predictive parser
Representation of parse trees:
--2 types of representation
 Implicit
 Explicit
--Sequence of Productions used in derivation is an example of implicit representation
--Linked list structure is an example of explicit representation
Derivation

Leftmost derivation (LMD): the leftmost nonterminal is replaced at every step.
Rightmost derivation (RMD): the rightmost nonterminal is replaced at every step. Also called canonical derivation.

Consider the grammar
S → iCtS | iCtSeS | a
C → b

Parse tree T

Construct a LMD for the sentence w= i b t i b t a e a


LMD:
S ⇒ iCtS
⇒ ibtS
⇒ ibtiCtSeS
⇒ ibtibtSeS
⇒ ibtibtaeS
⇒ ibtibtaea
RMD:
S ⇒ iCtS
⇒ iCtiCtSeS
⇒ iCtiCtSea
⇒ iCtiCtaea
⇒ iCtibtaea
⇒ ibtibtaea
Constructing an LMD: figures (a) to (d) show the parse tree after successive derivation steps.
5.2 Shift reduce Parsing
Shift reduce parsing is an example of Bottom up Parsing
It constructs the parse trees starting from the leaves, and working up towards the root
Reduction :
 Look for the substring that match the right side of some production
 Replace it by a symbol on the left
 {Replacement of right side of a production by its left side is called Reduction}
Consider the grammar
S → aAcBe
A → Ab | b
B → d
and the string abbcde.
We want to reduce the string to S.
Starting from abbcde, the reductions are:
A → b : aAbcde
A → Ab : aAcde
B → d : aAcBe
S → aAcBe : S
Handle:
A handle is a substring that matches the right side of a production and whose replacement by
the nonterminal on the left side is one step in the reverse of a rightmost derivation, so that repeated replacement leads to the start symbol.
Handle Pruning:
Removing the handle by replacing it with the left side of the production.
An RMD in reverse is obtained by handle pruning.
The string appearing to the right of a handle contains only terminals.
Consider the grammar
E → E+E
E → E*E
E → (E)
E → id
Consider the RMD
E ⇒ E+E
⇒ E+E*E
⇒ E+E*id3
⇒ E+id2*id3
⇒ id1+id2*id3
In id1+id2*id3, id1 is the handle.

Input: id+id*id
Consider the grammar E → E+E, E → E*E, E → (E), E → id
and the input string id1+id2*id3.
Consider the sequence of reductions that leads to the start symbol E:
id1+id2*id3
E+id2*id3
E+E*id3
E+E*E
E+E
E
The sequence of right sentential forms is the reverse of the RMD.
SR parsing Operations
A shift reduce parser does 4 operations
Shift
The next input symbol is shifted to the top of the stack
Reduce
The parser identifies the handle at the top of the stack
It compares the handle to the right side of the production
When there is a match, the handle is replaced by the left side of the production
Accept
It indicates the successful completion of parsing
Error
Indicates that some error has occurred, and calls an error recovery routine
 Use a stack and a input buffer.
 $ symbol is used to mark the bottom of the stack and right most end of the input
Stack bottom Input
$ w$
 The parser operates by shifting 0 or more input symbols to the stack, until a handle is on
the top of stack.
 The parser then reduces the handle to the left side of the production
 This process is repeated until
the stack contains the start symbol and the input is empty ( or )
an error is detected
Stack bottom Input
$S $
Input string: id1+id2*id3
Grammar: E → E+E | E*E | (E) | id
Shift Reduce parsing actions
Stack       Input           Action
$           id1+id2*id3$    Shift
$id1        +id2*id3$       Reduce by E → id
$E          +id2*id3$       Shift
$E+         id2*id3$        Shift
$E+id2      *id3$           Reduce by E → id
$E+E        *id3$           Shift
$E+E*       id3$            Shift
$E+E*id3    $               Reduce by E → id
$E+E*E      $               Reduce by E → E*E
$E+E        $               Reduce by E → E+E
$E          $               Accept
Constructing parse tree
1. When we shift an input symbol a onto the stack, we create a one-node tree labelled
a. Both the root and the yield of this tree are a.
2. When we reduce X1X2…Xn to A, we create a new node labelled A. Its children, from left to right, are
the roots of the trees for X1, X2, …, Xn.
3. With each symbol on the stack we associate a pointer to a tree whose root is that symbol and
whose yield is the string of terminals that have been reduced to it.
4. At the end, the start symbol has the entire parse tree associated with it.
Parse tree construction:

a) After shifting id1

b) After reducing id1 to E

c) After reducing id1+id2*id3 to E+E

d) After completion
5.3 Operator precedence parsing:
Operator grammar properties:
• No production has ε on the right side
• No production has two adjacent nonterminals

Eg:
E → EAE | (E) | -E | id
A → + | - | *
is not an operator grammar, because the right side EAE has adjacent nonterminals.

E → E+E | E-E | E*E | (E) | -E | id is an operator grammar.


Adv:
• Easy to implement
Disadv:
• It is hard to handle tokens like the minus sign, which has 2 different precedences depending
on whether it is binary or unary
• Only a small class of grammars can be parsed using operator precedence techniques
Three disjoint precedence relations are used in operator precedence parsing.

These relations help in the selection of handles.


• a <. b : a yields precedence to b (b has higher priority)
• a .> b : a takes precedence over b (a has higher priority)
• a ≐ b : a and b have equal precedence
2 ways of determining precedence relation
 Associativity and precedence rules of operators
( * has higher precedence than +)
* .> + or + <. *
This method resolves the ambiguity of grammar.
 Operator precedence relations
Operator precedence using associativity and precedence rules:
1. If operator θ1 has higher precedence than θ2, then θ1 .> θ2 and θ2 <. θ1
Eg: * has higher precedence than +, so
* .> + and + <. *
In E+E*E+E, E*E is the handle and is reduced first
2. If θ1 and θ2 have equal precedence,
• θ1 .> θ2 and θ2 .> θ1 if the operators are left associative
• θ1 <. θ2 and θ2 <. θ1 if the operators are right associative
Eg: (i) + and – are left associative and have equal precedence
+ .> +
+ .> -
- .> +
- .> -
E-E+E : first reduce E-E
E+E-E : first reduce E+E
(ii) ^ is right associative
^ <. ^
E^E^E : reduce the rightmost E^E first
In a^b^c, b^c is evaluated first
3. θ .> $ and $ <. θ for all operators θ
Eg: + .> $ and $ <. +
- .> $ and $ <. -
* .> $ and $ <. *
/ .> $ and $ <. /
4. θ <. id, id .> θ, θ <. (, ( <. θ, ) .> θ, θ .> ) for all θ

( ≐ ), ( <. (, ( <. id, $ <. (, $ <. id, id .> $, id .> ), ) .> $, ) .> )

If no precedence relation holds between a pair of terminals, the error recovery routine is called.

Consider the input id+id*id and the precedence relations

      id    +     *     $
id    err   .>    .>    .>
+     <.    .>    <.    .>
*     <.    .>    .>    .>
$     <.    <.    <.    err


$id+id*id$
The string with the precedence relations inserted is:
$ <. id .> + <. id .> * <. id .> $
1. Scan the string from the left end until the first .> is encountered
2. Then scan backwards to the left until a <. is encountered
3. The handle contains everything to the left of that .> and to the right of that <.
In this example, the first handle is id (from <. id .>); reduce it to E
E+id*id
E+E*id
E+E*E
Now delete the nonterminals
$+*$ is obtained
Insert precedence relation
$<. +<. * .>$
This indicates that, the left end of the handle lies between + and * and the right end of
the handle lies between * and $
ie. in E+E*E, the handle is E*E

Obtain the operator precedence relations for

E → E+E | E-E | E*E | E/E | E^E | (E) | -E | id

^ has highest precedence and is right associative


* and / are of next highest precedence and are left associative
+ and – are of lowest precedence and are left associative

Page 162 fig 5.7 HW


Operator precedence relations from an operator precedence grammar
a and b are terminals
α, β, γ, δ are strings of grammar symbols
1. a ≐ b, if a appears immediately to the left of b, or they are separated by one
nonterminal
(i) A → α a β b γ, where β is ε or a single nonterminal [a ≐ b]
(ii) S → iCtSeS [i ≐ t, t ≐ e, because they are separated by a single NT]
2. a <. b, if a nonterminal A appears immediately to the right of a, and A derives a string in
which b is the 1st terminal symbol
(i) α a A β and A ⇒ γ b δ, where γ is ε or a single nonterminal [a <. b]
E → E+E is of the form α a A β, with a: +
E → E*E is of the form γ b δ, with b: *
+ <. *
(ii) S → iCtS, C → b
i <. b
3. a .> b, if a nonterminal A appears immediately to the left of b and A derives a string in which a is the last
terminal
(i) α A b β and A ⇒ γ a δ, where δ is ε or a single nonterminal [a .> b]
E → E+E is of the form α A b β, with b: +
E → E*E is of the form γ a δ, with a: *
* .> +
(ii) S → iCtS, C → b
b .> t
4. $ <. b, where b is a 1st terminal of the start symbol
5. a .> $, where a is a last terminal of the start symbol

Summary:
Terminal NT Terminal : ≐
Terminal NT : <.
NT Terminal : .>
$ (start marker) <. first terminal
last terminal .> $ (end marker)

Consider the grammar

E → E+T | T, T → T*F | F, F → (E) | id

Nonterminal   1st terminals    Last terminals
F             ( id             ) id
T             * ( id           * ) id
E             + * ( id         + * ) id

I) Consider <. (rule 2)

A terminal followed by a nonterminal
(terminal immediately to the left of a nonterminal):
terminal <. first terminals of the NT
(1) +T
+ <. * , ( , id
(2) *F
* <. ( , id
(3) (E
( <. + , * , ( , id
II) Consider .> (rule 3)
A nonterminal followed by a terminal
(nonterminal immediately to the left of a terminal):
last terminals of the NT .> terminal

(1) E+
+ , * , ) , id .> +
(2) T*
* , ) , id .> *
(3) E)
+ , * , ) , id .> )

III) Terminal NT Terminal (rule 1)

(E)
( ≐ )

According to rule 4,
$ is related by <. to every 1st terminal of the start symbol E ($ marks the start)
$ <. + , $ <. * , $ <. ( , $ <. id

According to rule 5,
every last terminal of the start symbol E is related by .> to $ ($ marks the end)
+ .> $ , * .> $ , ) .> $ , id .> $
Operator Precedence relation

      +     *     (     )     id    $
+     .>    <.    <.    .>    <.    .>
*     .>    .>    <.    .>    <.    .>
(     <.    <.    <.    ≐     <.    err
)     .>    .>    err   .>    err   .>
id    .>    .>    err   .>    err   .>
$     <.    <.    <.    err   <.    err

Algorithm 5.1 Computation of LEADING


Input : CFG
Output: Boolean array L[A,a] in which the entry is true, if a is in LEADING(A)

Procedure INSTALL(A,a)
If not L[A,a] then
begin
L[A,a]:=true;
Push(A,a) onto STACK
End

Main procedure
begin
/* Initialize L */
for each nonterminal A and terminal a do L[A,a]:=false;
for each production of the form A → aα or A → Baα do
INSTALL(A,a);
While STACK not empty do
begin
pop top pair (B, a) from STACK;
for each production of the form A → Bα do
INSTALL(A,a)
end
end
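
The worklist idea of Algorithm 5.1 can be sketched in Python as follows. This is a minimal illustration, not the textbook's code: the dictionary encoding of the grammar and the `reverse` flag (which reuses the same routine for TRAILING by reading right sides backwards) are assumptions made for the example.

```python
from collections import defaultdict

# Grammar from these notes: E -> E+T | T, T -> T*F | F, F -> (E) | id
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}

def leading(grammar, reverse=False):
    """Worklist computation of LEADING (or TRAILING when reverse=True)."""
    nonterminals = set(grammar)
    result = defaultdict(set)
    stack = []

    def install(a_nt, term):
        # INSTALL(A, a): record the pair once and schedule it for propagation
        if term not in result[a_nt]:
            result[a_nt].add(term)
            stack.append((a_nt, term))

    def oriented(rhs):
        return rhs[::-1] if reverse else rhs

    # Productions of the form A -> a... or A -> B a...
    for a_nt, alternatives in grammar.items():
        for rhs in alternatives:
            rhs = oriented(rhs)
            if rhs[0] not in nonterminals:
                install(a_nt, rhs[0])
            elif len(rhs) > 1 and rhs[1] not in nonterminals:
                install(a_nt, rhs[1])

    # If A -> B... and a is in LEADING(B), then a is in LEADING(A)
    while stack:
        b_nt, term = stack.pop()
        for a_nt, alternatives in grammar.items():
            for rhs in alternatives:
                if oriented(rhs)[0] == b_nt:
                    install(a_nt, term)
    return dict(result)

LEADING = leading(GRAMMAR)
TRAILING = leading(GRAMMAR, reverse=True)
```

For the expression grammar this gives the same first-terminal and last-terminal sets as the table worked out by hand in these notes.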

Algorithm 5.2
Calculation of Operator Precedence relation
Input : operator grammar G
Output: relations <. , .> , ≐ for G
Method:
Compute LEADING(A) and TRAILING(A) for each nonterminal A
Examine each position of the right side of each production
Set $ <. a for all a in LEADING(S) and set b .> $ for all b in TRAILING(S), where S is the start
symbol

For each production A → X1X2…Xn do


For i := 1 to n-1 do
Begin
If Xi and Xi+1 are both terminals, then set Xi ≐ Xi+1;
If i <= n-2 and Xi and Xi+2 are terminals
and Xi+1 is a nonterminal, then
set Xi ≐ Xi+2;
If Xi is a terminal and Xi+1 is a nonterminal, then
for all a in LEADING(Xi+1) do set Xi <. a;
If Xi is a nonterminal and Xi+1 is a terminal, then
for all a in TRAILING(Xi) do set a .> Xi+1
End
5.13 operator precedence parsing algorithm
pg 171
repeat forever
if only $ is on the stack, and only $ is on the input then
accept and break
else
begin
let a be the top most terminal symbol on the stack
and let b be the current input symbol
if a<.b or a≐b then shift b on to the stack
else if a.>b then /* reduce */
repeat
pop the stack
until the top stack terminal is related by <.
to the terminal most recently popped
else call the error correcting routine
end
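
The loop above can be sketched in Python for the small table over id, +, * and $ used earlier in these notes. This is an illustrative sketch under assumptions: the dictionary `REL`, the function name `op_precedence_parse`, and the choice to keep only terminals on the stack (nonterminals stay implicit) are not from the textbook, and the error routine is reduced to raising `SyntaxError`.

```python
# Precedence relations for id, +, * and $ ('<' stands for <. and '>' for .>).
# A missing pair is an error entry.
REL = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}

def op_precedence_parse(tokens):
    """Return the handles reduced (as terminal strings), or raise SyntaxError."""
    stack = ["$"]                      # terminals only; nonterminals are implicit
    tokens = list(tokens) + ["$"]
    pos = 0
    handles = []
    while True:
        a, b = stack[-1], tokens[pos]
        if a == "$" and b == "$":
            return handles             # accept
        rel = REL.get((a, b))
        if rel in ("<", "="):
            stack.append(b)            # shift
            pos += 1
        elif rel == ">":
            handle = []
            while True:                # pop until top <. most recently popped
                popped = stack.pop()
                handle.append(popped)
                if REL.get((stack[-1], popped)) == "<":
                    break
            handles.append(" ".join(reversed(handle)))
        else:
            raise SyntaxError(f"no precedence relation between {a!r} and {b!r}")
```

For example, `op_precedence_parse(["id", "+", "id", "*", "id"])` reduces the three id handles, then the `*` handle, then the `+` handle, which matches the order derived by hand above. Because nonterminals are ignored, a sketch like this accepts some strings a full parser would reject.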

fig 5.14 Action of operator precedence parsing


Precedence function
The table can be encoded by 2 precedence functions f and g
f(a) < g(b) where a<.b
f(a) > g(b) where a.>b
f(a) = g(b) where a≐b
Finding precedence functions for a table
1. Create symbols fa and ga for each a that is a terminal or $
2. Partition the created symbols into as many groups as possible, such that
if a ≐ b, then fa and gb are in the same group
3. Create a directed graph whose nodes are the groups
If a <. b, place an edge from the group of gb to the group of fa
If a .> b, place an edge from the group of fa to the group of gb
4. If the graph constructed has a cycle, then no precedence functions exist
If there are no cycles, f(a) is the length of the longest path beginning at the group of fa, and
g(b) is the length of the longest path beginning at the group of gb

Consider the matrix

      id    +     *     $
id    err   .>    .>    .>
+     <.    .>    <.    .>
*     <.    .>    .>    .>
$     <.    <.    <.    err

There is no ≐ relationship, so each symbol is in a separate group.
Graph for the precedence functions:
There are no cycles.
Therefore, precedence functions exist.
f($) = 0
g($) = 0
g(+) = 1
The longest path is gid → f* → g* → f+ → g+ → f$, so
g(id) = 5

      id    +     *     $
f     4     2     4     0
g     5     1     3     0
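
The graph construction can be checked mechanically. The following Python sketch assumes the same four-terminal table; the names `REL`, `TERMS` and `precedence_functions` are made up for the example, and since the table has no ≐ entries every f and g symbol is treated as its own group (a table with ≐ entries would need the grouping step as well).

```python
TERMS = ["id", "+", "*", "$"]
# '<' stands for <. and '>' for .> ; missing pairs are error entries.
REL = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}

def precedence_functions(terms, rel):
    """Return (f, g) as dicts, or None if the graph has a cycle."""
    edges = {("f", a): set() for a in terms}
    edges.update({("g", a): set() for a in terms})
    for (a, b), r in rel.items():
        if r == "<":
            edges[("g", b)].add(("f", a))   # a <. b : edge g_b -> f_a
        elif r == ">":
            edges[("f", a)].add(("g", b))   # a .> b : edge f_a -> g_b

    state = {}  # node -> "visiting" or its finished longest-path length

    def longest(node):
        if state.get(node) == "visiting":
            raise ValueError("cycle")
        if node in state:
            return state[node]
        state[node] = "visiting"
        length = max((1 + longest(n) for n in edges[node]), default=0)
        state[node] = length
        return length

    try:
        f = {a: longest(("f", a)) for a in terms}
        g = {a: longest(("g", a)) for a in terms}
    except ValueError:
        return None
    return f, g

f, g = precedence_functions(TERMS, REL)
```

The resulting `f` and `g` agree with the table of function values above.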
Top down Parsing
Top down parsing may involve backtracking,
in which case parts of the input are scanned repeatedly.
Consider the grammar
S → cAd
A → ab | a
Input:
w=cad
To construct a parse tree for this sentence using top down,
Initially construct a tree consisting of single node labelled S
The input pointer points to c
Use the first production of S to expand the tree and obtain Parse tree
S

c A d
The left most leaf c, matches the 1st symbol of w.
So advance the input pointer to a.
Consider the next leaf A
Expand A using the 1st alternative and obtain the tree
S

c A d

a b
We now have a match for 2nd input symbol.a
Consider the next input symbol d and the next leaf b
b does not match with d
so, report failure
go back and see is there any alternative for A
while going back, reset the input pointer to position 2 (the place we had, when we came to A)
now try with the 2nd alternative for A
Now the leaf a matches the second symbol of w and the leaf d matches the 3rd symbol.
Thus parse tree for w is produced.
S

c A d

Recursive procedure for top down parsing


Procedure S( );
begin
If input symbol=’c’ then
begin
ADVANCE();
if A( ) then
if input symbol = ‘d’ then
begin ADVANCE ( ); return true end
end
return false
end
(a) Procedure S

Procedure A( );
begin
isave:=input-pointer;
if input symbol = ‘a’ then
begin
ADVANCE( );
If input symbol =’b’ then
begin ADVANCE ( ); return true end
end
input-pointer:=isave;
/* failure to find ab */
If input symbol= ‘a’ then
begin ADVANCE ( ); return true end
else return false
end
(b) Procedure A
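
As a rough Python counterpart of procedures S and A, the sketch below recognizes the same grammar S → cAd, A → ab | a. It is an assumption-laden illustration: the name `parse_S` is invented, positions are passed explicitly instead of saving and restoring a global input pointer, and the generator tries every alternative of A, which is slightly more general than the pseudocode.

```python
def parse_S(w):
    """Backtracking recognizer for S -> c A d, A -> a b | a."""
    def match_A(pos):
        # Yield each input position reachable after matching A at pos,
        # trying the alternative 'ab' before 'a', as in the notes.
        if w[pos:pos + 2] == "ab":
            yield pos + 2
        if w[pos:pos + 1] == "a":
            yield pos + 1

    if w[:1] != "c":
        return False
    for after_A in match_A(1):      # moving to the next value is a backtrack
        if w[after_A:] == "d":
            return True
    return False
```

On the input cad, the first alternative ab fails to match, so the recognizer falls back to the alternative a and then matches d.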

Difficulties in top down parsing:


 Left Recursion
 Back tracking
 Left factoring
Left recursion:
A grammar is said to be left recursive if it has a nonterminal A such that A ⇒+ Aα for some string α.
Left recursion can send a top down parser into an infinite loop, for example with
A → Aα | β
We must therefore eliminate all left recursion from the grammar.

Backtracking:
If we make a sequence of expansions and subsequently discover a mismatch, we have to undo the
semantic effects of all the erroneous expansions. For example, entries made in the symbol table have to be
removed.
Left factoring:
The order in which the alternatives are tried can affect the language accepted.
Eliminating the left recursion
Consider
A → Aα | β (β does not begin with A)
The left recursion can be eliminated with the pair of productions
A → βA′
A′ → αA′ | ε

Equivalent parse trees are obtained from the original grammar

A → Aα | β

and from the transformed grammar

A → βA′
A′ → αA′ | ε
Consider the grammar
E → E+T | T
T → T*F | F
F → (E) | id
Now we obtain
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
Eliminate immediate left recursion (productions of the form A → Aα)
To eliminate immediate left recursion among all A productions, we group the A productions as
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
where no βi begins with A. Then we replace the A productions by
A → β1A′ | β2A′ | … | βnA′
A′ → α1A′ | α2A′ | … | αmA′ | ε
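
The replacement rule for immediate left recursion is mechanical enough to write down directly. In this Python sketch the function name, the list-of-symbols encoding and the use of an apostrophe for the new nonterminal are assumptions made for illustration; the empty list stands for ε.

```python
def eliminate_immediate_left_recursion(nt, alternatives):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b A', A' -> a A' | epsilon.

    Each alternative is a list of symbols; the empty list stands for epsilon.
    """
    # alpha parts of the left-recursive alternatives A -> A alpha
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # beta alternatives, which do not begin with A
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: alternatives}
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }
```

Applied to E → E+T | T, it returns E → TE′ and E′ → +TE′ | ε, as in the worked example.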
Algorithm to eliminate left recursion
1. Arrange the nonterminals of G in some order A1, A2 … An
2. For i:= 1 to n do
begin
for j:=1 to i-1 do
replace each production of the form Ai → Ajγ
by the productions Ai → δ1γ | δ2γ | … | δkγ,
where Aj → δ1 | δ2 | … | δk are all the current Aj productions;
eliminate the immediate left recursion among the Ai productions
end

Left factoring:


If we have the 2 productions
statement → if condition then statement else statement
| if condition then statement
then on seeing the input symbol if, we cannot tell which production to use.
Left factoring is a useful method for manipulating such a grammar:
it is the process of factoring out the common prefixes of alternatives.
Let A → αβ | αγ be 2 A-productions.
If the input begins with a string derived from α,
we do not know whether to expand A to αβ or to αγ.
We can defer the decision by expanding
A → αA′
A′ → β | γ
Eg:
Consider the grammar
S → iCtS | iCtSeS | a
C → b
When we use left factoring,
S → iCtSS′ | a
S′ → eS | ε
C → b
Recursive descent parsing:
A parser that uses a set of recursive procedures to recognize its input with no backtracking is
called a recursive descent parser.

Consider
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
This grammar is suitable for a recursive descent parser without backtracking.

Mutually recursive procedure to recognize arithmetic expressions:


procedure E( );
begin
T( );
EPRIME( )
end;
procedure EPRIME( );
if input-symbol=’+’ then
begin
ADVANCE( );
T( );
EPRIME( )
end;
procedure T( );
begin
F( );
TPRIME( )
end;
procedure TPRIME( );
if input-symbol=’*’ then
begin
ADVANCE( );
F( );
TPRIME( )
end;

procedure F( );
if input-symbol= ‘id’ then
ADVANCE( )
else if input-symbol=’(‘ then
begin
ADVANCE( );
E( );
If input-symbol= ‘)‘ then
ADVANCE( );
else
ERROR( )
end
else ERROR( )
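
The mutually recursive procedures translate almost line for line into Python. The sketch below is only an illustration: the wrapper name `recognize`, the token-list input and the use of `SyntaxError` in place of ERROR( ) are assumptions, and a `$` marker is appended so that the procedures never read past the end of the input.

```python
def recognize(tokens):
    """Recognizer for E -> T E', E' -> + T E' | eps, T -> F T', T' -> * F T' | eps, F -> ( E ) | id."""
    tokens = list(tokens) + ["$"]
    pos = 0

    def advance():
        nonlocal pos
        pos += 1

    def error():
        raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")

    def E():
        T()
        E_prime()

    def E_prime():
        if tokens[pos] == "+":      # otherwise E' -> epsilon, consume nothing
            advance()
            T()
            E_prime()

    def T():
        F()
        T_prime()

    def T_prime():
        if tokens[pos] == "*":      # otherwise T' -> epsilon
            advance()
            F()
            T_prime()

    def F():
        if tokens[pos] == "id":
            advance()
        elif tokens[pos] == "(":
            advance()
            E()
            if tokens[pos] == ")":
                advance()
            else:
                error()
        else:
            error()

    try:
        E()
    except SyntaxError:
        return False
    return tokens[pos] == "$"       # the whole input must be consumed
```

Each procedure decides what to do from the current token alone, so no input position is ever saved or restored.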

Transition Diagrams:
Draw one transition diagram for each nonterminal
The labels of the edges are tokens or nonterminals
A transition on a token is taken if that token is the next input symbol
Edges may also be labelled by nonterminals
For each nonterminal A:
Create an initial and a final state
For each production A → X1X2…Xn, create a path from the initial to the final state, with edges
labelled X1, X2, …, Xn.

Consider
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
Transition diagrams are drawn for E, E′, T, T′ and F (figures not reproduced).

Simplify the transition diagrams by substituting diagrams in one another.

Revised transition diagrams are obtained for E′, E, T and F (figures not reproduced).

5.5 Predictive Parsers:


The Parser has an input, Stack, Parsing table and output
Input contains the string to be parsed followed by $ sign
Stack contains sequence of grammar symbols preceded by $
Parsing table is a two dimensional array M(A,a) where A is a nonterminal and a is a Terminal
or $ sign
Consider X, the symbol on the top of the stack, and a, the current input symbol.
There are 3 possibilities:
1. If X = a = $, the parser halts and announces successful completion of parsing
2. If X = a ≠ $, the parser pops X from the stack and advances the input pointer to the next input
symbol
3. If X is a nonterminal, the program consults the entry M[X,a] of the parsing table
(i) If the entry is a production X → UVW, the parser replaces X by
UVW on the stack (with U on the top)
(ii) If the entry is an error entry, M[X,a] = error, the parser calls the error routine.

Model of Predictive Parser

To fill the parsing table, we need to consider 2 functions FIRST and FOLLOW
Rules for finding FIRST
1. If X is a terminal, then FIRST(X) = {X}
2. If X is a nonterminal and
(i) there is a production X → aα, then add a to FIRST(X)
(ii) there is a production X → ε, then add ε to FIRST(X)
3. If X is a nonterminal and there is a production X → Y1Y2Y3…Yk, then find FIRST(Y1)
and add all non-ε symbols of FIRST(Y1) to FIRST(X).
If ε is in FIRST(Y1), then add all non-ε symbols of FIRST(Y2) to FIRST(X).
If ε is in FIRST(Y1) and FIRST(Y2), then add all non-ε symbols of FIRST(Y3) to
FIRST(X), and so on.
If all of FIRST(Y1) up to FIRST(Yk) contain ε, then add ε to FIRST(X).
Rules for finding FOLLOW
1. Add $ to FOLLOW(S), where S is the start symbol
2. If there is a production A → αBβ, β ≠ ε, then add everything in FIRST(β) except ε to FOLLOW(B)
3. If there is a production A → αB, or A → αBβ where FIRST(β) contains ε, then add
everything in FOLLOW(A) to FOLLOW(B)
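
The FIRST and FOLLOW rules can be applied repeatedly until nothing changes. The Python sketch below does this for the non-left-recursive expression grammar used in these notes; the function names, the dictionary encoding of the grammar and the string `"ε"` as the epsilon marker are assumptions made for the example.

```python
EPS = "ε"
# E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> ( E ) | id
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """FIRST of a string of symbols, given the FIRST sets of the nonterminals."""
    out = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}   # a terminal is its own FIRST
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)        # every symbol can vanish (or the string is empty)
    return out

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                                   # rule 1
    changed = True
    while changed:
        changed = False
        for a_nt, alternatives in grammar.items():
            for rhs in alternatives:
                for i, b in enumerate(rhs):
                    if b not in grammar:
                        continue
                    beta_first = first_of_string(rhs[i + 1:], first)
                    new = beta_first - {EPS}                 # rule 2
                    if EPS in beta_first:
                        new |= follow[a_nt]                  # rule 3
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow

FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
```

Iterating to a fixed point avoids having to order the rules by hand, at the cost of some repeated work.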

Predictive Parsing Program


repeat
begin
let X be the top stack symbol and a the next input symbol
if X is a terminal or $ then
if X=a then
pop X from the stack and remove a from the input
else
ERROR( )
else /* X is a nonterminal */
if M[X,a] = X → Y1Y2…Yk then
begin
pop X from the stack;
push Yk, Yk-1, …, Y1 onto the stack, with Y1 on the top
end
else
ERROR( )
end
until X=$ /* stack becomes empty */
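
A table-driven version of this program can be sketched in Python. The table `M` below is written out by hand for the grammar E → TE′, E′ → +TE′ | ε, T → FT′, T′ → *FT′ | ε, F → (E) | id; the names `M`, `NONTERMINALS` and `predictive_parse`, and raising `SyntaxError` for ERROR( ), are assumptions for illustration.

```python
# M[A, a]: right side to expand, as a list of symbols; [] stands for ε.
M = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens, start="E"):
    """Return the productions used (a leftmost derivation), or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]
    pos = 0
    output = []
    while stack:
        x, a = stack.pop(), tokens[pos]
        if x not in NONTERMINALS:           # terminal or $: must match the input
            if x != a:
                raise SyntaxError(f"expected {x!r}, found {a!r}")
            pos += 1
        elif (x, a) in M:
            rhs = M[(x, a)]
            output.append(f"{x} -> {' '.join(rhs) or 'ε'}")
            stack.extend(reversed(rhs))     # push Yk ... Y1, leaving Y1 on top
        else:
            raise SyntaxError(f"no table entry for ({x!r}, {a!r})")
    return output
```

For the input id+id*id the function returns eleven productions, beginning with `E -> T E'` and ending with `E' -> ε`.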

Consider the grammar:

E → E+T | T
T → T*F | F
F → (E) | id
Input: id+id*id
After eliminating left recursion the grammar becomes
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
Apply the FIRST rules:
Rule 1 is not applicable to any production
Rule 2: X → aα
(i) F → (E)
FIRST(F) contains (
(ii) F → id
FIRST(F) contains id
(iii) E′ → +TE′
FIRST(E′) contains +
(iv) T′ → *FT′
FIRST(T′) contains *
(v) E′ → ε
FIRST(E′) contains ε
(vi) T′ → ε
FIRST(T′) contains ε
Rule 3: X → Y1Y2Y3…Yk
(i) T → FT′
FIRST(T) = FIRST(F) = ( , id [FIRST(F) does not contain ε, so there is no need to look at T′]
(ii) E → TE′
FIRST(E) = FIRST(T) = ( , id
Apply the FOLLOW rules:
Rule 1: Add $ to FOLLOW(S), where S is the start symbol
FOLLOW(E) contains $
Rule 2: A → αBβ [FOLLOW(B) gets FIRST(β) except ε]
(i) F → (E) [α = (, B = E, β = )]
FOLLOW(E) gets FIRST( ) ) = )
(ii) T′ → *FT′ [α = *, B = F, β = T′]
FOLLOW(F) gets FIRST(T′) except ε = *
(iii) E′ → +TE′ [α = +, B = T, β = E′]
FOLLOW(T) gets FIRST(E′) except ε = +
Rule 3: A → αB, or A → αBβ where FIRST(β) contains ε [FOLLOW(B) gets everything in FOLLOW(A)]
(i) E → TE′ [A → αB]
FOLLOW(E′) = FOLLOW(E)
= ) $
(ii) E′ → +TE′ [A → αBβ where FIRST(β) contains ε]
FIRST(E′) = + ε [it contains ε]
FOLLOW(T) gets FOLLOW(E′)
= ) $
(iii) T → FT′ [A → αB]
FOLLOW(T′) = FOLLOW(T)
= + ) $
(iv) T′ → *FT′ [A → αBβ where FIRST(β) contains ε]
FIRST(T′) = * ε [it contains ε]
FOLLOW(F) gets FOLLOW(T′)
= + ) $

Nonterminal   FIRST     FOLLOW

E             ( id      ) $
E′            + ε       ) $
T             ( id      + ) $
T′            * ε       + ) $
F             ( id      * + ) $

Parsing table M:

      id        +          *          (         )         $
E     E→TE′                           E→TE′
E′              E′→+TE′                         E′→ε      E′→ε
T     T→FT′                           T→FT′
T′              T′→ε       T′→*FT′              T′→ε      T′→ε
F     F→id                            F→(E)

Construction of Parsing table

For each production A → α of the grammar:
1. For each terminal a in FIRST(α), add A → α to M[A,a]
2. If ε is in FIRST(α), then add A → α to M[A,b] for each terminal b in FOLLOW(A)
3. If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A,$]
4. All other entries are defined as errors.
Note:
If the given grammar G is left recursive or ambiguous, then M will have at least one multiply
defined entry.
When the parsing table has multiply defined entries, eliminate left recursion and then left
factor wherever possible.
A grammar whose parsing table has no multiply defined entries is said to be LL(1).

CHAPTER 6
AUTOMATIC CONSTRUCTION OF EFFICIENT
PARSERS
INTRODUCTION:
• LR parsers are so called because they scan the input from left to right and construct a
rightmost derivation in reverse.
• LR parsers are attractive for the following reasons:
• LR parsers can be built to recognize virtually all programming language constructs for which context-free grammars can be written
• The LR parsing method is more general than operator precedence or any other common shift reduce technique
• LR parsers dominate the common forms of top down parsing without backtracking
• LR parsers can detect syntactic errors as soon as it is possible to do so on a left to right scan of the input
Generating an LR Parser

Different techniques for producing LR parsing tables

LR Parsers

Simple LR parser (SLR)
1. Easiest to implement
2. Fails to produce a table for certain grammars

Canonical LR parser (CLR)
1. Most powerful
2. Works on a large class of grammars
3. Very expensive to implement

Look Ahead LR parser (LALR)
1. Intermediate in power
2. Works on most programming language grammars
3. Can be implemented efficiently

6.1 LR PARSERS
LR Parser has an input, a stack and a parsing table
Input is read from left to right, one symbol at a time
Stack contains string of the form S0X1S1X2S2X3S3…..XmSm , Sm is on the top
Each Xi is a grammar symbol
Each Si is a State symbol (used to guide the shift reduce decision)
The Parsing table contains 2 parts
 Parsing Function ACTION
 Goto function GOTO
The LR Parser program behaves as follows:
--Take the symbol on the top of stack Sm and the current input a
--See the parsing table entry for state Sm and input a
--Action(Sm,a)
--The entry Action(Sm,a) can have any one of 4 values
• Shift s, where s is a state
• Reduce A → β
• Accept
• Error
The GOTO function takes a state and a grammar symbol as arguments and produces a state.

The configuration of an LR parser is a pair:


the 1st component is the stack contents,
the 2nd component is the unexpended input.

(s0 X1 s1 X2 s2 … Xm sm, ai ai+1 … an $)


The next move of the parser is determined by reading
ai, the current input symbol, and
sm, the state on top of the stack, and
then consulting the parsing action table entry ACTION[sm, ai].
4 types of moves:
• If ACTION[sm, ai] = shift s, the parser executes a shift and enters the configuration
(s0 X1 s1 X2 s2 … Xm sm ai s, ai+1 … an $)
• If ACTION[sm, ai] = reduce A → β, the parser executes a reduce and enters the configuration
(s0 X1 s1 X2 s2 … Xm-r sm-r A s, ai ai+1 … an $)
where r is the length of β and s = GOTO[sm-r, A]
• If ACTION[sm, ai] = accept, the parsing is completed.
• If ACTION[sm, ai] = error, the parser calls the error recovery routine.

Construction of SLR Parsing table:


Consider the augmented grammar
E′ → E
E → E+T
E → T
T → T*F
T → F
F → (E)
F → id

I = {E′ → .E}
Closure(I) = I0:
E′ → .E     (contains .E, so add the E productions with the dot at the left end)
E → .E+T
E → .T      (contains .T, so add the T productions with the dot at the left end)
T → .T*F
T → .F      (contains .F, so add the F productions with the dot at the left end)
F → .(E)
F → .id

From I0, look for items with E immediately to the right of the dot.
Goto(I0, E) = {E′ → E. , E → E.+T} = I1

From I0, look for items with T immediately to the right of the dot.
Goto(I0, T) = {E → T. , T → T.*F} = I2

From I0, look for items with F immediately to the right of the dot.
Goto(I0, F) = {T → F.} = I3

From I0, look for items with ( immediately to the right of the dot.
Goto(I0, ( ) = {F → (.E) , E → .E+T , E → .T , T → .T*F , T → .F , F → .(E) , F → .id} = I4
(F → (.E) contains .E, so the E productions are added; these add the T productions, which add the F productions)

From I0, look for items with id immediately to the right of the dot.
Goto(I0, id) = {F → id.} = I5

From I1, look for items with + immediately to the right of the dot.
Goto(I1, +) = {E → E+.T , T → .T*F , T → .F , F → .(E) , F → .id} = I6

From I2, look for items with * immediately to the right of the dot.
Goto(I2, *) = {T → T*.F , F → .(E) , F → .id} = I7

Goto(I3, X) is empty for every X
Goto(I4, E) = {F → (E.) , E → E.+T} = I8
Goto(I4, T) = {E → T. , T → T.*F} = I2
Goto(I4, F) = {T → F.} = I3
Goto(I4, ( ) = I4 (the same items as Goto(I0, ( ))
Goto(I4, id) = {F → id.} = I5

Goto(I5, X) is empty for every X
Goto(I6, T) = {E → E+T. , T → T.*F} = I9
Goto(I6, F) = {T → F.} = I3
Goto(I6, ( ) = I4
Goto(I6, id) = {F → id.} = I5

Goto(I7, F) = {T → T*F.} = I10
Goto(I7, ( ) = I4
Goto(I7, id) = {F → id.} = I5

Goto(I8, +) = {E → E+.T , T → .T*F , T → .F , F → .(E) , F → .id} = I6
Goto(I8, ) ) = {F → (E).} = I11

Goto(I9, *) = {T → T*.F , F → .(E) , F → .id} = I7
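
The closure and goto computations can be automated. The Python sketch below builds the canonical collection of LR(0) item sets for the augmented expression grammar; the item encoding (production index, dot position), the function names and the fixed order in which symbols are tried are assumptions chosen so that the state numbers come out as in these notes.

```python
# Augmented grammar; production 0 is E' -> E.
PRODUCTIONS = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {left for left, _ in PRODUCTIONS}

def closure(items):
    """An item is a pair (production index, dot position)."""
    items = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = PRODUCTIONS[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            # dot before nonterminal B: add every B -> .gamma
            for index, (left, _) in enumerate(PRODUCTIONS):
                if left == rhs[dot] and (index, 0) not in items:
                    items.add((index, 0))
                    work.append((index, 0))
    return frozenset(items)

def goto(items, symbol):
    moved = {(p, d + 1) for p, d in items
             if d < len(PRODUCTIONS[p][1]) and PRODUCTIONS[p][1][d] == symbol}
    return closure(moved)

def canonical_collection():
    states = [closure({(0, 0)})]
    transitions = {}
    symbols = ["E", "T", "F", "(", "id", "+", "*", ")"]  # order only affects numbering
    for state in states:                 # the list grows while it is traversed
        for symbol in symbols:
            target = goto(state, symbol)
            if not target:
                continue
            if target not in states:
                states.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions

STATES, TRANSITIONS = canonical_collection()
```

With this symbol order the construction yields the twelve sets I0 to I11, and `TRANSITIONS` holds the Goto entries listed above, for example `(8, ")")` mapping to 11.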

Find FOLLOW of the nonterminals


1. E′ → E
FOLLOW(E) contains FOLLOW(E′)
E′ is the start symbol, so FOLLOW(E′) = $
Thus FOLLOW(E) contains $

2. F → (E)
FOLLOW(E) contains )

3. E → E+T
FOLLOW(E) contains +

4. T → T*F
FOLLOW(T) contains *

5. E → T
FOLLOW(T) contains FOLLOW(E)
= + ) $

6. T → F
FOLLOW(F) contains FOLLOW(T)
= * + ) $

Nonterminal   FOLLOW
E             ) + $
T             * ) + $
F             * ) + $

SLR Parsing table:


Accept:
If E′ → E. is in Ii, then set Action(i, $) = accept
E′ → E. is in I1.
Hence Action(1, $) = accept

Reduce:
If A → α. is in Ii, then set Action(i, a) = reduce by A → α for every a in FOLLOW(A).
1. E → E+T. is in I9 : r1
[I9, FOLLOW(E) symbols = r1]
Action(9, ) ) = r1
Action(9, + ) = r1
Action(9, $ ) = r1

2. E → T. is in I2 : r2
[I2, FOLLOW(E) symbols = r2]
Action(2, + ) = r2
Action(2, ) ) = r2
Action(2, $ ) = r2

3. T → T*F. is in I10 : r3
[I10, FOLLOW(T) symbols = r3]
Action(10, * ) = r3
Action(10, + ) = r3
Action(10, ) ) = r3
Action(10, $ ) = r3

4. T → F. is in I3 : r4
[I3, FOLLOW(T) symbols = r4]
Action(3, * ) = r4
Action(3, + ) = r4
Action(3, ) ) = r4
Action(3, $ ) = r4

5. F → (E). is in I11 : r5
[I11, FOLLOW(F) symbols = r5]
Action(11, * ) = r5
Action(11, + ) = r5
Action(11, ) ) = r5
Action(11, $ ) = r5

6. F → id. is in I5 : r6
[I5, FOLLOW(F) symbols = r6]
Action(5, * ) = r6
Action(5, + ) = r6
Action(5, ) ) = r6
Action(5, $ ) = r6

Shift:
Refer to the Goto entries on terminals: if Goto(Ii, a) = Ij, then Action(i, a) = sj
Goto(I0, ( ) gives I4
(row, column) : Action(0, ( ) = s4

Goto(I0, id) gives I5
Action(0, id) = s5

Goto(I1, +) gives I6
Action(1, +) = s6

Goto(I2, *) gives I7
Action(2, *) = s7

The remaining shift entries are filled in the same way.

Parsing table

State | Action                           | Goto
      | id    +     *     (     )     $  | E    T    F
0     | s5                s4             | 1    2    3
1     |       s6                    acc  |
2     |       r2    s7          r2    r2 |
3     |       r4    r4          r4    r4 |
4     | s5                s4             | 8    2    3
5     |       r6    r6          r6    r6 |
6     | s5                s4             |      9    3
7     | s5                s4             |           10
8     |       s6                s11      |
9     |       r1    s7          r1    r1 |
10    |       r3    r3          r3    r3 |
11    |       r5    r5          r5    r5 |
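
A driver for this table is short. The Python sketch below hard-codes the SLR table for the expression grammar and replays the shift and reduce moves; the dictionary layout, the name `lr_parse` and the decision to stack only states (the grammar symbols are implied by them) are assumptions made for the example.

```python
# Productions numbered as in the notes: number -> (left side, length of right side).
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}

def lr_parse(tokens):
    """Return the production numbers used, in reduction order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    states = [0]
    pos = 0
    reductions = []
    while True:
        act = ACTION[states[-1]].get(tokens[pos])
        if act is None:                       # blank entry: error
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {states[-1]}")
        if act == "acc":
            return reductions
        if act[0] == "s":                     # shift: push the new state
            states.append(int(act[1:]))
            pos += 1
        else:                                 # reduce by production `number`
            number = int(act[1:])
            left, length = PRODUCTIONS[number]
            del states[-length:]              # pop one state per right-side symbol
            states.append(GOTO[states[-1]][left])
            reductions.append(number)
```

For id*id+id the reductions come out as 6, 4, 6, 3, 2, 6, 4, 1, which is a rightmost derivation read in reverse.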

LR grammar:
A grammar for which every entry in the parsing table is uniquely defined is called an LR grammar
LR(0) item
An LR(0) item of a grammar G is a production of G with a dot at some position on the right
side, for example
A → .XYZ

6.3 CONSTRUCTION OF SLR PARSING TABLE

Algorithm: 6.2
Input: C, the Canonical collection of set of items for augmented grammar G′
Output: LR parsing table with Action and go to
Method:
Let C = {I0, I1, I2, …, In}
The parsing actions of state i are as follows
1. If [A → α.aβ] is in Ii and GOTO(Ii, a) = Ij, where a is a terminal, then set ACTION[i, a] = shift j
2. If [A → α.] is in Ii and A is not S′, then set ACTION[i, a] = reduce A → α for all a in FOLLOW(A)
3. If [S′ → S.] is in Ii, then set ACTION[i, $] = accept
4. If GOTO(Ii, A) = Ij for a nonterminal A, then GOTO[i, A] = j
5. All entries not defined by steps 1 through 4 are errors
6. The initial state of the parser is the one constructed from the set of items containing [S′ → .S]

6.4 Constructing canonical LR Parsing tables

Each item carries an extra component.


A terminal symbol is included as the second component.
[A → α.β, a] is the general form of an item,
where A → αβ is a production and
a is a terminal symbol or the $ sign.
It is called an LR(1) item, where 1 refers to the length of the 2nd component, called the lookahead of
the item.

Consider the augmented grammar

S′ → S
S → CC
C → cC | d

Start with the item set I = {[S′ → .S, $]} and find its closure.

Closure(I)

Match the item [S′ → .S, $] with [A → α.Bβ, a]:
A = S′, α = ε, B = S, β = ε, a = $

The closure rule tells us to add [B → .γ, b] for each production B → γ and each
terminal b in FIRST(βa).
Here B → γ is S → CC, and FIRST(βa) = FIRST($) = {$},
so [S → .CC, $] is added.

Next match [S → .CC, $] with [A → α.Bβ, a]:
A = S, α = ε, B = C, β = C, a = $
FIRST(βa) = FIRST(C$) = FIRST(C) = {c, d},
so we add all the items [C → .γ, b] for b in FIRST(C$).

I0:
S′ → .S, $
S → .CC, $
C → .cC, c/d
C → .d, c/d

Goto(I0, S) = {S′ → S., $} = I1

Goto(I0, C) = {S → C.C, $ ; C → .cC, $ ; C → .d, $} = I2

Goto(I0, c) = {C → c.C, c/d ; C → .cC, c/d ; C → .d, c/d} = I3

Goto(I0, d) = {C → d., c/d} = I4

Goto(I1, X) is empty for every X

Goto(I2, C) = {S → CC., $} = I5

Goto(I2, c) = {C → c.C, $ ; C → .cC, $ ; C → .d, $} = I6

Goto(I2, d) = {C → d., $} = I7

Goto(I3, C) = {C → cC., c/d} = I8

Goto(I3, c) = {C → c.C, c/d ; C → .cC, c/d ; C → .d, c/d} = I3

Goto(I3, d) = {C → d., c/d} = I4

Goto(I6, C) = {C → cC., $} = I9

Goto(I6, c) = {C → c.C, $ ; C → .cC, $ ; C → .d, $} = I6

Goto(I6, d) = {C → d., $} = I7

GOTO graph: (figure)
Accept:
[S′ → S., $] is in I1.
So Action(1, $) = accept

Reduce:

1) [S → CC., $] is in I5. So reduce by production 1 (r1) on $

Action(5, $) = r1

2) C → cC. is in I8 (lookahead c/d) and in I9 (lookahead $). So reduce by production 2 (r2)

Action(8, c) = r2
Action(8, d) = r2
Action(9, $) = r2

3) C → d. is in I4 (lookahead c/d) and in I7 (lookahead $). So reduce by production 3 (r3)

Action(4, c) = r3
Action(4, d) = r3
Action(7, $) = r3

State | Action            | Goto
      | c     d     $     | S    C
0     | s3    s4          | 1    2
1     |             acc   |
2     | s6    s7          |      5
3     | s3    s4          |      8
4     | r3    r3          |
5     |             r1    |
6     | s6    s7          |      9
7     |             r3    |
8     | r2    r2          |
9     |             r2    |

Algorithm 6.3 :Construction of canonical LR parsing table


Algorithm:
Input: A grammar G augmented by the production S′ → S
Output: Canonical LR parsing action function ACTION and goto function GOTO

Method:
Let C = {I0, I1, I2, …, In} be the collection of sets of LR(1) items
State i of the parser is constructed from Ii.
The parsing actions of state i are as follows
1. If [A → α.aβ, b] is in Ii and GOTO(Ii, a) = Ij, where a is a terminal, then set ACTION[i, a] = shift j
2. If [A → α., a] is in Ii and A is not S′, then set ACTION[i, a] = reduce A → α
3. If [S′ → S., $] is in Ii, then set ACTION[i, $] = accept
4. If GOTO(Ii, A) = Ij for a nonterminal A, then GOTO[i, A] = j
5. All entries not defined by steps 1 through 4 are errors
6. The initial state of the parser is the one constructed from the set of items containing [S′ → .S, $]
6.5)Construction of LALR parsing table:

 Tables obtained by LALR is smaller than CLR


 Similar Syntactic constructs are grouped
 SLR and LALR parsing tables have same number of states

Consider the grammar


S′S
SCC
CcC | d

1) States 3 and 6 have the same core. They differ only in the second component (the lookahead) of each item.
Hence they can be combined

I36 : C → c.C, c/d/$
      C → .cC, c/d/$
      C → .d, c/d/$

2) States 4 and 7 have the same core. They differ only in the lookahead.
Hence they can be combined

I47 : C → d., c/d/$

3) States 8 and 9 have the same core. They differ only in the lookahead.
Hence they can be combined

I89 : C → cC., c/d/$

State        Action              Goto
         c      d      $         S    C
0        s36    s47              1    2
1                      accept
2        s36    s47                   5
36       s36    s47                   89
47       r3     r3     r3
5                      r1
89       r2     r2     r2
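
The merging step can be sketched in a few lines of Python. This is a hedged illustration rather than a full LALR generator: the ten canonical item sets of the example are typed in by hand as a dictionary from core item to lookahead set, and the names CLR_STATES and merge_by_core are hypothetical.

```python
from collections import defaultdict

# Canonical LR(1) item sets of the example, as {core item: set of lookaheads}.
CLR_STATES = {
    0: {"S' -> .S": {"$"}, "S -> .CC": {"$"},
        "C -> .cC": {"c", "d"}, "C -> .d": {"c", "d"}},
    1: {"S' -> S.": {"$"}},
    2: {"S -> C.C": {"$"}, "C -> .cC": {"$"}, "C -> .d": {"$"}},
    3: {"C -> c.C": {"c", "d"}, "C -> .cC": {"c", "d"}, "C -> .d": {"c", "d"}},
    4: {"C -> d.": {"c", "d"}},
    5: {"S -> CC.": {"$"}},
    6: {"C -> c.C": {"$"}, "C -> .cC": {"$"}, "C -> .d": {"$"}},
    7: {"C -> d.": {"$"}},
    8: {"C -> cC.": {"c", "d"}},
    9: {"C -> cC.": {"$"}},
}


def merge_by_core(states):
    """Group states whose items agree when lookaheads are ignored, then union the lookaheads."""
    groups = defaultdict(list)
    for number, items in states.items():
        groups[frozenset(items)].append(number)   # key = the core of the state
    merged = {}
    for core, numbers in groups.items():
        name = "".join(str(n) for n in sorted(numbers))   # e.g. states 3 and 6 become "36"
        merged[name] = {item: set().union(*(states[n][item] for n in numbers))
                        for item in core}
    return merged


lalr_states = merge_by_core(CLR_STATES)
```

After merging, the reduce entries of a combined state are read from the unioned lookaheads, which is why state 47 reduces on c, d and $.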

6.6 Using Ambiguous grammar

Consider the ambiguous grammar for expressions

E → E+E | E*E | (E) | id

Augment the grammar with E′ → E
I = { E′ → .E }

Closure(I) = { E′ → .E
               E → .E+E
               E → .E*E
               E → .(E)
               E → .id }  ⇒ I0

Goto(I0, E) = { E′ → E.
                E → E.+E
                E → E.*E }  ⇒ I1

Goto(I0, ( ) = { E → (.E)
                 E → .E+E
                 E → .E*E
                 E → .(E)
                 E → .id }  ⇒ I2

Goto(I0, id) = { E → id. }  ⇒ I3

Goto(I1, +) = { E → E+.E
                E → .E+E
                E → .E*E
                E → .(E)
                E → .id }  ⇒ I4

Goto(I1, *) = { E → E*.E
                E → .E+E
                E → .E*E
                E → .(E)
                E → .id }  ⇒ I5

Goto(I2, E) = { E → (E.)
                E → E.+E
                E → E.*E }  ⇒ I6

Goto(I2, ( ) = { E → (.E)
                 E → .E+E
                 E → .E*E
                 E → .(E)
                 E → .id }  ⇒ I2

Goto(I2, id) = { E → id. }  ⇒ I3

Goto(I3, Null)

Goto(I4, E) = { E → E+E.
                E → E.+E
                E → E.*E }  ⇒ I7

Goto(I4, ( ) = { E → (.E)
                 E → .E+E
                 E → .E*E
                 E → .(E)
                 E → .id }  ⇒ I2

Goto(I4, id) = { E → id. }  ⇒ I3

Goto(I5, E) = { E → E*E.
                E → E.+E
                E → E.*E }  ⇒ I8

Goto(I5, ( ) = { E → (.E)
                 E → .E+E
                 E → .E*E
                 E → .(E)
                 E → .id }  ⇒ I2

Goto(I5, id) = { E → id. }  ⇒ I3

Goto(I6, ) ) = { E → (E). }  ⇒ I9

Goto(I6, +) = { E → E+.E
                E → .E+E
                E → .E*E
                E → .(E)
                E → .id }  ⇒ I4

Goto(I6, *) = { E → E*.E
                E → .E+E
                E → .E*E
                E → .(E)
                E → .id }  ⇒ I5

Goto(I7, +) = { E → E+.E
                E → .E+E
                E → .E*E
                E → .(E)
                E → .id }  ⇒ I4

Goto(I7, *) = { E → E*.E
                E → .E+E
                E → .E*E
                E → .(E)
                E → .id }  ⇒ I5

Goto(I8, +) = { E → E+.E
                E → .E+E
                E → .E*E
                E → .(E)
                E → .id }  ⇒ I4

Goto(I8, *) = { E → E*.E
                E → .E+E
                E → .E*E
                E → .(E)
                E → .id }  ⇒ I5

Goto(I9, Null)
Find the Follow sets of the nonterminals
1. E′ → E
Follow(E) includes Follow(E′)
E′ is the start symbol. So Follow(E′) = {$}
This ⇒ Follow(E) contains $

2. E → E+E
Follow(E) contains +

3. E → E*E
Follow(E) contains *

4. E → (E)
Follow(E) contains )

Nonterminal   Follow
E             ) + * $
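
The same Follow set can be obtained mechanically by a fixed-point iteration. The sketch below is an assumption-laden illustration, not the method of the notes: it presumes a grammar without empty productions, and the names GRAMMAR, first_sets and follow_sets are hypothetical.

```python
# Hypothetical sketch: FIRST and FOLLOW for the augmented expression grammar.
# Assumes no production has an empty right side.
GRAMMAR = [
    ("E'", ["E"]),
    ("E", ["E", "+", "E"]),
    ("E", ["E", "*", "E"]),
    ("E", ["(", "E", ")"]),
    ("E", ["id"]),
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST(A) for each nonterminal, iterated until nothing changes."""
    first = {n: set() for n in NONTERMS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(start="E'"):
    """FOLLOW(A) for each nonterminal; $ is placed in FOLLOW of the start symbol."""
    first = first_sets()
    follow = {n: set() for n in NONTERMS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for k, sym in enumerate(rhs):
                if sym not in NONTERMS:
                    continue
                if k + 1 < len(rhs):
                    nxt = rhs[k + 1]            # symbol right after sym
                    new = first[nxt] if nxt in NONTERMS else {nxt}
                else:
                    new = follow[lhs]           # sym ends the production
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


follow = follow_sets()
```

For this grammar the iteration yields the four symbols listed in the table for E, and only $ for E′.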
Accept:
If E′ → E. is in Ii, then Action(i, $) = accept
E′ → E. is in I1.
Hence Action(1, $) = accept

Reduce:
1. E → E+E. is in I7 ⇒ r1
[ I7, on every symbol in Follow(E) = r1 ]
Action( 7, ) ) = r1
Action( 7, + ) = r1
Action( 7, * ) = r1
Action( 7, $ ) = r1

2. E → E*E. is in I8 ⇒ r2
[ I8, on every symbol in Follow(E) = r2 ]
Action( 8, ) ) = r2
Action( 8, + ) = r2
Action( 8, * ) = r2
Action( 8, $ ) = r2

3. E → (E). is in I9 ⇒ r3
[ I9, on every symbol in Follow(E) = r3 ]
Action( 9, ) ) = r3
Action( 9, + ) = r3
Action( 9, * ) = r3
Action( 9, $ ) = r3

4. E → id. is in I3 ⇒ r4
[ I3, on every symbol in Follow(E) = r4 ]
Action( 3, ) ) = r4
Action( 3, + ) = r4
Action( 3, * ) = r4
Action( 3, $ ) = r4

Shift:
Refer to the Goto transitions on terminals. Each entry is written as (row, column) = action.
Goto(I0, ( ) gives I2. So, (0, ( ) = s2
Goto(I0, id) gives I3. So, (0, id) = s3
Goto(I1, +) gives I4. So, (1, +) = s4
Goto(I1, *) gives I5. So, (1, *) = s5
Goto(I2, ( ) gives I2. So, (2, ( ) = s2
Goto(I2, id) gives I3. So, (2, id) = s3
Goto(I4, ( ) gives I2. So, (4, ( ) = s2
Goto(I4, id) gives I3. So, (4, id) = s3
Goto(I5, ( ) gives I2. So, (5, ( ) = s2
Goto(I5, id) gives I3. So, (5, id) = s3
Goto(I6, +) gives I4. So, (6, +) = s4
Goto(I6, *) gives I5. So, (6, *) = s5
Goto(I6, ) ) gives I9. So, (6, ) ) = s9
Goto(I7, +) gives I4. So, (7, +) = s4
Goto(I7, *) gives I5. So, (7, *) = s5
Goto(I8, +) gives I4. So, (8, +) = s4
Goto(I8, *) gives I5. So, (8, *) = s5
State                     Action                          Goto
         id     +       *       (      )      $           E
0        s3                     s2                        1
1               s4      s5                    accept
2        s3                     s2                        6
3               r4      r4             r4     r4
4        s3                     s2                        7
5        s3                     s2                        8
6               s4      s5             s9
7               r1/s4   s5/r1          r1     r1
8               r2/s4   r2/s5          r2     r2
9               r3      r3             r3     r3

Assuming + is left associative, the action of state 7 on input + should be to reduce by E → E+E.
Assuming * takes precedence over +, the action of state 7 on input * should be to shift.

Similarly, assuming that * is left associative and takes precedence over +,
the action of state 8 on both inputs + and * should be to reduce by E → E*E.
(for input +, the reason is that * takes precedence over +)
(for input *, the reason is that * is left associative)

Consider the dangling else grammar

S′ → S
S → iSeS | iS | a

Augmenting production: S′ → S
I = { S′ → .S }
Closure(I) = { S′ → .S
               S → .iSeS
               S → .iS
               S → .a }  ⇒ I0

Goto(I0, S) = { S′ → S. }  ⇒ I1

Goto(I0, i) = { S → i.SeS
                S → i.S
                S → .iSeS
                S → .iS
                S → .a }  ⇒ I2

Goto(I0, a) = { S → a. }  ⇒ I3

Goto(I1, Null)

Goto(I2, S) = { S → iS.eS
                S → iS. }  ⇒ I4

Goto(I2, i) = { S → i.SeS
                S → i.S
                S → .iSeS
                S → .iS
                S → .a }  ⇒ I2

Goto(I2, a) = { S → a. }  ⇒ I3

Goto(I3, Null)

Goto(I4, e) = { S → iSe.S
                S → .iSeS
                S → .iS
                S → .a }  ⇒ I5

Goto(I5, S) = { S → iSeS. }  ⇒ I6

Goto(I5, i) = { S → i.SeS
                S → i.S
                S → .iSeS
                S → .iS
                S → .a }  ⇒ I2

Goto(I5, a) = { S → a. }  ⇒ I3

Find the Follow sets of the nonterminals:

1. S′ → S
Follow(S) includes Follow(S′)
S′ is the start symbol. So Follow(S′) = {$}
This ⇒ Follow(S) contains $

2. S → iSeS
Follow(S) contains e

Nonterminal   Follow
S             e, $

Accept:
If S′ → S. is in Ii, then Action(i, $) = accept
S′ → S. is in I1.
Hence Action(1, $) = accept

Reduce:
1. S → iSeS. is in I6 ⇒ r1
[ I6, on every symbol in Follow(S) = r1 ]
Action( 6, e ) = r1
Action( 6, $ ) = r1

2. S → iS. is in I4 ⇒ r2
[ I4, on every symbol in Follow(S) = r2 ]
Action( 4, e ) = r2
Action( 4, $ ) = r2

3. S → a. is in I3 ⇒ r3
[ I3, on every symbol in Follow(S) = r3 ]
Action( 3, e ) = r3
Action( 3, $ ) = r3

Shift:

Look at the Goto transitions on terminals. Each entry is written as (row, column) = action.

1. Goto(I0, i) gives I2. So, (0, i) = s2
2. Goto(I0, a) gives I3. So, (0, a) = s3
3. Goto(I2, i) gives I2. So, (2, i) = s2
4. Goto(I2, a) gives I3. So, (2, a) = s3
5. Goto(I4, e) gives I5. So, (4, e) = s5
6. Goto(I5, i) gives I2. So, (5, i) = s2
7. Goto(I5, a) gives I3. So, (5, a) = s3

State            Action                  Goto
         i      e       a      $         S
0        s2             s3               1
1                              accept
2        s2             s3               4
3               r3             r3
4               s5/r2          r2
5        s2             s3               6
6               r1             r1

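In the fifth row of the table (state 4), the conflict on input e is resolved by selecting the shift action, so that each e is associated with the closest preceding unmatched i.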
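
The effect of preferring the shift in state 4 can be checked with a small table-driven parser. The Python sketch below is illustrative only: the tables are copied from the construction above with the state 4 entry on e fixed to s5, and the names PRODUCTIONS, ACTION, GOTO and parse are hypothetical.

```python
# Productions: number -> (left side, length of right side)
PRODUCTIONS = {1: ("S", 4), 2: ("S", 2), 3: ("S", 1)}   # S->iSeS, S->iS, S->a

# ACTION table with the shift/reduce conflict in state 4 on e resolved as shift.
ACTION = {
    (0, "i"): ("s", 2), (0, "a"): ("s", 3),
    (1, "$"): ("acc", None),
    (2, "i"): ("s", 2), (2, "a"): ("s", 3),
    (3, "e"): ("r", 3), (3, "$"): ("r", 3),
    (4, "e"): ("s", 5), (4, "$"): ("r", 2),
    (5, "i"): ("s", 2), (5, "a"): ("s", 3),
    (6, "e"): ("r", 1), (6, "$"): ("r", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4, (5, "S"): 6}


def parse(tokens):
    """Run the LR driver and return the production numbers in the order they are reduced."""
    tokens = list(tokens) + ["$"]
    stack, pos, reductions = [0], 0, []
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        if entry is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {stack[-1]}")
        kind, arg = entry
        if kind == "s":                      # shift: push the new state, consume the token
            stack.append(arg)
            pos += 1
        elif kind == "r":                    # reduce: pop one state per right-side symbol
            lhs, length = PRODUCTIONS[arg]
            del stack[-length:]
            stack.append(GOTO[stack[-1], lhs])
            reductions.append(arg)
        else:                                # accept
            return reductions


trace = parse(["i", "i", "a", "e", "a"])
```

For the input i i a e a the driver reduces by S → iSeS before S → iS, so the e is attached to the inner i, as intended.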
