
UNIT II

CHAPTER 5
BASIC PARSING TECHNIQUES
5.1 Parser
A parser for a grammar G is a program that takes an input string w
and produces a parse tree as output if w is a sentence of G;
otherwise it reports an error.
Types of parser:
• Bottom up parser (Shift Reduce parsing, Operator Precedence parsing):
builds the parse tree from bottom to top (leaves to root)
• Top down parser (Predictive parsing, Recursive descent parsing):
builds the parse tree from top to bottom (root to leaves)
In both cases, the input is scanned from left to right, one
symbol at a time.
Shift Reduce Parsing:
Shift reduce parsing is a bottom up parsing method.
It shifts input symbols onto the stack until the right side of a
production appears on top of the stack.
The right side may then be replaced by the symbol on the left side of
the production, and the process is repeated.
• Operator precedence parsers and LR parsers are examples of shift reduce
parsers.
• Operator precedence parsing is suitable for parsing expressions (it uses
information about the precedence and associativity of operators)
Recursive descent Parsing:
Recursive descent parsing is a top down parsing method.
It uses a collection of recursive routines to perform parsing.
• A predictive parser is a recursive descent parser that needs no backtracking
• An LL parser is a type of predictive parser
Representation of parse trees:
2 types of representation
• Implicit
• Explicit
--Sequence of Productions used in derivation is an example of
implicit representation
--Linked list structure is an example of explicit representation
Derivation
• Left most derivation (LMD): the left most nonterminal is replaced at every step
• Right most derivation (RMD): the right most nonterminal is replaced at every step;
also called a canonical derivation
Consider the grammar
S → iCtS | iCtSeS | a
C → b
Construct a LMD and a RMD for the sentence w = i b t i b t a e a
LMD:
S ⇒ iCtS
⇒ ibtS
⇒ ibtiCtSeS
⇒ ibtibtSeS
⇒ ibtibtaeS
⇒ ibtibtaea
RMD:
S ⇒ iCtS
⇒ iCtiCtSeS
⇒ iCtiCtSea
⇒ iCtiCtaea
⇒ iCtibtaea
⇒ ibtibtaea
[Figure: parse tree T for w, and the construction of the LMD]
5.2 Shift reduce Parsing
• Shift reduce parsing is an example of bottom up parsing
• It constructs the parse tree starting from the leaves and
working up towards the root
Reduction:
• Look for a substring that matches the right side of some
production
• Replace it by the symbol on the left side of that production
• {Replacement of the right side of a production by its left side
is called Reduction}
Consider the grammar
S → aAcBe
A → Ab | b
B → d
and the string abbcde.
We want to reduce the string to S.
Starting from abbcde, the reductions are:
A → b : aAbcde
A → Ab : aAcde
B → d : aAcBe
S → aAcBe : S
Handle:
A handle is a substring that matches the right side of a
production, such that replacing it by the left side of that
production is one step in the reverse of a right most derivation
and therefore leads towards the start symbol.
Handle Pruning:
• Removing a handle by replacing it with the left side of the
production
• The RMD in reverse is obtained by handle pruning
• The string appearing to the right of a handle contains
only terminals
Consider the grammar
E → E+E, E → E*E, E → (E), E → id
and the input string id1+id2*id3.
Consider the RMD
E ⇒ E+E
⇒ E+E*E
⇒ E+E*id3
⇒ E+id2*id3
⇒ id1+id2*id3
In the last right sentential form, id1 is the handle.
Now consider the sequence of reductions that leads to the start symbol E:
id1+id2*id3
E+id2*id3
E+E*id3
E+E*E
E+E
E
This sequence of right sentential forms is the reverse of the RMD.


SR parsing Operations
A shift reduce parser does 4 operations – Shift, Reduce, Accept, Error
Shift
The next input symbol is shifted to the top of the stack
Reduce
• The parser identifies the handle at the top of the stack
• It compares the handle to the right side of the production
• When there is a match, the handle is replaced by the left side of the
production
Accept: It indicates the successful completion of parsing
Error: Indicates that some error has occurred, and calls an error
recovery routine
SR Parsing
• Uses a stack and an input buffer.
• The $ symbol is used to mark the bottom of the stack and the right
end of the input. Initially:
Stack     Input
$         w$
• The parser operates by shifting 0 or more input symbols onto the
stack, until a handle is on top of the stack.
• The parser then reduces the handle to the left side of the
production.
• This process is repeated until the stack contains the start symbol
and the input is empty, or an error is detected. On success:
Stack     Input
$S        $
Input string: id1+id2*id3
Grammar: E → E+E | E*E | (E) | id
Shift Reduce parsing actions
Stack        Input            Action
$            id1+id2*id3$     Shift
$id1         +id2*id3$        Reduce by E → id
$E           +id2*id3$        Shift
$E+          id2*id3$         Shift
$E+id2       *id3$            Reduce by E → id
$E+E         *id3$            Shift
$E+E*        id3$             Shift
$E+E*id3     $                Reduce by E → id
$E+E*E       $                Reduce by E → E*E
$E+E         $                Reduce by E → E+E
$E           $                Accept
Constructing parse tree
• When we shift an input symbol a onto the stack, we create a one
node tree labelled a. Both the root and the yield of this
tree are a.
• When we reduce X1X2X3…Xn to A, we create a new node
labelled A. Its children are the roots of the trees for X1, X2, … Xn.
• With each symbol on the stack, we associate a pointer to a tree
whose root is that symbol and whose yield is the string of terminals
which have been reduced to that symbol.
• At the end, the start symbol will have the entire parse tree
associated with it.
[Figures: parse tree construction for id1+id2*id3, shown after shifting id1,
after reducing id1 to E, after reducing id1+id2*id3 to E+E, and after completion]
5.3 Operator precedence parsing:
Operator grammar properties:
• No production has ℇ on the right side
• No production has 2 adjacent nonterminals on the right side
Eg:
E → EAE | (E) | -E | id
A → + | - | *
is not an operator grammar, because EAE on the RHS has adjacent nonterminals.
E → E+E | E-E | E*E | (E) | -E | id is an operator grammar.
Adv:
• Easy to implement
Disadv:
• Hard to handle tokens like the minus sign, which has 2 different precedences
depending on whether it is binary or unary
• Only a small class of grammars can be parsed using operator precedence
techniques
Three disjoint precedence relations are used in operator precedence
parsing. These relations help in the selection of handles.
• a <. b : a yields precedence to b (b has higher priority)
• a .> b : a takes precedence over b (a has higher priority)
• a ≐ b : a and b have equal priority
2 ways of determining precedence relations
• From the associativity and precedence rules of the operators
(* has higher precedence than +, so * .> + and + <. *).
This method resolves the ambiguity of the grammar.
• From the operator grammar itself (described later in this section)
Operator precedence relations
Operator precedence using associativity and precedence rules:
1. If operator Ѳ1 has higher precedence than Ѳ2, then Ѳ1 .> Ѳ2 and Ѳ2 <. Ѳ1
Eg: * has higher precedence than +
* .> + and + <. *
In E+E*E+E, E*E is the handle and is reduced first
2. If Ѳ1 and Ѳ2 have equal precedence,
Ѳ1 .> Ѳ2 and Ѳ2 .> Ѳ1 if the operators are left associative
Ѳ1 <. Ѳ2 and Ѳ2 <. Ѳ1 if the operators are right associative
Eg: (i) + and - are left associative and have equal precedence
+ .> +, + .> -, - .> +, - .> -
E-E+E : first reduce E-E
E+E-E : first reduce E+E
(ii) ^ is right associative
^ <. ^
E^E^E : reduce the right most E^E first; in a^b^c, b^c is evaluated first
3. Ѳ .> $ and $ <. Ѳ for all operators Ѳ
Eg: + .> $ and $ <. +
- .> $ and $ <. -
* .> $ and $ <. *
/ .> $ and $ <. /
4. Ѳ <. id, id .> Ѳ, Ѳ <. (, ( <. Ѳ, ) .> Ѳ, Ѳ .> ) for all Ѳ
Also ( ≐ ), ( <. (, ( <. id, $ <. (, $ <. id, id .> $, id .> ), ) .> $, ) .> )
If no precedence relation holds between a pair of terminals, then the error
recovery routine is called.
Consider id+id*id with the precedence relations
      id    +     *     $
id    err   .>    .>    .>
+     <.    .>    <.    .>
*     <.    .>    .>    .>
$     <.    <.    <.    err
To find a handle:
1. Scan the string from the left end until the first .> is encountered
2. Then scan backwards to the left until a <. is encountered
3. The handle contains everything to the left of the .> and
to the right of the <.
Input with end markers: $id+id*id$
The string with the precedence relations inserted is:
$ <. id .> + <. id .> * <. id .> $
In this example the first handle is id; reduce it to E. Repeating this gives
$E+id*id$
$E+E*id$
$E+E*E$
Now delete the nonterminals: $+*$ is obtained.
Insert the precedence relations: $ <. + <. * .> $
This indicates that the left end of the handle lies between + and *, and the
right end of the handle lies between * and $; i.e. in E+E*E, the handle is E*E.
Obtain operator precedence relations for
E → E+E | E-E | E*E | E/E | E^E | (E) | -E | id
^ has the highest precedence and is right associative
* and / have the next highest precedence and are left associative
+ and - have the lowest precedence and are left associative
Page 162 fig 5.7 HW


Operator precedence relations from an operator precedence grammar
a and b are terminals; α, β, γ, δ are strings of grammar symbols
1. a ≐ b, if a appears immediately to the left of b, or they are separated
by one nonterminal
(i) A → α a β b γ, where β is ℇ or a single nonterminal [a ≐ b]
(ii) S → iCtSeS [i ≐ t, t ≐ e, because they are separated by a single
nonterminal]
2. a <. b, if a nonterminal A appears immediately to the right of a, and A derives
a string in which b is the 1st terminal symbol
(i) α a A β and A ⇒ γ b δ, where γ is ℇ or a single nonterminal [a <. b]
E → E+E is of the form α a A β with a: +
E → E*E is of the form γ b δ with b: *
+ <. *
(ii) S → iCtS, C → b
i <. b
3. a .> b, if a nonterminal A appears immediately to the left of b, and A derives
a string in which a is the last terminal symbol
(i) α A b β and A ⇒ γ a δ, where δ is ℇ or a single nonterminal [a .> b]
E → E+E is of the form α A b β with b: +
E → E*E is of the form γ a δ with a: *
* .> +
(ii) S → iCtS, C → b
b .> t
4. $ <. b, where b is a 1st terminal of the start symbol
5. a .> $, where a is a last terminal of the start symbol
Summary:
Terminal NT Terminal : ≐
Terminal NT : <.
NT Terminal : .>
$ <. first terminals of the start symbol
Last terminals of the start symbol .> $
Consider the grammar
E → E+T / T, T → T*F / F, F → (E) / id
Nonterminal    1st terminals    last terminals
F              ( id             ) id
T              * ( id           * ) id
E              + * ( id         + * ) id
I) Consider <. : rule 2
Terminal followed by nonterminal
(terminal immediately to the left of a nonterminal)
Terminal <. first terminals of NT
(1) +T
+ <. * , ( , id
(2) *F
* <. ( , id
(3) (E
( <. + , * , ( , id
II) Consider .> : rule 3
Nonterminal followed by terminal
(nonterminal immediately to the left of a terminal)
Last terminals of NT .> terminal
(1) E+
+ , * , ) , id .> +
(2) T*
* , ) , id .> *
(3) E)
+ , * , ) , id .> )
III) Terminal NT Terminal : rule 1
(E)
( ≐ )
According to rule 4, $ must be related by <. to all 1st terminals of the
start symbol E:
$ <. + , $ <. * , $ <. ( , $ <. id
According to rule 5, all last terminals of the start symbol E must be related
by .> to $:
+ .> $ , * .> $ , ) .> $ , id .> $
Operator Precedence relation
      +     *     (     )     id    $
+     .>    <.    <.    .>    <.    .>
*     .>    .>    <.    .>    <.    .>
(     <.    <.    <.    ≐     <.    err
)     .>    .>    err   .>    err   .>
id    .>    .>    err   .>    err   .>
$     <.    <.    <.    err   <.    err
Algorithm 5.1 Computation of LEADING
Input : CFG
Output: Boolean array L[A,a] in which
the entry is true, if a is in LEADING(A)

Procedure INSTALL(A,a)
If not L[A,a] then
begin
L[A,a]:=true;
Push (A,a) onto STACK
End

Main procedure
begin
/* Initialize L */
for each nonterminal A and terminal a
do L[A,a]:=false;
for each production of the form
A → aα or A → Baα do
INSTALL(A,a)
While STACK not empty do
begin
pop top pair (B,a) from STACK;
for each production of the form
A → Bα do
INSTALL(A,a)
end
end
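The worklist procedure of Algorithm 5.1 can be sketched in Python as below. This is only an illustrative sketch: the dictionary encoding of the grammar and the names `GRAMMAR`, `leading` and `trailing` are assumptions made for this example. TRAILING is obtained here by running the same routine on the reversed right sides, which is one convenient way to do it rather than something the algorithm above prescribes.

```python
# Grammar E -> E+T | T, T -> T*F | F, F -> (E) | id, encoded as lists of symbols.
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}


def leading(grammar):
    """Compute LEADING(A) for every nonterminal A (Algorithm 5.1)."""
    nonterminals = set(grammar)
    lead = {A: set() for A in grammar}
    stack = []

    def install(A, a):
        # INSTALL(A, a): record a in LEADING(A) once and schedule propagation.
        if a not in lead[A]:
            lead[A].add(a)
            stack.append((A, a))

    for A, productions in grammar.items():
        for rhs in productions:
            if rhs[0] not in nonterminals:                       # A -> a alpha
                install(A, rhs[0])
            elif len(rhs) > 1 and rhs[1] not in nonterminals:    # A -> B a alpha
                install(A, rhs[1])

    while stack:
        B, a = stack.pop()
        for A, productions in grammar.items():
            for rhs in productions:
                if rhs[0] == B:                                  # A -> B alpha
                    install(A, a)
    return lead


def trailing(grammar):
    """TRAILING is LEADING of the grammar with every right side reversed."""
    reversed_grammar = {A: [rhs[::-1] for rhs in prods] for A, prods in grammar.items()}
    return leading(reversed_grammar)
```

For the grammar above, `leading(GRAMMAR)` and `trailing(GRAMMAR)` give the 1st terminal and last terminal columns worked out by hand earlier in this section.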
Algorithm 5.2
Calculation of Operator Precedence relations
Input : operator grammar G
Output: relations <. , .> , ≐ for G
Method:
1. Compute LEADING(A) and TRAILING(A)
for each nonterminal A
2. Examine the positions of the right side of each
production:
For each production A → X1X2…Xn do
For i:=1 to n-1 do
Begin
If Xi and Xi+1 are both terminals, then set Xi ≐ Xi+1;
If i<=n-2 and Xi and Xi+2 are terminals,
and Xi+1 is a nonterminal, then
set Xi ≐ Xi+2;
If Xi is a terminal and Xi+1 is a nonterminal then
for all a in LEADING(Xi+1) do set Xi <. a;
If Xi is a nonterminal and Xi+1 is a terminal then
for all a in TRAILING(Xi) do set a .> Xi+1
End
3. Set $ <. a for all a in LEADING(S) and set
b .> $ for all b in TRAILING(S), where S is
the start symbol
operator precedence parsing algorithm pg 171
repeat forever
if only $ is on the stack, and only $ is on the input then
accept and break
else
begin
let a be the top most terminal symbol on the stack
and let b be the current input symbol
if a<.b or a≐b then shift b on to the stack
else if a.>b then /* reduce */
repeat
pop the stack
until the top stack terminal is related by <
to the terminal most recently popped
else call the error correcting routine
end
Fig 5.14 Action of operator precedence parsing
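A minimal Python sketch of the parsing loop in Fig 5.14 follows. It assumes the id, +, *, $ relation table used in the id+id*id example, keeps only terminals on the stack (nonterminals are ignored, as in the $+*$ illustration), and returns the terminal part of each handle in the order it is reduced. The names `PREC` and `op_precedence_parse` are hypothetical.

```python
# Precedence relations for the terminals id, +, * and the end marker $.
# A missing entry means "no relation", i.e. an error.
PREC = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+": {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*": {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$": {"id": "<", "+": "<", "*": "<"},
}


def op_precedence_parse(tokens):
    """Return the handles (terminals only) in the order they are reduced."""
    stack = ["$"]
    remaining = list(tokens) + ["$"]
    pos = 0
    handles = []
    while True:
        a, b = stack[-1], remaining[pos]
        if a == "$" and b == "$":
            return handles                      # accept
        relation = PREC[a].get(b)
        if relation in ("<", "="):
            stack.append(b)                     # shift b
            pos += 1
        elif relation == ">":
            popped = []
            while True:                         # reduce: pop back to the matching <.
                t = stack.pop()
                popped.append(t)
                if PREC[stack[-1]].get(t) == "<":
                    break
            handles.append(popped[::-1])
        else:
            raise SyntaxError(f"no precedence relation between {a!r} and {b!r}")
```

For the input id+id*id, the three id handles are reduced first, then the handle around *, and finally the handle around +, which is the order derived by hand above.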
Precedence functions
The table can be encoded by 2 precedence functions f and g such that
f(a) < g(b) where a <. b
f(a) > g(b) where a .> b
f(a) = g(b) where a ≐ b
Finding precedence functions for a table
1. Create symbols fa and ga for each a that is a terminal or $
2. Partition the created symbols into as many groups as possible, such that
if a ≐ b, then fa and gb are in the same group
3. Create a directed graph whose nodes are the groups:
If a <. b, place an edge from the group of gb to the group of fa
If a .> b, place an edge from the group of fa to the group of gb
4. If the graph constructed has cycles, then no precedence functions exist.
If the graph constructed has no cycles, then precedence functions exist:
f(a) is the length of the longest path beginning at the group of fa, and
g(b) is the length of the longest path beginning at the group of gb
Consider the matrix
      id    +     *     $
id    err   .>    .>    .>
+     <.    .>    <.    .>
*     <.    .>    .>    .>
$     <.    <.    <.    err
There is no ≐ relationship, so each symbol is in a separate group.
[Figure: graph for the precedence functions; the longest path is
gid → f* → g* → f+ → g+ → f$]
There are no cycles. Therefore, precedence functions exist.
f($)=0
g($)=0
g(+)=1
g(id)=5
      id    +     *     $
f     4     2     4     0
g     5     1     3     0
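The graph construction and longest path computation can be sketched in Python as follows. This sketch assumes, as in the example matrix, that the table contains no ≐ entries, so every fa and ga forms its own group; handling ≐ would additionally require merging nodes. The names `TABLE` and `precedence_functions` are made up for this illustration.

```python
# Relation table for id, +, *, $ (missing entries are errors).
TABLE = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+": {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*": {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$": {"id": "<", "+": "<", "*": "<"},
}


def precedence_functions(table):
    """Return (f, g) as dictionaries, or raise ValueError if the graph has a cycle."""
    symbols = list(table)
    edges = {(kind, a): set() for kind in ("f", "g") for a in symbols}
    for a, row in table.items():
        for b, relation in row.items():
            if relation == "<":
                edges[("g", b)].add(("f", a))   # a <. b : edge g_b -> f_a
            elif relation == ">":
                edges[("f", a)].add(("g", b))   # a .> b : edge f_a -> g_b
            else:
                raise ValueError("this sketch does not handle the equal relation")

    length = {}                                  # node -> longest path, or "visiting"

    def longest(node):
        if length.get(node) == "visiting":
            raise ValueError("cycle found: no precedence functions exist")
        if node in length:
            return length[node]
        length[node] = "visiting"
        best = max((1 + longest(nxt) for nxt in edges[node]), default=0)
        length[node] = best
        return best

    f = {a: longest(("f", a)) for a in symbols}
    g = {a: longest(("g", a)) for a in symbols}
    return f, g
```

Running `precedence_functions(TABLE)` reproduces the f and g rows of the table above.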
Top down Parsing
• Top down parsing may involve backtracking.
• It may scan parts of the input repeatedly.
Consider the grammar
S → cAd
A → ab / a
Input:
w=cad
To construct a parse tree:
Initially construct a tree consisting of a single node labelled S.
The input pointer points to c.
Use the first production of S to expand the tree and
obtain the parse tree
S

c A d
The left most leaf c matches the 1st symbol of w.
So advance the input pointer to a.
Consider the next leaf A.
Expand A using the 1st alternative and obtain the tree
S

c A d

a b
We now have a match for the 2nd input symbol a.
Consider the next input symbol d and the next
leaf b.
b does not match d,
so report failure.
Go back and see whether there is another alternative for A.
While going back, reset the input pointer to position 2 (the position it
had when we first came to A).
Now we try the 2nd alternative for A.
Now the leaf a matches the second symbol of w and the leaf d
matches the 3rd symbol.
Thus the parse tree for w is produced.
S

c A d

a
Recursive procedures for top down parsing
Procedure S( ); /* S → cAd */
begin
If input symbol = 'c' then
begin
ADVANCE( );
if A( ) then
if input symbol = 'd' then
begin ADVANCE( );
return true end
end;
return false
End
(a) Procedure S

Procedure A( ); /* A → ab / a */
begin
isave := input-pointer;
if input symbol = 'a' then
begin
ADVANCE( );
If input symbol = 'b' then
begin ADVANCE( ); return true end
end;
input-pointer := isave; /* failure to find ab */
If input symbol = 'a' then
begin ADVANCE( ); return true end
else return false
End
(b) Procedure A
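Procedures S and A translate almost line for line into Python. The sketch below is an assumption about how one might code them (the name `parse_cad` and the use of string slicing to read the current input symbol are choices made for this example); it accepts a string only when S succeeds and the whole input has been consumed.

```python
def parse_cad(w):
    """Backtracking recognizer for S -> cAd, A -> ab | a."""
    pos = 0  # input pointer

    def symbol():
        # Current input symbol, or "" at the end of the input.
        return w[pos:pos + 1]

    def A():
        nonlocal pos
        isave = pos                 # remember where A started
        if symbol() == "a":
            pos += 1
            if symbol() == "b":
                pos += 1
                return True
        pos = isave                 # failure to find ab: backtrack
        if symbol() == "a":
            pos += 1
            return True
        return False

    def S():
        nonlocal pos
        if symbol() == "c":
            pos += 1
            if A():
                if symbol() == "d":
                    pos += 1
                    return True
        return False

    return S() and pos == len(w)
```

With w = cad, A first tries ab, fails on d, resets the input pointer and succeeds with the second alternative, as in the walkthrough above.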
Difficulties in top down parsing:
• Left recursion
• Backtracking
• Left factoring
Left recursion:
A grammar is said to be left recursive if
it has a nonterminal A such that A ⇒+ Aα for some α.
Left recursion can send a top down parser into an infinite
loop, e.g. with the productions A → Aα / β.
So we must eliminate all left recursion from the
grammar.
Backtracking:
If we make a sequence of expansions and
subsequently discover a mismatch, we
have to undo the semantic effects of all
erroneous expansions. For example,
entries made in the symbol table have to
be removed.
Left factoring:
The order in which the alternatives
are tried can affect the language accepted.
Eliminating the left recursion
Consider
A → Aα / β (β does not begin with A)
The left recursion can be eliminated by replacing this pair of productions with
A → βA′
A′ → αA′ | ℇ
[Figure: equivalent parse trees for A → Aα | β and for A → βA′, A′ → αA′ | ℇ]
Consider the grammar
E → E+T / T, T → T*F / F, F → (E) / id
Now we obtain:
E → TE′
E′ → +TE′ | ℇ
T → FT′
T′ → *FT′ | ℇ
F → (E) | id
Eliminating immediate left recursion (productions of the form A → Aα)
To eliminate immediate left recursion among all A productions, we group the A productions as
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
where no β begins with an A. Then we replace the A productions by
A → β1A′ | β2A′ | … | βnA′
A′ → α1A′ | α2A′ | … | αmA′ | ℇ
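The replacement rule for immediate left recursion can be sketched in Python as below. The representation is an assumption for this example: a right side is a list of symbols, ℇ is the empty list, and the new nonterminal is named by appending a prime to A. The function name `eliminate_immediate_left_recursion` is hypothetical.

```python
def eliminate_immediate_left_recursion(A, productions):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without left recursion.

    Returns a dictionary mapping each nonterminal to its new right sides.
    """
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == A]
    betas = [rhs for rhs in productions if not rhs or rhs[0] != A]
    if not alphas:
        return {A: productions}          # nothing to eliminate
    A_prime = A + "'"
    return {
        A: [beta + [A_prime] for beta in betas],
        A_prime: [alpha + [A_prime] for alpha in alphas] + [[]],  # [] is epsilon
    }
```

Applied to E → E+T | T, this gives E → TE′ and E′ → +TE′ | ℇ, matching the grammar obtained above.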
Algorithm to eliminate left recursion
1. Arrange the nonterminals of G in some order A1, A2 … An
2. For i:= 1 to n do
begin
for j:=1 to i-1 do
replace each production of the form Ai → Ajγ
by the productions Ai → β1γ | β2γ | . . . | βkγ,
where Aj → β1 | β2 | . . . | βk are all the current Aj productions;
eliminate the immediate left recursion among the Ai
productions
end
Left factoring:
If we have 2 productions
Statement → if condition then statement else statement
| if condition then statement
then on seeing the input symbol if, we cannot tell which production to
use.
A useful method is left factoring:
the process of factoring out the common prefixes of alternatives.
A → αβ | αγ are 2 A-productions.
The input begins with a string derived from α.
We do not know whether to expand A to αβ or to αγ.
We can instead expand
A → αA′
A′ → β | γ
Eg:
Consider the grammar
S → iCtS | iCtSeS | a
C → b
When we use left factoring,
S → iCtSS′ | a
S′ → eS | ℇ
C → b
Recursive descent Parsing:
A parser that uses a set of recursive procedures to recognize its input
with no backtracking is called a recursive descent parser.

Consider
E → TE′
E′ → +TE′ | ℇ
T → FT′
T′ → *FT′ | ℇ
F → (E) | id
This grammar is suitable for a non backtracking recursive descent parser.
Mutually recursive procedures to recognize arithmetic
expressions:
procedure E( ); /* E → TE′ */
begin
T( );
EPRIME( )
end;
procedure EPRIME( ); /* E′ → +TE′ | ℇ */
if input-symbol = '+' then
begin
ADVANCE( );
T( );
EPRIME( )
end;
procedure T( ); /* T → FT′ */
begin
F( );
TPRIME( )
end;
procedure TPRIME( ); /* T′ → *FT′ | ℇ */
if input-symbol = '*' then
begin
ADVANCE( );
F( );
TPRIME( )
end;
procedure F( ); /* F → id | (E) */
if input-symbol = 'id' then
ADVANCE( )
else if input-symbol = '(' then
begin
ADVANCE( );
E( );
If input-symbol = ')' then
ADVANCE( )
else ERROR( )
end
else ERROR( )
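The same five procedures can be written as nested Python functions. In this sketch the token list, the appended `$` end marker and the name `parse_expression` are assumptions made for the example, and ERROR( ) is modelled by raising `SyntaxError`.

```python
def parse_expression(tokens):
    """Recursive descent recognizer for E -> TE', E' -> +TE' | eps,
    T -> FT', T' -> *FT' | eps, F -> (E) | id. Returns True or raises."""
    toks = list(tokens) + ["$"]     # "$" marks the end of the input
    pos = 0

    def advance():
        nonlocal pos
        pos += 1

    def error():
        raise SyntaxError(f"unexpected {toks[pos]!r} at position {pos}")

    def E():
        T()
        EPRIME()

    def EPRIME():
        if toks[pos] == "+":
            advance()
            T()
            EPRIME()
        # otherwise E' -> eps: consume nothing

    def T():
        F()
        TPRIME()

    def TPRIME():
        if toks[pos] == "*":
            advance()
            F()
            TPRIME()
        # otherwise T' -> eps: consume nothing

    def F():
        if toks[pos] == "id":
            advance()
        elif toks[pos] == "(":
            advance()
            E()
            if toks[pos] == ")":
                advance()
            else:
                error()
        else:
            error()

    E()
    if toks[pos] != "$":
        error()
    return True
```

No procedure ever resets the input pointer, which is what makes this a non backtracking parser.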


Transition Diagrams:
Draw one transition diagram for each nonterminal.
The labels of the edges are tokens or nonterminals.
For each nonterminal A:
• Create an initial and a final state
• For each production A → X1X2…Xn, create a path from the initial to the
final state, with edges labelled X1, X2, … Xn.
Consider
E → TE′ , E′ → +TE′ | ℇ
T → FT′ , T′ → *FT′ | ℇ
F → (E) | id
[Figures: transition diagrams for E, E′, T, T′ and F]
Simplify the transition diagrams by substituting diagrams in one another.
[Figures: revised transition diagrams for E′, E, T (similarly) and F]
5.5 Predictive Parsers:
• The parser has an input, a stack, a parsing table and an output
• The input contains the string to be parsed, followed by $
• The stack contains a sequence of grammar symbols, preceded by $
(the bottom of stack marker)
• The parsing table is a two dimensional array M[A,a], where A is
a nonterminal and a is a terminal or $
Consider X, the symbol on the top of the stack, and a, the current
input symbol.
There are 3 possibilities:
1. If X=a=$, the parser halts and announces successful completion of parsing
2. If X=a≠$, the parser pops X from the stack and advances the input
pointer to the next input symbol
3. If X is a nonterminal, the program consults the entry M[X,a] of the
parsing table
(i) If the entry is a production X → UVW, the parser
replaces X on the stack by UVW (with U on the top)
(ii) If the entry is an error entry, M[X,a]=error, the parser calls the error
routine.
[Figure: Model of Predictive Parser]
To fill the parsing table, we need 2 functions, FIRST and
FOLLOW.
Rules for finding FIRST
1. If X is a terminal, then FIRST(X) = {X}
2. If X is a nonterminal, and
(i) there is a production X → aα, then add a to FIRST(X)
(ii) there is a production X → ℇ, then add ℇ to FIRST(X)
3. If X is a nonterminal and there is a production X → Y1Y2Y3…Yk, then
find FIRST(Y1).
Add all non ℇ symbols of FIRST(Y1) to FIRST(X).
If ℇ is in FIRST(Y1), then add all non ℇ symbols of FIRST(Y2) to FIRST(X).
If ℇ is in FIRST(Y1) and FIRST(Y2), then add all non ℇ symbols of FIRST(Y3) to
FIRST(X), and so on.
If all of FIRST(Y1) up to FIRST(Yk) contain ℇ, then add ℇ to FIRST(X).
Rules for finding FOLLOW
1. Add $ to FOLLOW(S), where S is the start symbol
2. If there is a production A → αBβ, β ≠ ℇ, then add
everything in FIRST(β) except ℇ to FOLLOW(B)
3. If there is a production A → αB, or A → αBβ where FIRST(β)
contains ℇ, then add everything in FOLLOW(A) to FOLLOW(B)
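The FIRST and FOLLOW rules can be applied mechanically by repeating them until nothing changes. The Python sketch below assumes a dictionary encoding of the expression grammar after left recursion elimination; `EPS`, `first_of_string`, `compute_first` and `compute_follow` are names invented for this example, and ℇ is represented by the string "eps" (an empty right side stands for an ℇ production).

```python
EPS = "eps"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],     # [] is the epsilon production
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of the nonterminals."""
    out = set()
    for X in symbols:
        fx = first.get(X, {X})        # for a terminal a, FIRST(a) = {a}
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    out.add(EPS)                      # every symbol can derive epsilon
    return out


def compute_first(grammar):
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, productions in grammar.items():
            for rhs in productions:
                new = first_of_string(rhs, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {A: set() for A in grammar}
    follow[start].add("$")                                  # rule 1
    changed = True
    while changed:
        changed = False
        for A, productions in grammar.items():
            for rhs in productions:
                for i, B in enumerate(rhs):
                    if B not in grammar:
                        continue
                    beta_first = first_of_string(rhs[i + 1:], first)
                    new = beta_first - {EPS}                # rule 2
                    if EPS in beta_first:
                        new |= follow[A]                    # rule 3
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow
```

For example, `compute_follow(GRAMMAR, "E", compute_first(GRAMMAR))` yields FOLLOW(F) = { *, +, ), $ }.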
Predictive Parsing Program
repeat
begin
let X be the top stack symbol and
a the next input symbol
if X is a terminal or $ then
if X=a then
pop X from the stack and remove a
from the input
else
ERROR( )
else /* X is a nonterminal */
if M[X,a] = X → Y1Y2…Yk then
begin
pop X from the stack;
push Yk, Yk-1, … Y1 onto the stack, with Y1 on top
end
else
ERROR( )
end
until X=$ /* stack becomes empty */
Obtain Predictive Parsing table for the grammar:
E → E+T / T
T → T*F / F
F → (E) / id
Input: id+id*id
After eliminating left recursion the grammar becomes,
E → TE′
E′ → +TE′ | ℇ
T → FT′
T′ → *FT′ | ℇ
F → (E) | id
Apply FIRST rules:
Rule 1 is not applicable for any production: no terminals on the LHS in the given grammar
Rule 2: X → aα / X → ℇ
(i) F → (E)
FIRST(F) = (
(ii) F → id
FIRST(F) = id
(iii) E′ → +TE′
FIRST(E′) = +
(iv) T′ → *FT′
FIRST(T′) = *
(v) E′ → ℇ
FIRST(E′) = ℇ
(vi) T′ → ℇ
FIRST(T′) = ℇ
Rule 3: X → Y1Y2Y3…Yk
(i) T → FT′
FIRST(T) = FIRST(F)
= ( , id [FIRST(F) does not contain ℇ, so there is no need to look at the next symbol]
(ii) E → TE′
FIRST(E) = FIRST(T)
= ( , id
Apply FOLLOW rules:
Rule 1: Add $ to FOLLOW(S), S is the start symbol
E is the start symbol
FOLLOW(E) = $
Rule 2: A → αBβ [FOLLOW(B) includes FIRST(β) except ℇ]
(i) F → (E) [B = E, β = ) ]
FOLLOW(E) includes FIRST( ) )
= )
(ii) T′ → *FT′ [B = F, β = T′]
FOLLOW(F) includes FIRST(T′) except ℇ
= *
(iii) E′ → +TE′ [B = T, β = E′]
FOLLOW(T) includes FIRST(E′) except ℇ
= +
Rule 3: A → αB, or A → αBβ where FIRST(β) contains ℇ [FOLLOW(B) includes everything in FOLLOW(A)]
(i) E → TE′ [A → αB]
FOLLOW(E′) = FOLLOW(E)
= ) , $
(ii) E′ → +TE′ [A → αBβ where FIRST(β) contains ℇ]
FIRST(E′) = + , ℇ [it contains ℇ]
FOLLOW(T) includes FOLLOW(E′)
= ) , $
(iii) T → FT′ [A → αB]
FOLLOW(T′) = FOLLOW(T)
= + , ) , $
(iv) T′ → *FT′ [A → αBβ where FIRST(β) contains ℇ]
FIRST(T′) = * , ℇ [it contains ℇ]
FOLLOW(F) includes FOLLOW(T′)
= + , ) , $
Nonterminals    First     Follow
E               ( id      ) $
E′              + ℇ       ) $
T               ( id      + ) $
T′              * ℇ       + ) $
F               ( id      * + ) $
Construction of Parsing table
For each production A → α of the grammar:
1. For each terminal a in FIRST(α), add A → α to
M[A,a]
2. If ℇ is in FIRST(α), then add the production A → α to
M[A,b] for each terminal b in FOLLOW(A)
3. If ℇ is in FIRST(α) and $ is in FOLLOW(A), add A → α to
M[A,$]
4. All other entries are defined as errors
      id        +           *           (         )        $
E     E→TE′                             E→TE′
E′              E′→+TE′                           E′→ℇ     E′→ℇ
T     T→FT′                             T→FT′
T′              T′→ℇ        T′→*FT′               T′→ℇ     T′→ℇ
F     F→id                              F→(E)
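A table driven predictive parser for this grammar can be sketched in Python as follows. The table entries are copied from the parsing table for the expression grammar; the names `TABLE`, `NONTERMINALS` and `predictive_parse`, the empty list standing for ℇ, and the choice to return the list of productions used are assumptions made for this example.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# M[A, a] -> right side to push; [] stands for epsilon. Missing entries are errors.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}


def predictive_parse(tokens, start="E"):
    """Return the productions used, in order, or raise SyntaxError."""
    stack = ["$", start]
    remaining = list(tokens) + ["$"]
    pos = 0
    output = []
    while True:
        X, a = stack[-1], remaining[pos]
        if X == "$" and a == "$":
            return output                       # successful completion
        if X not in NONTERMINALS:               # X is a terminal or $
            if X != a:
                raise SyntaxError(f"expected {X!r} but found {a!r}")
            stack.pop()
            pos += 1
        else:
            rhs = TABLE.get((X, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))         # Y1 ends up on top of the stack
            output.append((X, rhs))
```

On the input id+id*id the parser uses eleven productions, starting with E → TE′ and ending with E′ → ℇ.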
Note:
• If the given grammar G is left recursive or ambiguous, then M will
have at least one multiply defined entry
• When the parsing table has multiply defined entries, eliminate
left recursion and then apply left factoring wherever possible
• A grammar whose parsing table has no multiply defined entries is said
to be LL(1)
UNIT II

CHAPTER 6

AUTOMATIC CONSTRUCTION OF EFFICIENT PARSERS


INTRODUCTION:
• LR parsers are so called because they scan the input from left to
right and construct a right most derivation in reverse.
LR parsers are attractive for the following reasons.
• LR parsers can recognize virtually all programming language constructs
for which context free grammars can be written
• The LR parsing method is more general than operator precedence or any
other common shift reduce technique
• LR parsers dominate the common forms of top down parsing
without backtracking
• LR parsers can detect syntactic errors as soon as it is possible to do so
on a left to right scan of the input


Generating an LR Parser
Different techniques for producing LR Parsing tables

LR Parsers

Simple LR Parser (SLR)
1. Easier to implement
2. Fails to produce a table for certain grammars

Canonical LR Parser (CLR)
1. Most powerful
2. Will work on a large class of grammars
3. Very expensive

Look Ahead LR Parser (LALR)
1. Intermediate in power
2. Works on most classes of grammar
3. Can be implemented efficiently
6.1 LR PARSERS
An LR parser has an input, a stack and a parsing table.
The input is read from left to right, one symbol at a time.
The stack contains a string of the form S0X1S1X2S2X3S3…XmSm, where Sm is on
the top.
Each Xi is a grammar symbol.
Each Si is a state symbol (used to guide the shift reduce decision).
The parsing table contains 2 parts:
• Parsing action function ACTION
• Goto function GOTO
The LR parser program behaves as follows:
-- Take the state Sm on the top of the stack and the current input symbol a
-- Consult the parsing table entry ACTION[Sm, a]
-- The entry ACTION[Sm, a] can have any one of 4 values:
• Shift s
• Reduce A → β
• Accept
• Error
The GOTO function takes a state and a grammar symbol as
arguments and produces a state.
The configuration of an LR parser is a pair:
1st component is the stack contents
2nd component is the unexpended input
(s0 X1 s1 X2 s2 . . . Xm sm, ai ai+1 . . . an $)
The next move of the parser is determined by reading
ai, the current input symbol, and
sm, the state on top of the stack, and
then consulting the parsing action table entry
ACTION[sm, ai]
4 types of moves:
• If ACTION[sm, ai] = shift s, the parser executes a shift, entering the configuration
(s0 X1 s1 X2 s2 . . . Xm sm ai s, ai+1 . . . an $)
• If ACTION[sm, ai] = reduce A → β, the parser executes a
reduce, entering the configuration
(s0 X1 s1 X2 s2 . . . Xm-r sm-r A s, ai ai+1 . . . an $)
where r is the length of β and s = GOTO[sm-r, A]
• If ACTION[sm, ai] = accept, the parsing is completed.
• If ACTION[sm, ai] = error, the parser calls the error
recovery routine.
Construction of SLR Parsing table:
Consider the augmented grammar
E′ → E
E → E+T
E → T
T → T*F
T → F
F → (E)
F → id

I = {E′ → .E}
Closure(I) =
E′ → .E     [contains .E, so add the E productions with . at the left end]
E → .E+T
E → .T      [contains .T, so add the T productions with . at the left end]
T → .T*F
T → .F      [contains .F, so add the F productions with . at the left end]
F → .(E)
F → .id
This set of items is I0.
From I0, look for items with E immediately to the right of the dot.
Goto(I0, E) = { E′ → E. , E → E.+T } → I1
From I0, look for items with T immediately to the right of the dot.
Goto(I0, T) = { E → T. , T → T.*F } → I2
From I0, look for items with F immediately to the right of the dot.
Goto(I0, F) = { T → F. } → I3
From I0, look for items with ( immediately to the right of the dot.
Goto(I0, ( ) = { F → (.E)    [contains .E, so add the E productions with . at the left end]
E → .E+T
E → .T      [contains .T, so add the T productions with . at the left end]
T → .T*F
T → .F      [contains .F, so add the F productions with . at the left end]
F → .(E)
F → .id } → I4
From I0, look for items with id immediately to the right of the dot.
Goto(I0, id) = { F → id. } → I5
From I1, look for items with + immediately to the right of the dot.
Goto(I1, +) = { E → E+.T , T → .T*F , T → .F , F → .(E) , F → .id } → I6
From I2, look for items with * immediately to the right of the dot.
Goto(I2, *) = { T → T*.F , F → .(E) , F → .id } → I7
Goto(I3, …) : no transitions
Goto(I4, E) = { F → (E.) , E → E.+T } → I8
Goto(I4, T) = { E → T. , T → T.*F } → I2
Goto(I4, F) = { T → F. } → I3
Goto(I4, ( ) = { F → (.E) , E → .E+T , E → .T , T → .T*F , T → .F , F → .(E) , F → .id } → I4
Goto(I4, id) = { F → id. } → I5
Goto(I5, …) : no transitions
Goto(I6, T) = { E → E+T. , T → T.*F } → I9
Goto(I6, F) = { T → F. } → I3
Goto(I6, ( ) = I4
Goto(I6, id) = { F → id. } → I5
Goto(I7, F) = { T → T*F. } → I10
Goto(I7, ( ) = I4
Goto(I7, id) = { F → id. } → I5
Goto(I8, ) ) = { F → (E). } → I11
Goto(I8, +) = { E → E+.T , T → .T*F , T → .F , F → .(E) , F → .id } → I6
Goto(I9, *) = { T → T*.F , F → .(E) , F → .id } → I7
Find follow of nonterminals
1. E′ → E
FOLLOW(E) includes FOLLOW(E′)
E′ is the start symbol. So FOLLOW(E′) = $
Thus FOLLOW(E) contains $
2. F → (E)
FOLLOW(E) contains )
3. E → E+T
FOLLOW(E) contains +
4. T → T*F
FOLLOW(T) contains *
5. E → T
FOLLOW(T) includes FOLLOW(E)
= + ) $
6. T → F
FOLLOW(F) includes FOLLOW(T)
= * + ) $
NT    follow
E     ) + $
T     * ) + $
F     * ) + $
SLR Parsing table:
Accept:
If E′ → E. is in Ii, then Action(i, $) = accept
E′ → E. is in I1.
Hence Action(1, $) = accept
Reduce:
1. E → E+T. is in I9: r1
[I9, symbols in FOLLOW(E) = r1]
Action(9, +) = r1
Action(9, ) ) = r1
Action(9, $) = r1
2. E → T. is in I2: r2
[I2, symbols in FOLLOW(E) = r2]
Action(2, +) = r2
Action(2, ) ) = r2
Action(2, $) = r2
3. T → T*F. is in I10: r3
[I10, symbols in FOLLOW(T) = r3]
Action(10, *) = r3
Action(10, +) = r3
Action(10, ) ) = r3
Action(10, $) = r3
4. T → F. is in I3: r4
[I3, symbols in FOLLOW(T) = r4]
Action(3, *) = r4
Action(3, +) = r4
Action(3, ) ) = r4
Action(3, $) = r4
5. F → (E). is in I11: r5
[I11, symbols in FOLLOW(F) = r5]
Action(11, *) = r5
Action(11, +) = r5
Action(11, ) ) = r5
Action(11, $) = r5
6. F → id. is in I5: r6
[I5, symbols in FOLLOW(F) = r6]
Action(5, *) = r6
Action(5, +) = r6
Action(5, ) ) = r6
Action(5, $) = r6
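Shift:
Refer the terminals; each Goto on a terminal gives a shift entry (row, column):
Goto(I0, ( ) gives I4
(r,c) : 0, ( → s4
Goto(I0, id) gives I5
0, id → s5
Goto(I1, +) gives I6
1, + → s6
Goto(I2, *) gives I7
2, * → s7
The remaining shift entries are filled in the same way from the Goto sets of
I4, I6, I7, I8 and I9.
Goto:
Refer the nonterminals; Goto(Ii, A) = Ij gives the entry GOTO[i, A] = j.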
State |            Action             |    Goto
      | id    +    *    (    )    $   |  E   T   F
0     | s5              s4            |  1   2   3
1     |       s6                  acc |
2     |       r2   s7        r2   r2  |
3     |       r4   r4        r4   r4  |
4     | s5              s4            |  8   2   3
5     |       r6   r6        r6   r6  |
6     | s5              s4            |      9   3
7     | s5              s4            |          10
8     |       s6             s11      |
9     |       r1   s7        r1   r1  |
10    |       r3   r3        r3   r3  |
11    |       r5   r5        r5   r5  |
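The SLR table can be exercised with a small LR driver. The Python sketch below encodes the ACTION and GOTO entries of the table for the expression grammar and keeps only states on the stack (grammar symbols are not needed to drive the parse). The names `ACTION`, `GOTO`, `PRODS` and `lr_parse` are assumptions made for this example; the function returns the production numbers in the order they are used for reduction.

```python
# Production number -> (left side, length of right side)
PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {}
for state in (0, 4, 6, 7):
    ACTION[(state, "id")] = ("s", 5)
    ACTION[(state, "(")] = ("s", 4)
ACTION[(1, "+")] = ("s", 6)
ACTION[(1, "$")] = ("acc", 0)
ACTION[(8, "+")] = ("s", 6)
ACTION[(8, ")")] = ("s", 11)
for state, prod in ((2, 2), (9, 1)):
    for a in "+)$":
        ACTION[(state, a)] = ("r", prod)
    ACTION[(state, "*")] = ("s", 7)
for state, prod in ((3, 4), (5, 6), (10, 3), (11, 5)):
    for a in "+*)$":
        ACTION[(state, a)] = ("r", prod)

GOTO = {
    (0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
    (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
    (6, "T"): 9, (6, "F"): 3,
    (7, "F"): 10,
}


def lr_parse(tokens):
    """Return the list of production numbers used in reductions."""
    stack = [0]                                  # stack of states
    remaining = list(tokens) + ["$"]
    pos = 0
    reductions = []
    while True:
        action = ACTION.get((stack[-1], remaining[pos]))
        if action is None:
            raise SyntaxError(f"error in state {stack[-1]} on {remaining[pos]!r}")
        kind, arg = action
        if kind == "s":
            stack.append(arg)                    # shift and enter state arg
            pos += 1
        elif kind == "r":
            lhs, length = PRODS[arg]
            del stack[len(stack) - length:]      # pop one state per symbol of the right side
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(arg)
        else:
            return reductions                    # accept
```

The returned sequence, read backwards, is a right most derivation of the input.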
LR grammar:
A grammar for which every entry in the parsing table is uniquely defined is
called an LR grammar.
LR(0) item
An LR(0) item of a grammar G is a production of G with a dot at some
position on the right side.
Eg: A → .XYZ
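The closure and goto operations on sets of LR(0) items can be sketched in Python as below. An item is represented here as a pair (production index, dot position); this encoding and the names `GRAMMAR`, `closure`, `goto` and `canonical_collection` are assumptions made for this example. The states come out in a different order from the hand numbering I0 to I11, but the collection contains the same twelve sets.

```python
# Augmented expression grammar; production 0 is E' -> E.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add B -> .gamma for every nonterminal B that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)


def goto(items, X):
    """Move the dot over X in every item that has X right after the dot."""
    moved = {
        (prod, dot + 1)
        for prod, dot in items
        if dot < len(GRAMMAR[prod][1]) and GRAMMAR[prod][1][dot] == X
    }
    return closure(moved)


def canonical_collection():
    """Build all sets of items reachable from closure({E' -> .E})."""
    states = [closure({(0, 0)})]
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    i = 0
    while i < len(states):
        for X in symbols:
            target = goto(states[i], X)
            if target and target not in states:
                states.append(target)
        i += 1
    return states
```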
6.3 CONSTRUCTION OF SLR PARSING TABLE

Algorithm: 6.1
Input: C, the canonical collection of sets of items for the augmented grammar G′
Output: LR parsing table with ACTION and GOTO functions
Method:
Let C={I0, I1, I2, … In}
The parsing actions for state i are determined as follows:
1. If [A → α.aβ] is in Ii and GOTO(Ii, a)=Ij, where a is a terminal, then set ACTION[i, a] = shift j
2. If [A → α.] is in Ii, then set ACTION[i, a] = reduce A → α for all a in
FOLLOW(A) (A ≠ S′)
3. If [S′ → S.] is in Ii, then set ACTION[i, $] = accept
4. If GOTO(Ii, A)=Ij for a nonterminal A, then set GOTO[i, A]=j
5. All entries not defined by steps 1 through 4 are errors
6. The initial state of the parser is the one constructed from the set of items containing [S′ → .S]
6.4 Constructing canonical LR Parsing tables
Each item carries extra information:
a terminal symbol is included as a second component.
[A → α.β, a] is the general form of an item,
where A → αβ is a production and
a is a terminal symbol or $.
It is called an LR(1) item, where 1 refers to the
length of the 2nd component, called the
lookahead of the item.
Consider the augmented grammar
S′ → S
S → CC
C → cC | d
Itemset I = { [S′ → .S, $] }
Find Closure(I):
Match the item [S′ → .S, $] with [A → α.Bβ, a]:
A = S′, α = ℇ, B = S, β = ℇ, a = $
The closure rule tells us to add [B → .γ, b] for each production B → γ and
each terminal b in FIRST(βa)
Here B → γ is S → CC, and FIRST(βa) = FIRST(ℇ$) = $
So [S → .CC, $] is added.
Next match [S → .CC, $] with [A → α.Bβ, a]:
A = S, α = ℇ, B = C, β = C, a = $
Add all the items [C → .γ, b] for b in FIRST(C$)
FIRST(βa) = FIRST(C$) = FIRST(C) = c, d
So [C → .cC, c/d] and [C → .d, c/d] are added.
I0 = { S′ → .S, $
S → .CC, $
C → .cC, c/d
C → .d, c/d }
Goto(I0, S) = { S′ → S. , $ } → I1
Goto(I0, C) = { S → C.C, $
C → .cC, $
C → .d, $ } → I2
Goto(I0, c) = { C → c.C, c/d
C → .cC, c/d
C → .d, c/d } → I3
Goto(I0, d) = { C → d. , c/d } → I4
Goto(I1, …) : no transitions
Goto(I2, C) = { S → CC. , $ } → I5
Goto(I2, c) = { C → c.C, $
C → .cC, $
C → .d, $ } → I6
Goto(I2, d) = { C → d. , $ } → I7
Goto(I3, C) = { C → cC. , c/d } → I8
Goto(I3, c) = { C → c.C, c/d
C → .cC, c/d
C → .d, c/d } → I3
Goto(I3, d) = { C → d. , c/d } → I4
Goto(I4, …), Goto(I5, …) : no transitions
Goto(I6, C) = { C → cC. , $ } → I9
Goto(I6, c) = { C → c.C, $
C → .cC, $
C → .d, $ } → I6
Goto(I6, d) = { C → d. , $ } → I7
Goto(I7, …), Goto(I8, …), Goto(I9, …) : no transitions
Accept:
S′ → S. is available in I1.
So Action(1, $) = accept
Reduce:
1) S → CC. , $ is in I5. So reduce by production 1 (r1) on its lookahead:
Action(5, $) = r1
2) C → cC. is in I8 (lookahead c/d) and in I9 (lookahead $). Reduce by production 2 (r2):
Action(8, c) = r2
Action(8, d) = r2
Action(9, $) = r2
3) C → d. is in I4 (lookahead c/d) and in I7 (lookahead $). Reduce by production 3 (r3):
Action(4, c) = r3
Action(4, d) = r3
Action(7, $) = r3
Shift: refer the terminals
0, c → I3 : s3        0, d → I4 : s4
2, c → I6 : s6        2, d → I7 : s7
3, c → I3 : s3        3, d → I4 : s4
6, c → I6 : s6        6, d → I7 : s7
Goto: refer the nonterminals
I0, S → I1    I0, C → I2
I2, C → I5    I3, C → I8    I6, C → I9
State |     Action      |  Goto
      | c     d     $   |  S   C
0     | s3    s4        |  1   2
1     |             acc |
2     | s6    s7        |      5
3     | s3    s4        |      8
4     | r3    r3        |
5     |             r1  |
6     | s6    s7        |      9
7     |             r3  |
8     | r2    r2        |
9     |             r2  |
GOTO graph:
6.3 CONSTRUCTION OF CLR PARSING TABLE

Algorithm: 6.3
Refer book pg no 219

CONSTRUCTION OF CLR PARSING TABLE


Algorithm 6.4
Refer Pg No. 222
6.5) Construction of LALR parsing table:
• Tables obtained by LALR are smaller than CLR tables
• States with similar syntactic constructs (the same core items) are grouped
• SLR and LALR parsing tables have the same
number of states
Consider the grammar
S′ → S
S → CC
C → cC | d
1) States 3 and 6 have the same items. They differ only in the
2nd component (the lookahead).
Hence they can be combined:
I36 := C → c.C, c/d/$
C → .cC, c/d/$
C → .d, c/d/$
2) States 4 and 7 have the same items. They differ only in
the 2nd component.
Hence they can be combined:
I47 := C → d. , c/d/$
3) States 8 and 9 have the same items. They differ only in
the 2nd component.
Hence they can be combined:
I89 := C → cC. , c/d/$
State |      Action       |  Goto
      | c      d      $   |  S   C
0     | s36    s47        |  1   2
1     |               acc |
2     | s36    s47        |      5
36    | s36    s47        |      89
47    | r3     r3     r3  |
5     |               r1  |
89    | r2     r2     r2  |
6.6 Using Ambiguous grammar
Consider the ambiguous
grammar for expressions
E → E+E | E*E | (E) | id
augmented with E′ → E
I = {E′ → .E}
Closure(I) = { E′ → .E
E → .E+E
E → .E*E
E → .(E)
E → .id } → I0
Goto(I0, E) = { E′ → E.
E → E.+E
E → E.*E } → I1
Goto(I0, ( ) = { E → (.E)
E → .E+E
E → .E*E
E → .(E)
E → .id } → I2
Goto(I0, id) = { E → id. } → I3
Goto(I1, +) = { E → E+.E
E → .E+E
E → .E*E
E → .(E)
E → .id } → I4
Goto(I1, *) = { E → E*.E
E → .E+E
E → .E*E
E → .(E)
E → .id } → I5
Goto(I2, E) = { E → (E.)
E → E.+E
E → E.*E } → I6
Goto(I2, ( ) = I2
Goto(I2, id) = { E → id. } → I3
Goto(I3, …) : no transitions
Goto(I4, E) = { E → E+E.
E → E.+E
E → E.*E } → I7
Goto(I4, ( ) = I2
Goto(I4, id) = { E → id. } → I3
Goto(I5, E) = { E → E*E.
E → E.+E
E → E.*E } → I8
Goto(I5, ( ) = I2
Goto(I5, id) = { E → id. } → I3
Goto(I6, ) ) = { E → (E). } → I9
Goto(I6, +) = I4
Goto(I6, *) = I5
Goto(I7, +) = I4
Goto(I7, *) = I5
Goto(I8, +) = I4
Goto(I8, *) = I5
Goto(I9, …) : no transitions
Find follow of nonterminals
1. E′ → E
FOLLOW(E) includes FOLLOW(E′)
E′ is the start symbol. So
FOLLOW(E′) = $
Thus FOLLOW(E) contains $
2. E → E+E
FOLLOW(E) contains +
3. E → E*E
FOLLOW(E) contains *
4. E → (E)
FOLLOW(E) contains )
NT    follow
E     ) + * $
Accept:
If E′ → E. is in Ii, then Action(i, $) = accept
E′ → E. is in I1.
Hence Action(1, $) = accept
Reduce:
1. E → E+E. is in I7: r1
[I7, symbols in FOLLOW(E) = r1]
Action(7, ) ) = r1
Action(7, +) = r1
Action(7, *) = r1
Action(7, $) = r1
2. E → E*E. is in I8: r2
[I8, symbols in FOLLOW(E) = r2]
Action(8, ) ) = r2
Action(8, +) = r2
Action(8, *) = r2
Action(8, $) = r2
3. E → (E). is in I9: r3
[I9, symbols in FOLLOW(E) = r3]
Action(9, ) ) = r3
Action(9, +) = r3
Action(9, *) = r3
Action(9, $) = r3
4. E → id. is in I3: r4
[I3, symbols in FOLLOW(E) = r4]
Action(3, ) ) = r4
Action(3, +) = r4
Action(3, *) = r4
Action(3, $) = r4
Shift: Refer the terminals; each entry is (row, column)
(I0, ( ) gives I2      (r,c) : 0, ( → s2
(I0, id) gives I3      (r,c) : 0, id → s3
(I1, +) gives I4       (r,c) : 1, + → s4
(I1, *) gives I5       (r,c) : 1, * → s5
(I2, ( ) gives I2      (r,c) : 2, ( → s2
(I2, id) gives I3      (r,c) : 2, id → s3
(I4, ( ) gives I2      (r,c) : 4, ( → s2
(I4, id) gives I3      (r,c) : 4, id → s3
(I5, ( ) gives I2      (r,c) : 5, ( → s2
(I5, id) gives I3      (r,c) : 5, id → s3
(I6, +) gives I4       (r,c) : 6, + → s4
(I6, *) gives I5       (r,c) : 6, * → s5
(I6, ) ) gives I9      (r,c) : 6, ) → s9
(I7, +) gives I4       (r,c) : 7, + → s4
(I7, *) gives I5       (r,c) : 7, * → s5
(I8, +) gives I4       (r,c) : 8, + → s4
(I8, *) gives I5       (r,c) : 8, * → s5
State |                 Action                  | Goto
      | id    +        *        (    )    $     |  E
0     | s3                      s2              |  1
1     |       s4       s5                 acc   |
2     | s3                      s2              |  6
3     |       r4       r4            r4   r4    |
4     | s3                      s2              |  7
5     | s3                      s2              |  8
6     |       s4       s5            s9         |
7     |       r1/s4    r1/s5         r1   r1    |
8     |       r2/s4    r2/s5         r2   r2    |
9     |       r3       r3            r3   r3    |
The entries for states 7 and 8 on inputs + and * are multiply defined
(shift/reduce conflicts). They are resolved as follows.
Assuming + is left associative,
the action of state 7 on input +
should be to reduce by E → E+E.
Assuming * takes precedence
over +, the action of state 7 on
input * should be to shift.
Similarly, assuming that * is left
associative and takes precedence
over +, the action of state 8 on both the
inputs + and * should be to
reduce by E → E*E.
(In the case of input +, the reason is that
* takes precedence over +.)
(In the case of input *, the reason is that
* is left associative.)
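The reasoning used to resolve these conflicts can be written as a small decision function. The Python sketch below is only an illustration of that reasoning: the precedence numbers, the `LEFT_ASSOC` set and the name `resolve` are assumptions made for this example, not part of the table construction itself.

```python
PRECEDENCE = {"+": 1, "*": 2}     # a larger number means higher precedence
LEFT_ASSOC = {"+", "*"}           # both operators are taken to be left associative


def resolve(reduce_operator, lookahead):
    """Choose between reducing by E -> E op E (op = reduce_operator)
    and shifting the lookahead operator."""
    if PRECEDENCE[lookahead] > PRECEDENCE[reduce_operator]:
        return "shift"            # the incoming operator binds tighter
    if PRECEDENCE[lookahead] < PRECEDENCE[reduce_operator]:
        return "reduce"           # the operator on the stack binds tighter
    # Equal precedence: associativity decides.
    return "reduce" if reduce_operator in LEFT_ASSOC else "shift"
```

State 7 corresponds to `reduce_operator="+"` and state 8 to `reduce_operator="*"`, so the four conflicting entries become r1, s5, r2 and r2 respectively.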
