CHAPTER 5
BASIC PARSING TECHNIQUES
5.1 Parser
A parser (for a grammar G) is a program which takes an input
string w and produces a parse tree as output if w is a sentence
of G. Otherwise it produces an error.
Types of parser:
• Top down parser: builds parse trees from top to bottom (root to leaves)
• Bottom up parser: builds parse trees from bottom to top (leaves to root)
  (Shift Reduce Parsing, Operator Precedence Parsing)
In both cases, the input to the parser is scanned from left to right, one
symbol at a time.
Constructing an LMD and an RMD
Construct a leftmost derivation (LMD) and a rightmost derivation (RMD)
for the sentence w = i b t i b t a e a
LMD:
S ⇒ iCtS
  ⇒ ibtS
  ⇒ ibtiCtSeS
  ⇒ ibtibtSeS
  ⇒ ibtibtaeS
  ⇒ ibtibtaea
RMD:
S ⇒ iCtS
  ⇒ iCtiCtSeS
  ⇒ iCtiCtSea
  ⇒ iCtiCtaea
  ⇒ iCtibtaea
  ⇒ ibtibtaea
[Figure: parse tree T for w]
5.2 Shift reduce Parsing
Shift reduce parsing is an example of bottom up parsing.
Reduction:
• Look for a substring that matches the right side of some
production
• Replace it by the symbol on the left side of that production
• (Replacement of the right side of a production by its left side
is called reduction)
Consider the grammar
S → aAcBe
A → Ab | b
B → d
and the string abbcde
We want to reduce the string to S:
abbcde, aAbcde, aAcde, aAcBe, S
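The reduction of abbcde to S can be traced mechanically. The sketch below is only an illustration: the function name `shift_reduce` is hypothetical, and it picks the handle with a simple heuristic (the longest right side that matches the top of the stack). That heuristic happens to work for this grammar but is not a general way of finding handles.

```python
# Grammar from the notes: S -> aAcBe, A -> Ab | b, B -> d
PRODUCTIONS = [("S", "aAcBe"), ("A", "Ab"), ("A", "b"), ("B", "d")]


def shift_reduce(word):
    """Return the sentential forms seen while reducing `word`.

    Assumption: the handle is the longest right side that is a suffix
    of the stack. This is a heuristic chosen for this small grammar.
    """
    stack, rest = "", word
    forms = [word]
    while True:
        candidates = [(lhs, rhs) for lhs, rhs in PRODUCTIONS
                      if stack.endswith(rhs)]
        if candidates:
            # Reduce: replace the right side on top of the stack by its left side.
            lhs, rhs = max(candidates, key=lambda p: len(p[1]))
            stack = stack[:len(stack) - len(rhs)] + lhs
            forms.append(stack + rest)
        elif rest:
            # Shift: move the next input symbol onto the stack.
            stack, rest = stack + rest[0], rest[1:]
        else:
            return forms


if __name__ == "__main__":
    for form in shift_reduce("abbcde"):
        print(form)
```

Read from the last form back to the first, the forms are a rightmost derivation of abbcde.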
Handle Pruning:
• A handle is removed by replacing it with the left side of the
production
• The RMD in reverse is obtained by handle pruning
• The string appearing to the right of a handle contains
only terminals
Consider the grammar
E → E+E, E → E*E, E → (E), E → id
Consider the RMD
E ⇒ E+E
  ⇒ E+E*E
  ⇒ E+E*id3
  ⇒ E+id2*id3
  ⇒ id1+id2*id3
In the last sentential form, id1 is the handle.
Input: id+id*id
Consider the grammar E → E+E, E → E*E, E → (E), E → id
and the input string id1+id2*id3.
Consider the sequence of reductions that leads to the start symbol E:
id1+id2*id3
E+id2*id3
E+E*id3
E+E*E
E+E
E
• At the end, the start symbol will have the entire parse tree
associated with it
Parse tree construction:
[Figures: parse tree after shifting id1; parse tree after reducing id1+id2*id3 to E+E]

Operator precedence relations for id, +, * and $:
      id    +     *     $
id          .>    .>    .>
+     <.    .>    <.    .>
*     <.    .>    .>    .>
$     <.    <.    <.    err
To find the handle:
1. Scan the string from the left until the first .> is encountered
2. Then scan backwards to the left until we encounter a <. sign
3. The handle contains everything to the left of the .> and
to the right of the <.
Consider the input
$id+id*id$
The string with the precedence relations inserted is:
$ <. id .> + <. id .> * <. id .> $
In this example, the 1st handle is <. id .>; reduce it to E:
$E+id*id$
$E+E*id$
$E+E*E$
Now delete the nonterminals:
$+*$ is obtained
Insert the precedence relations:
$ <. + <. * .> $
This indicates that the left end of the handle lies between + and * and the
right end of the handle lies between * and $,
i.e. in E+E*E, the handle is E*E.
Obtain the operator precedence relations for
E → E+E | E-E | E*E | E/E | E^E | (E) | -E | id
      +     *     (     )     id    $
+     .>    <.    <.    .>    <.    .>
*     .>    .>    <.    .>    <.    .>
(     <.    <.    <.    ≐     <.    err
)     .>    .>    err   .>    err   .>
id    .>    .>    err   .>    err   .>
$     <.    <.    <.    err   <.    err
Algorithm 5.1 Computation of LEADING
Input: CFG
Output: Boolean array L[A,a] in which
the entry is true, if a is in LEADING(A)

Procedure INSTALL(A,a)
  if not L[A,a] then
  begin
    L[A,a] := true;
    push (A,a) onto STACK
  end

Main procedure
begin
  /* Initialize L */
  for each nonterminal A and terminal a
    do L[A,a] := false;
  for each production of the form
    A → aα or A → Baα do
      INSTALL(A,a);
  while STACK not empty do
  begin
    pop top pair (B,a) from STACK;
    for each production of the form
      A → Bα do
        INSTALL(A,a)
  end
end
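Algorithm 5.1 translates fairly directly into code. The sketch below is one possible rendering, not the book's implementation: the function name `leading`, the dictionary encoding of the grammar and the use of sets in place of the Boolean array L are all assumptions made for illustration. It assumes an operator grammar with no ε-productions.

```python
from collections import defaultdict


def leading(grammar, terminals):
    """Compute LEADING(A) for every nonterminal A.

    grammar: dict mapping a nonterminal to a list of right sides,
             each right side being a non-empty list of symbols.
    """
    L = defaultdict(set)   # a in L[A] plays the role of L[A,a] = true
    stack = []

    def install(A, a):
        if a not in L[A]:
            L[A].add(a)
            stack.append((A, a))

    for A, right_sides in grammar.items():
        for rhs in right_sides:
            if rhs[0] in terminals:                      # A -> a alpha
                install(A, rhs[0])
            elif len(rhs) > 1 and rhs[1] in terminals:   # A -> B a alpha
                install(A, rhs[1])

    while stack:
        B, a = stack.pop()
        for A, right_sides in grammar.items():
            if any(rhs[0] == B for rhs in right_sides):  # A -> B alpha
                install(A, a)
    return dict(L)


# Example grammar (an assumption for the demo): E -> E+T | T, T -> T*F | F, F -> (E) | id
EXPR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}
LEAD = leading(EXPR, {"+", "*", "(", ")", "id"})
```

TRAILING can be computed the same way by looking at the last one or two symbols of each right side instead of the first.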
Algorithm 5.2
Calculation of Operator Precedence relations
Input: operator grammar G
Output: relations <. , .> , ≐ for G
Method:
1. Compute LEADING(A) and TRAILING(A)
   for each nonterminal A
2. Examine the positions of the right side of each
   production:
   for each production A → X1X2…Xn do
     for i := 1 to n-1 do
     begin
       if Xi and Xi+1 are both terminals, then set Xi ≐ Xi+1;
       if i <= n-2 and Xi and Xi+2 are terminals,
         and Xi+1 is a nonterminal, then
           set Xi ≐ Xi+2;
       if Xi is a terminal and Xi+1 is a nonterminal then
         for all a in LEADING(Xi+1) do set Xi <. a;
       if Xi is a nonterminal and Xi+1 is a terminal then
         for all a in TRAILING(Xi) do set a .> Xi+1
     end
3. Set $ <. a for all a in LEADING(S) and set
   b .> $ for all b in TRAILING(S), where S is
   the start symbol
Operator precedence parsing algorithm (pg 171)
repeat forever
  if only $ is on the stack, and only $ is on the input then
    accept and break
  else
  begin
    let a be the topmost terminal symbol on the stack
    and let b be the current input symbol;
    if a <. b or a ≐ b then shift b onto the stack
    else if a .> b then /* reduce */
      repeat
        pop the stack
      until the top stack terminal is related by <.
        to the terminal most recently popped
    else call the error correcting routine
  end
Fig 5.14 Action of operator precedence parsing
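The loop of Fig 5.14 can be tried out on the id, +, * table. The sketch below makes some simplifying assumptions: the stack holds terminals only (nonterminals are left out, as when they are deleted from the sentential form), the relations are written "<" and ">", the table has no ≐ entries, and the names `REL` and `op_precedence_parse` are hypothetical. An error raises SyntaxError instead of calling an error correcting routine.

```python
# Precedence relations for id, +, * and $ (row = stack terminal, column = input).
REL = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+":  {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*":  {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$":  {"id": "<", "+": "<", "*": "<"},
}


def op_precedence_parse(tokens):
    """Return the terminals popped at each reduction, in order."""
    stack = ["$"]
    inp = list(tokens) + ["$"]
    i = 0
    reductions = []
    while True:
        a, b = stack[-1], inp[i]
        if a == "$" and b == "$":
            return reductions                 # accept
        rel = REL[a].get(b)
        if rel in ("<", "="):
            stack.append(b)                   # shift b
            i += 1
        elif rel == ">":
            # Reduce: pop until the top terminal is <. the one just popped.
            popped = [stack.pop()]
            while REL[stack[-1]].get(popped[-1]) != "<":
                popped.append(stack.pop())
            reductions.append(popped[::-1])
        else:
            raise SyntaxError(f"no relation between {a!r} and {b!r}")


if __name__ == "__main__":
    print(op_precedence_parse(["id", "+", "id", "*", "id"]))
```

For id+id*id the three id handles are reduced first, then the handle containing *, then the one containing +, which is the order found by hand with the relation string.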
Precedence functions
The table can be encoded by 2 precedence functions f and g such that
f(a) < g(b) where a <. b
f(a) > g(b) where a .> b
f(a) = g(b) where a ≐ b
Finding precedence functions for a table:
1. Create symbols fa and ga for each a that is a terminal or the $ sign
2. Partition the created symbols into as many groups as possible, such that
   if a ≐ b, then fa and gb are in the same group
3. Create a directed graph whose nodes are the groups:
   if a <. b, place an edge from the group of gb to the group of fa
   if a .> b, place an edge from the group of fa to the group of gb
4. If the graph constructed has cycles, then no precedence functions exist.
   If the graph constructed has no cycles, then precedence functions exist:
   let f(a) be the length of the longest path beginning at the group of fa, and
   let g(b) be the length of the longest path beginning at the group of gb
Graph for precedence functions
Consider the matrix
      id    +     *     $
id    err   .>    .>    .>
+     <.    .>    <.    .>
*     <.    .>    .>    .>
$     <.    <.    <.    err
There is no ≐ relationship,
so each symbol is in a separate group.
The graph has no cycles.
Therefore, precedence functions exist.
The longest path in the graph is
gid → f* → g* → f+ → g+ → f$
so, for example, f($)=0, g($)=0, g(+)=1 and g(id)=5.
The resulting precedence functions are
      id    +     *     $
f     4     2     4     0
g     5     1     3     0
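The longest path computation can be checked with a short program. This is a sketch under a stated assumption: the table contains no ≐ entries, so every fa and ga is its own group and the grouping step is skipped. The names `REL` and `precedence_functions` are hypothetical.

```python
# Precedence relations for id, +, * and $ (row a, column b).
REL = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+":  {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*":  {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$":  {"id": "<", "+": "<", "*": "<"},
}


def precedence_functions(rel, symbols):
    """Return (f, g) as dicts, or raise ValueError if the graph has a cycle."""
    edges = {(kind, s): set() for kind in "fg" for s in symbols}
    for a, row in rel.items():
        for b, r in row.items():
            if r == "<":
                edges[("g", b)].add(("f", a))   # a <. b : edge gb -> fa
            elif r == ">":
                edges[("f", a)].add(("g", b))   # a .> b : edge fa -> gb
            else:
                raise ValueError("tables with equal-precedence entries are not handled here")

    state = {}   # node -> "visiting" or the longest path length from it

    def longest(node):
        if state.get(node) == "visiting":
            raise ValueError("cycle found: no precedence functions exist")
        if node in state:
            return state[node]
        state[node] = "visiting"
        best = max((1 + longest(nxt) for nxt in edges[node]), default=0)
        state[node] = best
        return best

    f = {s: longest(("f", s)) for s in symbols}
    g = {s: longest(("g", s)) for s in symbols}
    return f, g


F_VALUES, G_VALUES = precedence_functions(REL, ["id", "+", "*", "$"])
```

The error entries of the matrix are lost in this encoding: f and g always give some comparison, so errors are detected later than with the full table.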
Top down Parsing
• Top down parsing may involve backtracking.
• It may scan the input repeatedly.
Consider the grammar
S → cAd
A → ab | a
Input:
w = cad
To construct a parse tree:
Initially construct a tree consisting of a single node labelled S.
The input pointer points to c.
Use the first production of S to expand the tree and
obtain the parse tree
    S
  / | \
 c  A  d
The leftmost leaf c matches the 1st symbol of w.
So advance the input pointer to a.
Consider the next leaf A.
Expand A using the 1st alternative and obtain the tree
    S
  / | \
 c  A  d
   / \
  a   b
We now have a match for the 2nd input symbol, a.
Consider the next input symbol d and the next
leaf b.
b does not match d,
so report failure.
Go back and see whether there is any alternative for A.
While going back, reset the input pointer to position 2 (the place we
were at when we came to A).
Now we have to try the 2nd alternative for A.
Now the leaf a matches the second symbol of w and the leaf d
matches the 3rd symbol.
Thus the parse tree for w is produced.
    S
  / | \
 c  A  d
    |
    a
Recursive procedures for top down parsing

(a) Procedure S                S → cAd
procedure S( );
begin
  if input symbol = 'c' then
  begin
    ADVANCE( );
    if A( ) then
      if input symbol = 'd' then
        begin ADVANCE( ); return true end
  end;
  return false
end

(b) Procedure A                A → ab | a
procedure A( );
begin
  isave := input-pointer;
  if input symbol = 'a' then
  begin
    ADVANCE( );
    if input symbol = 'b' then
      begin ADVANCE( ); return true end
  end;
  input-pointer := isave; /* failure to find ab */
  if input symbol = 'a' then
    begin ADVANCE( ); return true end
  else return false
end
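Procedures S and A can be written as a small runnable recogniser. In this sketch the names `parse` and `match` are hypothetical, `match` combines the test of the input symbol with ADVANCE, and the final check that all input was consumed is an addition not present in the procedures above.

```python
def parse(w):
    """Recognise w with S -> cAd, A -> ab | a, backtracking inside A."""
    pos = 0   # the input pointer

    def match(ch):
        # Compare the current input symbol with ch and advance on success.
        nonlocal pos
        if pos < len(w) and w[pos] == ch:
            pos += 1
            return True
        return False

    def A():
        nonlocal pos
        isave = pos                    # remember where A started
        if match("a") and match("b"):  # 1st alternative: A -> ab
            return True
        pos = isave                    # failure to find ab: reset the pointer
        return match("a")              # 2nd alternative: A -> a

    def S():
        return match("c") and A() and match("d")

    # Assumption: the whole input must be consumed for acceptance.
    return S() and pos == len(w)


if __name__ == "__main__":
    for word in ("cad", "cabd", "cd"):
        print(word, parse(word))
```

On cad, the first alternative of A matches a and then fails on b against d, the pointer is reset to position 2, and the second alternative succeeds, as in the walkthrough.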
Difficulties in top down parsing:
• Left recursion
• Backtracking
• Left factoring
Left recursion:
A grammar is said to be left recursive if
it has a nonterminal A such that A ⇒+ Aα for some α, e.g.
A → Aα | β
Left recursion can send a top down parser into an infinite loop.
So we must eliminate all left recursion from the
grammar.
Backtracking:
If we make a sequence of expansions and
subsequently discover a mismatch, we
have to undo the semantic effects of all
erroneous expansions.
Entries made in the symbol table have to
be removed.
Left factoring:
The order in which the alternatives
are tried can affect the language accepted.
Eliminating the left recursion
Consider
A → Aα | β (β does not begin with A)
The left recursion can be eliminated with the pair of productions
A → βA′
A′ → αA′ | ε
Equivalent parse trees:
[Figures: parse tree using A → Aα | β, and the equivalent parse tree using A → βA′, A′ → αA′ | ε]
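The transformation is mechanical and can be sketched as a function. The name `eliminate_immediate_left_recursion`, the list encoding of right sides and the convention of naming the new nonterminal by appending a quote are assumptions for illustration. Only immediate left recursion is handled.

```python
def eliminate_immediate_left_recursion(A, alternatives):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon.

    alternatives: list of right sides of A, each a list of symbols.
    An empty list stands for epsilon.
    """
    A_prime = A + "'"   # hypothetical naming convention for the new nonterminal
    alphas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == A]
    betas = [rhs for rhs in alternatives if not rhs or rhs[0] != A]
    if not alphas:
        return {A: alternatives}   # nothing to do
    return {
        A: [beta + [A_prime] for beta in betas],
        A_prime: [alpha + [A_prime] for alpha in alphas] + [[]],
    }


if __name__ == "__main__":
    # E -> E+T | T is used here only as sample input.
    print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```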
Consider the grammar
E → E+T | T, T → T*F | F, F → (E) | id
After eliminating the left recursion we obtain
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
This grammar can be parsed by a non-backtracking recursive descent parser.
Mutually recursive procedures to recognize arithmetic
expressions:

procedure E( );                E → TE′
begin
  T( );
  EPRIME( )
end;

procedure EPRIME( );           E′ → +TE′ | ε
  if input-symbol = '+' then
  begin
    ADVANCE( );
    T( );
    EPRIME( )
  end;

procedure T( );                T → FT′
begin
  F( );
  TPRIME( )
end;

procedure TPRIME( );           T′ → *FT′ | ε
  if input-symbol = '*' then
  begin
    ADVANCE( );
    F( );
    TPRIME( )
  end;

procedure F( );                F → id | (E)
  if input-symbol = 'id' then
    ADVANCE( )
  else if input-symbol = '(' then
  begin
    ADVANCE( );
    E( );
    if input-symbol = ')' then
      ADVANCE( )
    else
      ERROR( )
  end
  else ERROR( )
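The five procedures carry over to Python almost line by line. In this sketch the input is assumed to be a list of token strings, "$" is appended as an end marker, ERROR is modelled by a hypothetical `ParseError` exception, and the wrapper `recognise` is not part of the original procedures.

```python
class ParseError(Exception):
    """Stands in for the ERROR() routine of the pseudocode."""


def recognise(tokens):
    """Return True if tokens form a sentence of E -> TE', E' -> +TE' | eps, ..."""
    toks = list(tokens) + ["$"]   # assumption: "$" marks the end of the input
    pos = 0

    def advance():
        nonlocal pos
        pos += 1

    def E():                      # E -> T E'
        T()
        EPRIME()

    def EPRIME():                 # E' -> + T E' | epsilon
        if toks[pos] == "+":
            advance()
            T()
            EPRIME()

    def T():                      # T -> F T'
        F()
        TPRIME()

    def TPRIME():                 # T' -> * F T' | epsilon
        if toks[pos] == "*":
            advance()
            F()
            TPRIME()

    def F():                      # F -> id | ( E )
        if toks[pos] == "id":
            advance()
        elif toks[pos] == "(":
            advance()
            E()
            if toks[pos] == ")":
                advance()
            else:
                raise ParseError("expected )")
        else:
            raise ParseError("expected id or (")

    try:
        E()
    except ParseError:
        return False
    return toks[pos] == "$"       # all input must have been consumed


if __name__ == "__main__":
    print(recognise(["id", "+", "id", "*", "id"]))
```

No procedure ever resets the input pointer, which is what makes the parser non-backtracking.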
[Figures: transition diagrams for E, E′, T and F]
5.5 Predictive Parsers:
Rule 2: A → αBβ [FOLLOW(B) = FIRST(β) except ε]
(i) F → (E) [α B β]
    FOLLOW(E) = FIRST( ) )
              = )
(ii) T′ → *FT′ [α B β]
    FOLLOW(F) = FIRST(T′) except ε
              = *
(iii) E′ → +TE′
    FOLLOW(T) = FIRST(E′) except ε
              = +
Rule 3: A → αB, or A → αBβ where FIRST(β) contains ε [FOLLOW(B) includes FOLLOW(A)]
(i) E → TE′ [A → αB]
    FOLLOW(E′) = FOLLOW(E)
               = ) $
(ii) E′ → +TE′ [A → αBβ where FIRST(β) contains ε]
    FIRST(E′) = + ε [it contains ε]
    FOLLOW(T) = FOLLOW(E′)
              = ) $
(iii) T → FT′ [A → αB]
    FOLLOW(T′) = FOLLOW(T)
               = + ) $
(iv) T′ → *FT′ [A → αBβ where FIRST(β) contains ε]
    FIRST(T′) = * ε [it contains ε]
    FOLLOW(F) = FOLLOW(T′)
              = + ) $
Nonterminals   First     Follow
E              ( id      ) $
E′             + ε       ) $
T              ( id      + ) $
T′             * ε       + ) $
F              ( id      * + ) $

Construction of the parsing table
For each production A → α:
1. For each terminal a in FIRST(α), add A → α to
   M[A,a]
2. If ε is in FIRST(α), then add the production A → α to
   M[A,b] for each terminal b in FOLLOW(A)
3. If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to
   M[A,$]
4. All other entries are defined as errors

      id        +          *          (         )        $
E     E→TE′                           E→TE′
E′              E′→+TE′                         E′→ε     E′→ε
T     T→FT′                           T→FT′
T′              T′→ε       T′→*FT′              T′→ε     T′→ε
F     F→id                            F→(E)
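The FIRST and FOLLOW sets and the table M can be recomputed by a fixed-point iteration. The sketch below is one way to do it, not the only one: the names `first_of`, `first_follow` and `build_table`, the list encoding of the grammar (an empty right side stands for ε) and writing E′ as "E'" are assumptions for illustration.

```python
EPS = "ε"
GRAMMAR = [
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}


def first_of(seq, first):
    """FIRST of a string of symbols, given FIRST of each nonterminal."""
    out = set()
    for X in seq:
        fx = first[X] if X in NONTERMS else {X}
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    return out | {EPS}          # every symbol (or the empty string) derives ε


def first_follow(start="E"):
    first = {A: set() for A in NONTERMS}
    follow = {A: set() for A in NONTERMS}
    follow[start].add("$")
    changed = True
    while changed:              # repeat until no set grows any more
        changed = False
        for A, rhs in GRAMMAR:
            before = (len(first[A]), sum(map(len, follow.values())))
            first[A] |= first_of(rhs, first)
            for i, B in enumerate(rhs):
                if B in NONTERMS:
                    f = first_of(rhs[i + 1:], first)
                    follow[B] |= f - {EPS}        # Rule 2
                    if EPS in f:
                        follow[B] |= follow[A]    # Rule 3
            after = (len(first[A]), sum(map(len, follow.values())))
            changed |= before != after
    return first, follow


def build_table(first, follow):
    M = {}
    for A, rhs in GRAMMAR:
        f = first_of(rhs, first)
        targets = (f - {EPS}) | (follow[A] if EPS in f else set())
        for a in targets:
            M.setdefault((A, a), []).append(rhs)
    return M


FIRST, FOLLOW = first_follow()
M = build_table(FIRST, FOLLOW)
```

Every entry of M holds exactly one production for this grammar; a grammar that produces an entry with two or more productions has a multiply defined entry.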
Note:
If the given grammar G is left recursive or ambiguous, then M will
have at least one multiply defined entry.
When the parsing table has multiply defined entries, eliminate
left recursion and then apply left factoring wherever possible.
CHAPTER 6
LR Parsers
Types of LR parsers:
SLR (Simple LR)
1. Easier to implement
2. Fails to produce a table for
   certain grammars
Canonical LR
1. Most powerful
2. Will work on a large class of
   grammars
3. Very expensive
LALR (Lookahead LR)
1. Intermediate in power
2. Works on most classes of
   grammars
3. Can be implemented
   efficiently
6.1 LR PARSERS
LR Parser has an input, a stack and a parsing table
Input is read from left to right, one symbol at a time
Stack contains string of the form S0X1S1X2S2X3S3…..XmSm , Sm is on
the top
Each Xi is a grammar symbol
Each Si is a State symbol (used to guide the shift reduce decision)
The Parsing table contains 2 parts
Parsing Function ACTION
Goto function GOTO
The LR Parser program behaves as follows:
--Take the symbol on the top of stack Sm and the current input a
--See the parsing table entry for state Sm and input a
--Action(Sm,a)
--The entry Action(Sm,a) can have any one of 4 values
Shift s, where s is a state
Reduce A → β
Accept
Error
The GOTO function takes a state and grammar symbol as
arguments and produces a state.
The Configuration of an LR parser is a pair,
1st component is the stack contents
2nd component is unexpended input
(s0 X1 s1 X2 s2 . . . Xm sm, ai ai+1 . . . an $)
The next move of the parser is determined by reading
ai, the current input symbol and
sm the state on top of the stack and
then consult the parsing action table entry
ACTION[sm,ai]
4 types of moves:
If ACTION[sm, ai] = shift s, the parser executes a shift move, entering the configuration
(s0 X1 s1 X2 s2 . . . Xm sm ai s, ai+1 . . . an $)
I = {E′ → .E}
Closure(I) =
E′ → .E      Contains .E, so add the E productions with . at the left end
E → .E+T
E → .T       Contains .T, so add the T productions with . at the left end
T → .T*F
T → .F       Contains .F, so add the F productions with . at the left end
F → .(E)
F → .id
This set of items is I0.
From I0, look for items with E immediately to the right of the dot.
Goto(I0, E) = { E′ → E.
                E → E.+T }                                 I1
From I0, look for items with T immediately to the right of the dot.
Goto(I0, T) = { E → T.
                T → T.*F }                                 I2
From I0, look for items with F immediately to the right of the dot.
Goto(I0, F) = { T → F. }                                   I3
From I0, look for items with ( immediately to the right of the dot.
Goto(I0, ( ) = { F → (.E)     Contains .E, so add the E productions with . at the left end
                 E → .E+T
                 E → .T       Contains .T, so add the T productions with . at the left end
                 T → .T*F
                 T → .F       Contains .F, so add the F productions with . at the left end
                 F → .(E)
                 F → .id }                                 I4
From I0, look for items with id immediately to the right of the dot.
Goto(I0, id) = { F → id. }                                 I5
From I1, look for items with + immediately to the right of the dot.
Goto(I1, +) = { E → E+.T
                T → .T*F
                T → .F
                F → .(E)
                F → .id }                                  I6
From I2, look for items with * immediately to the right of the dot.
Goto(I2, *) = { T → T*.F
                F → .(E)
                F → .id }                                  I7
Goto(I3, X) is empty for every symbol X.
Goto(I4, E) = { F → (E.)
                E → E.+T }                                 I8
Goto(I4, T) = { E → T.
                T → T.*F }                                 I2
Goto(I4, F) = { T → F. }                                   I3
Goto(I4, ( ) = { F → (.E), E → .E+T, E → .T,
                 T → .T*F, T → .F, F → .(E), F → .id }     I4
Goto(I4, id) = { F → id. }                                 I5
Goto(I6, T) = { E → E+T.
                T → T.*F }                                 I9
Goto(I6, F) = { T → F. }                                   I3
Goto(I6, ( ) = I4
Goto(I6, id) = { F → id. }                                 I5
Goto(I7, F) = { T → T*F. }                                 I10
Goto(I7, ( ) = I4
Goto(I7, id) = { F → id. }                                 I5
Goto(I8, + ) = { E → E+.T, T → .T*F, T → .F,
                 F → .(E), F → .id }                       I6
Goto(I8, ) ) = { F → (E). }                                I11
Goto(I9, * ) = { T → T*.F, F → .(E), F → .id }             I7
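The closure and goto operations used to build these item sets can be sketched in a few lines. Here an LR(0) item is encoded as a pair (production index, dot position); that encoding and the names `closure`, `goto` and `canonical_collection` are assumptions for illustration. The states are discovered in a different order from the hand construction, so their numbering need not match I0 to I11.

```python
GRAMMAR = [
    ("E'", ("E",)),                                   # augmented production
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add B -> .gamma for every item with the dot in front of B."""
    result = set(items)
    work = list(items)
    while work:
        p, dot = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMS:
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)


def goto(items, X):
    """Move the dot over X in every item that allows it, then close."""
    moved = {(p, dot + 1) for p, dot in items
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == X}
    return closure(moved)


def canonical_collection():
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    C = [closure({(0, 0)})]
    for I in C:                 # C grows while it is being traversed
        for X in symbols:
            J = goto(I, X)
            if J and J not in C:
                C.append(J)
    return C


I0 = closure({(0, 0)})
STATES = canonical_collection()
```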
Follow sets:
2. F → (E)
   Follow(E) = )
3. E → E+T
   Follow(E) = +
4. T → T*F
   Follow(T) = *
6. T → F
   Follow(F) = Follow(T)
             = * + ) $
NT    Follow
E     ) + $
T     * ) + $
F     * ) + $
SLR Parsing table:
The productions are numbered
1. E → E+T   2. E → T   3. T → T*F   4. T → F   5. F → (E)   6. F → id
Accept:
If E′ → E. is in Ii, then Action(i, $) = accept
E′ → E. is in I1.
Hence Action(1, $) = accept
Reduce:
1. E → E+T. is in I9: r1
[I9, follow symbols of E = r1]
Action( 9, + ) = r1
Action( 9, ) ) = r1
Action( 9, $ ) = r1
2. E → T. is in I2: r2
[I2, follow symbols of E = r2]
Action( 2, + ) = r2
Action( 2, ) ) = r2
Action( 2, $ ) = r2
3. T → T*F. is in I10: r3
[I10, follow symbols of T = r3]
Action( 10, * ) = r3
Action( 10, + ) = r3
Action( 10, ) ) = r3
Action( 10, $ ) = r3
4. T → F. is in I3: r4
[I3, follow symbols of T = r4]
Action( 3, * ) = r4
Action( 3, + ) = r4
Action( 3, ) ) = r4
Action( 3, $ ) = r4
5. F → (E). is in I11: r5
[I11, follow symbols of F = r5]
Action( 11, * ) = r5
Action( 11, + ) = r5
Action( 11, ) ) = r5
Action( 11, $ ) = r5
6. F → id. is in I5: r6
[I5, follow symbols of F = r6]
Action( 5, * ) = r6
Action( 5, + ) = r6
Action( 5, ) ) = r6
Action( 5, $ ) = r6
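Shift:
Refer to the Goto transitions on terminals
Goto(I0, ( ) gives I4
(r,c) : 0,( = s4
Goto(I0, id) gives I5
0,id = s5
Goto(I1, + ) gives I6
1,+ = s6
Goto(I2, * ) gives I7
2,* = s7
The remaining shift entries and the Goto entries for the nonterminals are filled in the same way.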
State |            Action                 |    Goto
      | id    +     *     (     )     $   |  E    T    F
0     | s5                s4              |  1    2    3
1     |       s6                   accept |
2     |       r2    s7          r2    r2  |
3     |       r4    r4          r4    r4  |
4     | s5                s4              |  8    2    3
5     |       r6    r6          r6    r6  |
6     | s5                s4              |       9    3
7     | s5                s4              |            10
8     |       s6                s11       |
9     |       r1    s7          r1    r1  |
10    |       r3    r3          r3    r3  |
11    |       r5    r5          r5    r5  |
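The table can be exercised with a small driver. In this sketch the stack holds state numbers only (the grammar symbols Xi are left out, since the states alone determine the moves), the names `ACTION`, `GOTO`, `PRODS` and `lr_parse` are hypothetical, and an empty table entry raises SyntaxError.

```python
# Production number -> (left side, length of right side)
PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the production numbers used, in the order of the reductions."""
    stack = [0]
    inp = list(tokens) + ["$"]
    i = 0
    output = []
    while True:
        act = ACTION[stack[-1]].get(inp[i])
        if act is None:
            raise SyntaxError(f"unexpected {inp[i]!r} in state {stack[-1]}")
        if act == "acc":
            return output
        if act[0] == "s":                       # shift: push the new state
            stack.append(int(act[1:]))
            i += 1
        else:                                   # reduce by production n
            n = int(act[1:])
            lhs, length = PRODS[n]
            del stack[len(stack) - length:]     # pop one state per right-side symbol
            stack.append(GOTO[stack[-1]][lhs])
            output.append(n)


if __name__ == "__main__":
    print(lr_parse(["id", "*", "id", "+", "id"]))
```

Reading the returned production numbers in reverse gives a rightmost derivation of the input.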
LR grammar:
A grammar for which every entry in the parsing table is uniquely defined is
called an LR grammar.
LR(0) item
An LR(0) item of a grammar G is a production of G with a dot at some
position on the right side, e.g.
A → .XYZ
6.3 CONSTRUCTION OF SLR PARSING TABLE
Algorithm: 6.1
Input: C, the Canonical collection of set of items for augmented grammar G′
Output: LR parsing table with Action and go to
Method:
Let C = {I0, I1, I2, …, In}
The parsing actions of state i are as follows:
1. If [A → α.aβ] is in Ii and GOTO(Ii, a) = Ij, then set ACTION[i, a] = shift j
   (a must be a terminal)
2. If [A → α.] is in Ii, then set ACTION[i, a] = reduce A → α for all a in
   FOLLOW(A) (A may not be S′)
3. If [S′ → S.] is in Ii, then set ACTION[i, $] = accept
4. If GOTO(Ii, A) = Ij, then GOTO[i, A] = j
5. All entries not defined by steps 1 through 4 are errors
6. The initial state of the parser is the one constructed from the set of items containing [S′ → .S]
6.4 Constructing canonical LR Parsing tables
Each item carries extra information:
a terminal symbol is included as a second component.
[A → α.β, a] is the general form of an item,
where A → αβ is a production and
a is a terminal symbol or the $ sign.
It is called an LR(1) item, where 1 refers to the
length of the 2nd component, called the
lookahead of the item.
Consider the augmented grammar
S′ → S
S → CC
C → cC | d
Item set I = { [S′ → .S, $] }
Find the closure of I, Closure(I).
Match the item [S′ → .S, $] with [A → α.Bβ, a]:
A = S′, α = ε, B = S, β = ε, a = $
The closure rule tells us to add [B → .γ, b] for each production B → γ and
each terminal b in FIRST(βa).
Here B → γ is S → CC, and FIRST(βa) = FIRST(ε$) = {$}
So [S → .CC, $] is added.
Action (I4, c) = r3
Action (I4, d) = r3
Action (I4, $) = r3
Part of the resulting parsing table (states 5 to 9):
State   c     d     $     S    C
5                   r1
6       s6    s7               9
7                   r3
8       r2    r2
9                   r2
GOTO graph: [figure]
6.3 CONSTRUCTION OF CLR PARSING TABLE
Algorithm: 6.3
Refer book pg no 219
1. E′ → E
   Follow(E) = Follow(E′)
   E′ is the start symbol, so
   Follow(E′) = $
   Thus Follow(E) = $
2. E → E+E
   Follow(E) = +
3. E → E*E
   Follow(E) = *
4. E → (E)
   Follow(E) = )
NT    Follow
E     ) + * $
Accept:
If E′ → E. is in Ii, then Action(i, $) = accept
E′ → E. is in I1.
Hence Action(1, $) = accept
Reduce:
1. E → E+E. is in I7: r1
[I7, follow symbols = r1]
Action( 7, ) ) = r1
Action( 7, + ) = r1
Action( 7, * ) = r1
Action( 7, $ ) = r1
2. E → E*E. is in I8: r2
[I8, follow symbols = r2]
Action( 8, ) ) = r2
Action( 8, + ) = r2
Action( 8, * ) = r2
Action( 8, $ ) = r2
3. E → (E). is in I9: r3
[I9, follow symbols = r3]
Action( 9, ) ) = r3
Action( 9, + ) = r3
Action( 9, * ) = r3
Action( 9, $ ) = r3
4. E → id. is in I3: r4
[I3, follow symbols = r4]
Action( 3, ) ) = r4
Action( 3, + ) = r4
Action( 3, * ) = r4
Action( 3, $ ) = r4
Shift: Refer to the Goto transitions on terminals
( I0, ( ) gives I2     (r,c) : 0,( = s2
( I0, id ) gives I3    (r,c) : 0,id = s3
( I1, + ) gives I4     (r,c) : 1,+ = s4
( I1, * ) gives I5     (r,c) : 1,* = s5
( I2, ( ) gives I2     (r,c) : 2,( = s2
( I2, id ) gives I3    (r,c) : 2,id = s3
( I4, ( ) gives I2     (r,c) : 4,( = s2
( I4, id ) gives I3    (r,c) : 4,id = s3
( I5, ( ) gives I2     (r,c) : 5,( = s2
( I5, id ) gives I3    (r,c) : 5,id = s3
( I6, + ) gives I4     (r,c) : 6,+ = s4
( I6, * ) gives I5     (r,c) : 6,* = s5
( I6, ) ) gives I9     (r,c) : 6,) = s9
( I7, + ) gives I4     (r,c) : 7,+ = s4
( I7, * ) gives I5     (r,c) : 7,* = s5
( I8, + ) gives I4     (r,c) : 8,+ = s4
( I8, * ) gives I5     (r,c) : 8,* = s5
State |               Action                   | Goto
      | id    +       *       (     )     $    |  E
0     | s3                    s2               |  1
1     |       s4      s5                accept |
2     | s3                    s2               |  6
3     |       r4      r4            r4    r4   |
4     | s3                    s2               |  7
5     | s3                    s2               |  8
6     |       s4      s5            s9         |
7     |       r1/s4   s5/r1         r1    r1   |
8     |       r2/s4   r2/s5         r2    r2   |
9     |       r3      r3            r3    r3   |

Resolving the conflicts:
Assuming + is left associative,
the action of state 7 on input +
should be to reduce by E → E+E.
Assuming * takes precedence
over +, the action of state 7 on
input * should be to shift.
Similarly, assuming that * is left
associative and takes precedence
over +, the action of state 8 on both the
inputs + and * should be to
reduce by E → E*E
(in the case of input +, the reason is that
* takes precedence over +).