CHAPTER 5
BASIC PARSING TECHNIQUES
5.1 Parser
A parser for a grammar G is a program that takes an input string w and produces a parse tree as
output if w is a sentence of G; otherwise it reports an error.
Types of parser:
Bottom up parser: builds the parse tree from bottom to top (leaves to root)
(Shift reduce parsing, Operator precedence parsing)
Left most derivation: the left most nonterminal is replaced at every step
Right most derivation: the right most nonterminal is replaced at every step
Right most derivation is also called canonical derivation
5.2 Shift reduce Parsing
Shift reduce parsing is an example of Bottom up Parsing
It constructs the parse trees starting from the leaves, and working up towards the root
Reduction:
Look for a substring that matches the right side of some production
Replace it by the symbol on the left side of that production
{Replacement of the right side of a production by its left side is called reduction}
Consider the grammar
S → aAcBe
A → Ab | b
B → d
and the string abbcde
We want to reduce the string to S
Given string is abbcde
Matching right sides of productions in abbcde, the reductions are
A → b : aAbcde
A → Ab : aAcde
B → d : aAcBe
S → aAcBe : S
Handle:
A handle is a substring that matches the right side of a production, such that replacing the
substring by the left side of that production is one step in the reverse of a right most derivation (RMD) and so leads to the start symbol
Handle Pruning:
Removing the handle by replacing it with the left side of the production
RMD in reverse is obtained by handle pruning
The string appearing to the right of a handle contains only terminals
Consider the grammar
E → E+E
E → E*E
E → (E)
E → id
Consider the RMD
E ⇒ E+E
⇒ E+E*E
⇒ E+E*id3
⇒ E+id2*id3
⇒ id1+id2*id3
In the last right sentential form, id1 is the handle
Input: id+id*id
Consider the grammar E → E+E, E → E*E, E → (E), E → id
and the input string id1+id2*id3
Consider the sequence of reductions that leads to the start symbol E
E+id2*id3
E+E*id3
E+E*E
E+E
E
This sequence of right sentential forms is the reverse of the RMD
SR parsing Operations
A shift reduce parser does 4 operations
Shift
The next input symbol is shifted to the top of the stack
Reduce
The parser identifies the handle at the top of the stack
It compares the handle to the right side of the production
When there is a match, the handle is replaced by the left side of the production
Accept
It indicates the successful completion of parsing
Error
Indicates that some error has occurred, and calls an error recovery routine
The parser uses a stack and an input buffer.
The $ symbol marks the bottom of the stack and the right end of the input
Initial configuration:
Stack    Input
$        w$
The parser operates by shifting 0 or more input symbols to the stack, until a handle is on
the top of stack.
The parser then reduces the handle to the left side of the production
This process is repeated until
the stack contains the start symbol and the input is empty ( or )
an error is detected
Final configuration:
Stack    Input
$S       $
Input string: id1+id2*id3
Grammar: E → E+E | E*E | (E) | id
Shift reduce parsing actions
Stack       Input            Action
$           id1+id2*id3$     Shift
$id1        +id2*id3$        Reduce by E → id
$E          +id2*id3$        Shift
$E+         id2*id3$         Shift
$E+id2      *id3$            Reduce by E → id
$E+E        *id3$            Shift
$E+E*       id3$             Shift
$E+E*id3    $                Reduce by E → id
$E+E*E      $                Reduce by E → E*E
$E+E        $                Reduce by E → E+E
$E          $                Accept
Constructing parse tree
1. When we shift an input symbol a onto the stack, we create a one-node tree labelled
a. Both the root and the yield of this tree are a
2. When we reduce X1X2X3…Xn to A, we create a new node labelled A. Its children are
the roots of the trees for X1, X2, … Xn
3. For each symbol on the stack, associate a pointer to a tree whose root is that symbol and
whose yield is the string of terminals which have been reduced to that symbol.
4. At the end, the start symbol will have the entire parse tree associated with it
[Figure: parse tree construction, ending with d) after completion]
5.3 Operator precedence parsing:
Operator grammar properties:
No production has ℇ on the right side
No production has 2 adjacent nonterminals on the right side
Eg:
E → EAE | (E) | -E | id
A → + | - | *
is not an operator grammar, because EAE on the RHS has 3 adjacent nonterminals
If no precedence relation holds between a pair of terminals, then an error recovery routine is called.
Consider the string id+id*id and the precedence relations
      id    +     *     $
id    err   .>    .>    .>
+     <.    .>    <.    .>
*     <.    .>    .>    .>
$     <.    <.    <.    err
With end markers the string is $id+id*id$
The string with the precedence relations inserted is:
$ <. id .> + <. id .> * <. id .> $
1. Scan the string from the left end until we encounter the first .> sign
2. Then scan backwards to the left until we encounter a <. sign
3. The handle contains everything to the left of that .> and to the right of that <.
In this example, the 1st handle is id (<. id .>); reduce it to E
E+id*id
E+E*id
E+E*E
Now delete the nonterminals
$+*$ is obtained
Insert precedence relation
$<. +<. * .>$
This indicates that, the left end of the handle lies between + and * and the right end of
the handle lies between * and $
ie. in E+E*E, the handle is E*E
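The scanning procedure above can be sketched as a small stack-based loop. This is only an illustrative sketch: the dictionary REL and the function name op_precedence_parse are hypothetical, the relation table is the one for id, +, * and $ used in this section, and nonterminals are simply not stored on the stack.

```python
# REL[a][b] is the relation between stack terminal a and input terminal b.
# "<" stands for <. and ">" for .> ; a missing entry means error.
REL = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+": {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*": {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$": {"id": "<", "+": "<", "*": "<"},
}


def op_precedence_parse(tokens):
    """Return the handles in the order they are reduced."""
    stack = ["$"]                    # terminals only; nonterminals are ignored
    buf = list(tokens) + ["$"]
    handles = []
    i = 0
    while True:
        a, b = stack[-1], buf[i]
        if a == "$" and b == "$":    # stack and input both exhausted: accept
            return handles
        rel = REL[a].get(b)
        if rel is None:
            raise SyntaxError(f"no precedence relation between {a} and {b}")
        if rel == "<":               # shift
            stack.append(b)
            i += 1
        else:                        # .> found: pop back to the matching <.
            while True:
                popped = stack.pop()
                if REL[stack[-1]].get(popped) == "<":
                    break
            handles.append("id" if popped == "id" else f"E{popped}E")


print(op_precedence_parse(["id", "+", "id", "*", "id"]))
```

For id+id*id the handles come out in the same order as in the walkthrough: the three id handles first, then E*E, then E+E.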
(i) E → E+E is of the form α a A β, with a: + and A: E
and E → E*E gives E ⇒ γ b δ, with b: *
Hence + <. *
(ii) S → iCtS, C → b
Hence i <. b
3. a .> b, if a nonterminal A appears immediately to the left of b and derives a string in which a is the last
terminal
(i) α A b β and A ⇒ γ a δ, where δ may be ℇ or a single nonterminal [a .> b]
E → E+E is of the form α A b β, with b: +
where E → E*E gives E ⇒ γ a δ, with a: *
Hence * .> +
(ii) S → iCtS, C → b
Hence b .> t
4. $ <. b, where b is the 1st terminal
5. a .> $, where a is the last terminal
Summary of the rules:
Terminal NT Terminal : ≐
Terminal NT : <.
NT Terminal : .>
$ (start marker) <. first terminal
last terminal .> $ (end marker)
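(1) E+ : E appears immediately to the left of +
+, *, ), id .> +
(2) T* : T appears immediately to the left of *
*, ), id .> *
(3) E) : E appears immediately to the left of )
+, *, ), id .> )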
According to rule 4,
$ must be related by <. to all 1st terminals ($ is the start marker)
$ <. E
$ <. *, $ <. +, $ <. (, $ <. id
According to rule 5,
all last terminals must be related by .> to $ ($ is the end marker)
E .> $
* .> $, + .> $, ) .> $, id .> $
Operator Precedence relation
      +     *     (     )     id    $
+     .>    <.    <.    .>    <.    .>
*     .>    .>    <.    .>    <.    .>
(     <.    <.    <.    ≐     <.    err
)     .>    .>    err   .>    err   .>
id    .>    .>    err   .>    err   .>
$     <.    <.    <.    err   <.    err
Procedure INSTALL(A, a)
if not L[A, a] then
begin
L[A, a] := true;
push (A, a) onto STACK
end
Main procedure
begin
/* Initialize L */
for each nonterminal A and terminal a do L[A, a] := false;
for each production of the form A → aα or A → Baα do
INSTALL(A, a);
while STACK not empty do
begin
pop top pair (B, a) from STACK;
for each production of the form A → Bα do
INSTALL(A, a)
end
end
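The procedure above reads like the standard LEADING computation, where L[A, a] is true when terminal a can be the first terminal of a string derived from A. A minimal Python sketch under that assumption follows; the function name leading and the grammar encoding (a dict from nonterminal to a list of right sides) are illustrative choices, not part of the original notes.

```python
def leading(grammar):
    """grammar maps each nonterminal to a list of right sides (lists of symbols)."""
    nonterminals = set(grammar)
    L = {A: set() for A in nonterminals}
    stack = []

    def install(A, a):
        # Record a in LEADING(A) once and remember the pair for propagation.
        if a not in L[A]:
            L[A].add(a)
            stack.append((A, a))

    for A, rhss in grammar.items():
        for rhs in rhss:
            if rhs and rhs[0] not in nonterminals:
                install(A, rhs[0])              # A -> a alpha
            elif len(rhs) > 1 and rhs[1] not in nonterminals:
                install(A, rhs[1])              # A -> B a alpha
    while stack:
        B, a = stack.pop()
        for A, rhss in grammar.items():
            for rhs in rhss:
                if rhs and rhs[0] == B:         # A -> B alpha
                    install(A, a)
    return L


EXPR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}
print(leading(EXPR))
```

TRAILING can be obtained with the same routine by reversing every right side before calling it.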
Algorithm 5.2
Calculation of operator precedence relations
Input: an operator grammar G
Output: the relations <. , .> , ≐ for G
Method:
1. Compute LEADING(A) and TRAILING(A) for each nonterminal A
2. Examine each position of the right side of each production
3. Set $ <. a for all a in LEADING(S) and set b .> $ for all b in TRAILING(S), where S is the start
symbol
id + * $
f 4 2 4 0
g 5 1 3 0
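The two rows above appear to be precedence functions f and g, which encode the relation table as integers: a <. b when f(a) < g(b), and a .> b when f(a) > g(b). A short sketch under that reading (the names F, G and relation are illustrative):

```python
# Precedence functions copied from the table above.
F = {"id": 4, "+": 2, "*": 4, "$": 0}
G = {"id": 5, "+": 1, "*": 3, "$": 0}


def relation(a, b):
    """Relation between stack terminal a and input terminal b, read from f and g."""
    if F[a] < G[b]:
        return "<."
    if F[a] > G[b]:
        return ".>"
    return "≐"


for a, b in [("+", "*"), ("*", "+"), ("+", "+"), ("$", "id"), ("id", "$")]:
    print(a, relation(a, b), b)
```

Precedence functions save table space, but they cannot represent error entries: every pair of terminals gets some relation, so errors are detected later than with the full table.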
Top down Parsing
General top down parsing may involve backtracking.
It may scan parts of the input repeatedly
Consider the grammar
S → cAd
A → ab | a
Input:
w = cad
To construct a parse tree for this sentence top down,
initially construct a tree consisting of a single node labelled S
The input pointer points to c
Use the first production of S to expand the tree and obtain the parse tree
    S
  / | \
 c  A  d
The left most leaf c matches the 1st symbol of w.
So advance the input pointer to a.
Consider the next leaf A
Expand A using the 1st alternative and obtain the tree
    S
  / | \
 c  A  d
   / \
  a   b
We now have a match for the 2nd input symbol a.
Consider the next input symbol d and the next leaf b
b does not match d,
so report failure,
go back and see whether there is any other alternative for A
While going back, reset the input pointer to position 2 (the position it had when we first came to A)
Now try the 2nd alternative for A
Now the leaf a matches the second symbol of w and the leaf d matches the 3rd symbol.
Thus a parse tree for w is produced.
    S
  / | \
 c  A  d
    |
    a
Procedure A( );
begin
isave := input-pointer;
if input-symbol = 'a' then
begin
ADVANCE( );
if input-symbol = 'b' then
begin ADVANCE( ); return true end
end;
input-pointer := isave;
/* failure to find ab */
if input-symbol = 'a' then
begin ADVANCE( ); return true end
else return false
end
(b) Procedure A
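The backtracking behaviour for the grammar S → cAd, A → ab | a can also be sketched in Python. This is an illustrative sketch, not the procedure above translated line by line: the generator parse_A (a hypothetical name) yields one candidate input position per alternative, and asking it for the next value plays the role of resetting the input pointer and trying the next alternative.

```python
def parse_A(w, pos):
    """Yield each input position reachable after matching A at pos."""
    for alt in ("ab", "a"):              # alternatives are tried in this order
        if w.startswith(alt, pos):
            yield pos + len(alt)


def parse_S(w):
    """Return True if w is derived from S -> c A d."""
    if not w.startswith("c"):
        return False
    for after_A in parse_A(w, 1):        # the next iteration is the backtrack step
        if w.startswith("d", after_A) and after_A + 1 == len(w):
            return True
    return False


print(parse_S("cad"), parse_S("cabd"), parse_S("cab"))
```

The strings cad and cabd are accepted, while cab is rejected after both alternatives of A have been tried.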
Backtracking:
If we make a sequence of expansions and subsequently discover a mismatch, we have to undo the
semantic effects of all the erroneous expansions. For example, entries made in the symbol table have to be
removed.
Left factoring:
The order in which the alternatives are tried can affect the language accepted
Eliminating the left recursion
Consider
A → Aα | β (β does not begin with A)
The left recursion can be eliminated with the pair of productions
A → βA′
A′ → αA′ | ℇ
Consider the grammar
E → E+T | T
T → T*F | F
F → (E) | id
Now we obtain
E → TE′
E′ → +TE′ | ℇ
T → FT′
T′ → *FT′ | ℇ
F → (E) | id
Eliminate immediate left recursions (productions of the form A → Aα)
To eliminate immediate left recursion among all A productions we group the A productions as
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
where no βi begins with an A. Then we replace the A productions by
A → β1A′ | β2A′ | … | βnA′
A′ → α1A′ | α2A′ | … | αmA′ | ℇ
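The replacement rule for immediate left recursion is mechanical, so it can be sketched in a few lines of Python. The function name and the encoding are illustrative assumptions: a right side is a list of symbols, the empty list stands for ℇ, and the new nonterminal is named by appending an apostrophe.

```python
def eliminate_immediate_left_recursion(A, alternatives):
    """Return the new productions for A as a dict; [] stands for epsilon."""
    A_prime = A + "'"
    # A -> A alpha_i : keep alpha_i;  A -> beta_j : keep beta_j
    alphas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == A]
    betas = [rhs for rhs in alternatives if not rhs or rhs[0] != A]
    if not alphas:                       # no immediate left recursion
        return {A: alternatives}
    return {
        A: [beta + [A_prime] for beta in betas],
        A_prime: [alpha + [A_prime] for alpha in alphas] + [[]],
    }


print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
print(eliminate_immediate_left_recursion("T", [["T", "*", "F"], ["F"]]))
```

Applied to E → E+T | T this gives E → TE′ and E′ → +TE′ | ℇ, matching the grammar obtained above.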
Algorithm to eliminate left recursion
1. Arrange the nonterminals of G in some order A1, A2 … An
2. for i := 1 to n do
begin
for j := 1 to i-1 do
replace each production of the form Ai → Ajγ
by the productions Ai → β1γ | β2γ | … | βkγ,
where Aj → β1 | β2 | … | βk are all the current Aj productions;
eliminate the immediate left recursion among the Ai productions
end
Consider
E → TE′
E′ → +TE′ | ℇ
T → FT′
T′ → *FT′ | ℇ
F → (E) | id
The following procedure is an example from a non backtracking recursive descent parser for this grammar
procedure F( );
if input-symbol = 'id' then
ADVANCE( )
else if input-symbol = '(' then
begin
ADVANCE( );
E( );
if input-symbol = ')' then
ADVANCE( )
else
ERROR( )
end
else ERROR( )
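A runnable sketch of the whole recursive descent parser, with one method per nonterminal, might look as follows in Python. The class name Parser, the helper accepts and the token representation (a list of strings ending in an added "$") are assumptions made for illustration; only method F corresponds directly to the procedure above.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of input
        self.pos = 0

    @property
    def look(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1

    def error(self):
        raise SyntaxError(f"unexpected {self.look!r} at position {self.pos}")

    def E(self):                 # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):           # E' -> + T E' | epsilon
        if self.look == "+":
            self.advance()
            self.T()
            self.E_prime()

    def T(self):                 # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):           # T' -> * F T' | epsilon
        if self.look == "*":
            self.advance()
            self.F()
            self.T_prime()

    def F(self):                 # F -> ( E ) | id
        if self.look == "id":
            self.advance()
        elif self.look == "(":
            self.advance()
            self.E()
            if self.look == ")":
                self.advance()
            else:
                self.error()
        else:
            self.error()


def accepts(tokens):
    parser = Parser(tokens)
    try:
        parser.E()
    except SyntaxError:
        return False
    return parser.look == "$"    # all input must be consumed


print(accepts(["id", "+", "id", "*", "id"]), accepts(["id", "+"]))
```

No backtracking is needed because each procedure chooses its alternative by looking only at the current input symbol.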
Transition Diagrams:
Draw one transition diagram for each nonterminal
The labels of the edges are tokens or nonterminals
A transition on a token is taken if that token is the next input symbol
Edges may also be labelled by nonterminals
For each nonterminal A:
Create an initial and a final state
For each production A → X1X2…Xn, create a path from the initial to the final state, with edges
labelled X1, X2, … Xn.
Consider
E → TE′
E′ → +TE′ | ℇ
T → FT′
T′ → *FT′ | ℇ
F → (E) | id
[Figures: transition diagrams for E, E′, T, T′ and F, followed by simplified diagrams for E′, E, T and F]
To fill the parsing table, we need to consider 2 functions, FIRST and FOLLOW
Rules for finding FIRST
1. If X is a terminal, then FIRST(X) = {X}
2. If X is a nonterminal, and there is a production
(i) of the form X → aα, then a is in FIRST(X)
(ii) of the form X → ℇ, then ℇ is in FIRST(X)
3. If X is a nonterminal with a production of the form X → Y1Y2Y3…Yk and all Y1, Y2 … Yk are
nonterminals, then find FIRST(Y1).
Then add all non ℇ symbols of FIRST(Y1) to FIRST(X).
If ℇ is in FIRST(Y1), then add all non ℇ symbols of FIRST(Y2) to FIRST(X).
If ℇ is in FIRST(Y1) and FIRST(Y2), then add all non ℇ symbols of FIRST(Y3) to
FIRST(X).
If all of FIRST(Y1) up to FIRST(Yk) contain ℇ, then add ℇ to FIRST(X).
Rules for finding FOLLOW
1. Add $ to FOLLOW(S), where S is the start symbol
2. If there is a production A → αBβ, β ≠ ℇ, then add everything in FIRST(β) except ℇ to FOLLOW(B)
3. If there is a production A → αB, or A → αBβ where FIRST(β) contains ℇ, then add
everything in FOLLOW(A) to FOLLOW(B)
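The FIRST and FOLLOW rules can be applied repeatedly until nothing changes. The sketch below does this for the expression grammar; the function names first_of and first_follow, the marker EPS and the grammar encoding (an empty list for an ℇ production) are illustrative assumptions.

```python
EPS = "ε"


def first_of(seq, first):
    """FIRST of a sequence of grammar symbols."""
    out = set()
    for X in seq:
        out |= first[X] - {EPS}
        if EPS not in first[X]:
            return out
    out.add(EPS)                       # every symbol in seq can derive epsilon
    return out


def first_follow(grammar, start):
    nts = set(grammar)
    terminals = {X for rhss in grammar.values() for rhs in rhss for X in rhs} - nts
    first = {t: {t} for t in terminals}            # FIRST rule 1
    first.update({A: set() for A in nts})
    follow = {A: set() for A in nts}
    follow[start].add("$")                         # FOLLOW rule 1
    changed = True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                new = first_of(rhs, first)         # FIRST rules 2 and 3
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
                for i, B in enumerate(rhs):
                    if B not in nts:
                        continue
                    tail = first_of(rhs[i + 1:], first)
                    add = tail - {EPS}             # FOLLOW rule 2
                    if EPS in tail:
                        add |= follow[A]           # FOLLOW rule 3
                    if not add <= follow[B]:
                        follow[B] |= add
                        changed = True
    return first, follow


LL_GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST, FOLLOW = first_follow(LL_GRAMMAR, "E")
print(FIRST["E"], FOLLOW["F"])
```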
      id         +            *            (          )         $
E     E → TE′                              E → TE′
E′               E′ → +TE′                            E′ → ℇ    E′ → ℇ
T     T → FT′                              T → FT′
T′               T′ → ℇ       T′ → *FT′               T′ → ℇ    T′ → ℇ
F     F → id                               F → (E)
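A table like this one drives a non recursive predictive parser: the parser keeps a stack of grammar symbols and replaces the top nonterminal by the right side found in the table. The sketch below encodes the table entries as a Python dictionary; the names TABLE, NONTERMINALS and ll1_accepts are hypothetical, and an empty list stands for an ℇ right side.

```python
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_accepts(tokens):
    buf = list(tokens) + ["$"]
    stack = ["$", "E"]                  # start symbol on top of the end marker
    i = 0
    while stack:
        X = stack.pop()
        a = buf[i]
        if X in NONTERMINALS:
            rhs = TABLE.get((X, a))
            if rhs is None:             # blank table entry: error
                return False
            stack.extend(reversed(rhs)) # leftmost symbol ends up on top
        elif X == a:                    # terminal (or $) matches the input
            i += 1
        else:
            return False
    return i == len(buf)


print(ll1_accepts(["id", "+", "id", "*", "id"]), ll1_accepts(["id", "+"]))
```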
CHAPTER 6
AUTOMATIC CONSTRUCTION OF EFFICIENT
PARSERS
INTRODUCTION:
LR parsers are so called because they scan the input from left to right and construct a
right most derivation in reverse.
LR parsers are attractive for the following reasons.
LR parsers can recognize virtually all programming language constructs for which context free grammars can be written
The LR parsing method is more general than operator precedence or any of the other common shift reduce techniques
LR parsers dominate the common forms of top down parsing without backtracking.
LR parsers can detect syntactic errors as soon as it is possible to do so on a left to right scan of the input
Generating an LR Parser
6.1 LR PARSERS
An LR parser has an input, a stack and a parsing table
Input is read from left to right, one symbol at a time
The stack contains a string of the form S0 X1 S1 X2 S2 X3 S3 … Xm Sm, where Sm is on the top
Each Xi is a grammar symbol
Each Si is a state symbol (used to guide the shift reduce decision)
The parsing table contains 2 parts
Parsing action function ACTION
Goto function GOTO
The LR parser program behaves as follows:
--Take the state Sm on the top of the stack and the current input symbol a
--Consult the parsing table entry ACTION(Sm, a) for state Sm and input a
--The entry ACTION(Sm, a) can have any one of 4 values
Shift s (where s is a state)
Reduce A → β
Accept
Error
The GOTO function takes a state and a grammar symbol as arguments and produces a state.
I = { E′ → .E }
Closure(I) =
E′ → .E     Contains .E, so add the E productions with . at the left end
E → .E+T
E → .T      Contains .T, so add the T productions with . at the left end
T → .T*F
T → .F      Contains .F, so add the F productions with . at the left end
F → .(E)
F → .id
This set of items is I0
From I0, look for items with E immediately to the right of the dot.
Goto(I0, E) = { E′ → E.
E → E.+T } I1
From I0, look for items with T immediately to the right of the dot.
Goto(I0, T) = { E → T.
T → T.*F } I2
From I0, look for items with F immediately to the right of the dot.
Goto(I0, F) = { T → F. } I3
From I0, look for items with ( immediately to the right of the dot.
Goto(I0, ( ) = {
F → (.E)    Contains .E, so add the E productions with . at the left end
E → .E+T
E → .T      Contains .T, so add the T productions with . at the left end
T → .T*F
T → .F      Contains .F, so add the F productions with . at the left end
F → .(E)
F → .id } I4
From I0, look for items with id immediately to the right of the dot.
Goto(I0, id) = { F → id. } I5
From I1, look for items with + immediately to the right of the dot.
Goto(I1, +) = { E → E+.T
T → .T*F
T → .F
F → .(E)
F → .id } I6
From I2, look for items with * immediately to the right of the dot.
Goto(I2, *) = { T → T*.F
F → .(E)
F → .id } I7
Goto(I3, X) = null for every symbol X (no item in I3 has a symbol to the right of the dot)
Goto(I4, E) = { F → (E.)
E → E.+T } I8
Goto(I4, T) = { E → T.
T → T.*F } I2
Goto(I4, F) = { T → F. } I3
Goto(I8, +) = { E → E+.T
T → .T*F
T → .F
F → .(E)
F → .id } I6
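The closure and goto computations above are mechanical, so a short Python sketch may help. The encoding is an assumption made for illustration: an item is a pair (production index, dot position), and the names GRAMMAR, closure, goto and canonical_collection are hypothetical. The grammar is taken to be the augmented expression grammar E′ → E, E → E+T | T, T → T*F | F, F → (E) | id that the item sets above are built from.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {"E'", "E", "T", "F"}


def closure(items):
    """items is a set of (production index, dot position) pairs."""
    result = set(items)
    work = list(items)
    while work:
        p, dot = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            # Dot is before nonterminal B: add every B production with dot at the left end.
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)


def goto(items, X):
    """Move the dot over X in every item that allows it, then take the closure."""
    moved = {(p, dot + 1) for p, dot in items
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == X}
    return closure(moved)


def canonical_collection():
    symbols = {X for _, rhs in GRAMMAR for X in rhs}
    start = closure({(0, 0)})
    states, work = [start], [start]
    while work:
        I = work.pop()
        for X in symbols:
            J = goto(I, X)
            if J and J not in states:
                states.append(J)
                work.append(J)
    return states


I0 = closure({(0, 0)})
print(len(I0), len(canonical_collection()))
```

Here goto(I0, "E") contains exactly the two items listed for Goto(I0, E) above, and goto applied to the set reached on ( with T gives the same set as goto(I0, "T"), which is why Goto(I4, T) is I2 again.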
2. F → (E)
Follow(E) contains )
3. E → E+T
Follow(E) contains +
4. T → T*F
Follow(T) contains *
5. E → T
Follow(T) contains Follow(E)
= + ) $
6. T → F
Follow(F) contains Follow(T)
= * + ) $
Nonterminal   Follow
E             ) + $
T             * ) + $
F             * ) + $
Reduce:
1. E → E+T. is in I9 : r1
[ I9, follow symbols of E = r1 ]
Action( 9, ) ) = r1
Action( 9, + ) = r1
Action( 9, $ ) = r1
2. E → T. is in I2 : r2
[ I2, follow symbols of E = r2 ]
Action( 2, + ) = r2
Action( 2, ) ) = r2
Action( 2, $ ) = r2
4. T → F. is in I3 : r4
[ I3, follow symbols of T = r4 ]
Action( 3, * ) = r4
Action( 3, + ) = r4
Action( 3, ) ) = r4
Action( 3, $ ) = r4
6. F → id. is in I5 : r6
[ I5, follow symbols of F = r6 ]
Action( 5, * ) = r6
Action( 5, + ) = r6
Action( 5, ) ) = r6
Action( 5, $ ) = r6
Shift:
Refer to the Goto transitions on terminals
( I0, ( ) gives I4
(r,c) : 0, ( = s4
( I0, id ) gives I5
(r,c) : 0, id = s5
( I1, + ) gives I6
(r,c) : 1, + = s6
( I2, * ) gives I7
(r,c) : 2, * = s7
LR grammar:
A grammar for which every entry in the parsing table is uniquely defined is called an LR grammar
LR(0) item
An LR(0) item of a grammar G is a production of G with a dot at some position on the right
side.
Eg: A → .XYZ
Algorithm: 6.2
Input: C, the canonical collection of sets of items for the augmented grammar G′
Output: LR parsing table with ACTION and GOTO
Method:
Let C = {I0, I1, I2, … In}
The parsing actions of state i are as follows
1. If [A → α.aβ] is in Ii and GOTO(Ii, a) = Ij, then set ACTION[i, a] = shift j
2. If [A → α.] is in Ii, then set ACTION[i, a] = reduce A → α for all a in FOLLOW(A)
3. If [S′ → S.] is in Ii, then set ACTION[i, $] = accept
4. If GOTO(Ii, A) = Ij, then GOTO[i, A] = j
5. All entries not defined by steps 1 through 4 are errors
6. The initial state of the parser is the one constructed from the set of items containing [S′ → .S]
Closure(I)
For each item [A → α.Bβ, a] in I, this closure tells us to add [B → .γ, b] for each production B → γ and each
terminal b in FIRST(βa)
Here, for the item [S′ → .S, $], B → γ is S → CC and FIRST(βa) = FIRST(ℇ$) = {$}
So [S → .CC, $] is added.
I0 = { S′ → .S, $
S → .CC, $
C → .cC, c / d
C → .d, c / d }
Goto(I2, c) = { C → c.C, $
C → .cC, $
C → .d, $ } I6
Goto(I6, c) = { C → c.C, $
C → .cC, $
C → .d, $ } I6
GOTO graph:
Accept:
[S′ → S., $] is available in I1.
So ACTION[1, $] = accept
Method:
Let C = {I0, I1, I2, … In}
State i of the parser is constructed from Ii.
The parsing actions of state i are as follows
1. If [A → α.aβ, b] is in Ii and GOTO(Ii, a) = Ij, then set ACTION[i, a] = shift j
2. If [A → α., a] is in Ii, then set ACTION[i, a] = reduce A → α
3. If [S′ → S., $] is in Ii, then set ACTION[i, $] = accept
4. If GOTO(Ii, A) = Ij, then GOTO[i, A] = j
5. All entries not defined by steps 1 through 4 are errors
6. The initial state of the parser is the one constructed from the set of items containing [S′ → .S, $]
6.5) Construction of LALR parsing table:
1) States 3 and 6 have the same core. They differ only in the 2nd component (the lookahead) of their items
Hence they can be combined
2) States 4 and 7 have the same core. They differ only in the 2nd component of their items
Hence they can be combined
3) States 8 and 9 have the same core. They differ only in the 2nd component of their items
Hence they can be combined
E′ → E
I = { E′ → .E }
Closure(I) =
E′ → .E
E → .E+E
E → .E*E
E → .(E)
E → .id
This set of items is I0
Goto(I0, E) = { E′ → E.
E → E.+E
E → E.*E } I1
Goto(I0, ( ) = { E → (.E)
E → .E+E
E → .E*E
E → .(E)
E → .id } I2
Goto(I2, ( ) = the same set of items, I2
Goto(I2, id) = { E → id. } I3
Goto(I3, X) = null for every symbol X
Goto(I4, ( ) = the same set of items as Goto(I0, ( ), I2
Goto(I4, id) = { E → id. } I3
Goto(I5, id) = { E → id. } I3
Goto(I6, *) = { E → E*.E
E → .E+E
E → .E*E
E → .(E)
E → .id } I5
Goto(I7, *) = the same set of items, I5
Goto(I8, *) = the same set of items, I5
Goto(I9, X) = null for every symbol X
Find follow of nonterminals
1. E′ → E
Follow(E) contains Follow(E′)
E′ is the start symbol. So Follow(E′) = $
Thus Follow(E) contains $
2. E → E+E
Follow(E) contains +
3. E → E*E
Follow(E) contains *
4. E → (E)
Follow(E) contains )
Nonterminal   Follow
E             ) + * $
Accept:
If E′ → E. is in Ii, then state i accepts on $
E′ → E. is in I1.
Hence (1, $) = accept
Reduce:
1. E → E+E. is in I7 : r1
[ I7, follow symbols of E = r1 ]
Action( 7, ) ) = r1
Action( 7, + ) = r1
Action( 7, * ) = r1
Action( 7, $ ) = r1
2. E → E*E. is in I8 : r2
[ I8, follow symbols of E = r2 ]
Action( 8, ) ) = r2
Action( 8, + ) = r2
Action( 8, * ) = r2
Action( 8, $ ) = r2
3. E → (E). is in I9 : r3
[ I9, follow symbols of E = r3 ]
Action( 9, ) ) = r3
Action( 9, + ) = r3
Action( 9, * ) = r3
Action( 9, $ ) = r3
4. E → id. is in I3 : r4
[ I3, follow symbols of E = r4 ]
Action( 3, ) ) = r4
Action( 3, + ) = r4
Action( 3, * ) = r4
Action( 3, $ ) = r4
Shift:
Refer the terminals
( I0, ( ) gives I2
(r,c) : 0,(s2
( I0, id ) gives I3
(r,c) : 0,id s3
( I1, + ) gives I4
(r,c) : 1,+s4
( I1, * ) gives I5
(r,c) : 1,* s5
( I2, ( ) gives I2
(r,c) : 2,(s2
( I2, id ) gives I3
(r,c) : 2,id s3
( I4, ( ) gives I2
(r,c) : 4,(s2
( I4, id ) gives I3
(r,c) : 4,id s3
( I5, ( ) gives I2
(r,c) : 5,(s2
( I5, id ) gives I3
(r,c) : 5,id s3
( I6, + ) gives I4
(r,c) : 6,+s4
( I6, * ) gives I5
(r,c) : 6,* s5
( I6, ) ) gives I9
(r,c) : 6,)s9
( I7, + ) gives I4
(r,c) : 7,+s4
( I7, * ) gives I5
(r,c) : 7,* s5
( I8, + ) gives I4
(r,c) : 8,+s4
( I8, * ) gives I5
(r,c) : 8,* s5
State   Action                                         Goto
        id     +       *       (     )     $           E
0       s3                     s2                      1
1              s4      s5                  accept
2       s3                     s2                      6
3              r4      r4            r4    r4
4       s3                     s2                      7
5       s3                     s2                      8
6              s4      s5            s9
7              s4/r1   s5/r1         r1    r1
8              s4/r2   s5/r2         r2    r2
9              r3      r3            r3    r3
Assuming + is left associative, the action of state 7 on input + should be to reduce by E → E+E
Assuming * takes precedence over +, the action of state 7 on input * should be to shift
Similarly, assuming that * is left associative and takes precedence over +, we can say that
the action of state 8 on both the inputs + and * should be to reduce by E → E*E.
(in case of input +, the reason is that * takes precedence over +)
(in case of input *, the reason is that * is left associative)
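Once the conflicts in states 7 and 8 are resolved this way, the table can drive the usual LR parsing loop. The sketch below is illustrative only: the names PRODS, ACTION, GOTO and lr_parse are hypothetical, and the GOTO entries on E for states 2, 4 and 5 are assumed to be 6, 7 and 8, which are the states that contain E → (E.), E → E+E. and E → E*E. respectively.

```python
# Production number -> (left side, length of right side)
PRODS = {1: ("E", 3), 2: ("E", 3), 3: ("E", 3), 4: ("E", 1)}

ACTION = {}
for s in (0, 2, 4, 5):
    ACTION[(s, "id")] = ("s", 3)
    ACTION[(s, "(")] = ("s", 2)
for s in (1, 6, 7):
    ACTION[(s, "*")] = ("s", 5)          # state 7 on *: shift (* binds tighter)
for s in (1, 6):
    ACTION[(s, "+")] = ("s", 4)
ACTION[(1, "$")] = "acc"
ACTION[(6, ")")] = ("s", 9)
for a in ("+", "*", ")", "$"):
    ACTION[(3, a)] = ("r", 4)
    ACTION[(8, a)] = ("r", 2)            # state 8 reduces on both + and *
    ACTION[(9, a)] = ("r", 3)
for a in ("+", ")", "$"):
    ACTION[(7, a)] = ("r", 1)            # state 7 on +: reduce (+ is left associative)

GOTO = {(0, "E"): 1, (2, "E"): 6, (4, "E"): 7, (5, "E"): 8}   # assumed, see lead-in


def lr_parse(tokens):
    """Return the production numbers used, in the order the reductions happen."""
    buf = list(tokens) + ["$"]
    states = [0]
    reductions = []
    i = 0
    while True:
        act = ACTION.get((states[-1], buf[i]))
        if act is None:
            raise SyntaxError(f"error in state {states[-1]} on {buf[i]!r}")
        if act == "acc":
            return reductions
        kind, n = act
        if kind == "s":                  # shift: push state n, consume the input symbol
            states.append(n)
            i += 1
        else:                            # reduce: pop |rhs| states, then take GOTO
            lhs, length = PRODS[n]
            del states[-length:]
            states.append(GOTO[(states[-1], lhs)])
            reductions.append(n)


print(lr_parse(["id", "+", "id", "*", "id"]))
```

For id+id*id the reductions are 4, 4, 4, 2, 1, so E*E is reduced before E+E, which is the grouping id+(id*id).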
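S′ → S
I = { S′ → .S }
Closure(I) =
S′ → .S
S → .iSeS
S → .iS
S → .a
This set of items is I0
Goto(I0, a) = { S → a. } I3
Goto(I1, X) = null for every symbol X
Goto(I2, i) = {
S → i.SeS
S → i.S
S → .iSeS
S → .iS
S → .a } I2
Goto(I2, a) = { S → a. } I3
Goto(I3, X) = null for every symbol X
S → .a
Goto(I5, a) = { S → a. } I3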
1. S′ → S
Follow(S) contains Follow(S′)
S′ is the start symbol. So Follow(S′) = $
Thus Follow(S) contains $
2. S → iSeS
Follow(S) contains e
Accept:
If S′ → S. is in Ii, then state i accepts on $
S′ → S. is in I1.
Hence (1, $) = accept
Reduce:
1. S → iSeS. is in I6 : r1
[ I6, follow symbols of S = r1 ]
Action( 6, e ) = r1
Action( 6, $ ) = r1
2. S → iS. is in I4 : r2
[ I4, follow symbols of S = r2 ]
Action( 4, e ) = r2
Action( 4, $ ) = r2
3. S → a. is in I3 : r3
[ I3, follow symbols of S = r3 ]
Action( 3, e ) = r3
Action( 3, $ ) = r3
Shift: