Predictive descent
OR LL(1) parser
Classification of Parser
Types of Parser:
The parser is mainly classified into two categories, i.e. Top-down Parser, and Bottom-
up Parser. These are explained below:
a. Top-Down Parser:
A top-down parser builds the parse tree for the given input string by expanding
non-terminals with the grammar productions, i.e. it starts from the start symbol and
works down to the terminals. It uses leftmost derivation.
1. Backtracking
Now the first input symbol ‘a’ matches the first leaf node of the tree. So the parser will
move ahead and find a match for the second input symbol ‘b‘.
Possibility 1:
A → bc
REJECTED
Possibility 2:
A → b
ACCEPTED
Now the next leaf node ‘b‘ matches the second input symbol ‘b‘. Further, the third
input symbol ‘d‘ matches the last leaf node ‘d‘ of the tree, which completes the
top-down parse successfully.
The first possibility produces an error, so the parser goes back to A to check whether
A has another production. The second possibility yields a matching parse tree, so the
parser halts and announces successful completion of parsing.
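The backtracking behaviour described above can be sketched in Python. The grammar S → aAd, A → bc | b is an assumption chosen to match the walkthrough (the notes do not state it), and the names `parse`, `match_sequence` and `accepts` are illustrative. Generators are used so that a failed alternative is simply abandoned and the next one is tried.

```python
# Assumed grammar (not given in the notes): S -> a A d,  A -> b c | b
GRAMMAR = {
    "S": [["a", "A", "d"]],
    "A": [["b", "c"], ["b"]],
}


def parse(symbol, tokens, pos):
    """Yield every input position reachable after deriving `symbol` from `pos`."""
    if symbol not in GRAMMAR:
        # Terminal: it must match the next input symbol.
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    # Non-terminal: try each production in order; a dead end means backtracking.
    for production in GRAMMAR[symbol]:
        yield from match_sequence(production, tokens, pos)


def match_sequence(symbols, tokens, pos):
    """Yield positions reachable after matching all `symbols` starting at `pos`."""
    if not symbols:
        yield pos
        return
    for middle in parse(symbols[0], tokens, pos):
        yield from match_sequence(symbols[1:], tokens, middle)


def accepts(text):
    tokens = list(text)
    return any(end == len(tokens) for end in parse("S", tokens, 0))


print(accepts("abd"))  # A -> bc fails on 'd', then A -> b succeeds
```

For the input abd, the alternative A → bc is tried first and rejected, after which A → b is tried and accepted, which mirrors Possibility 1 and Possibility 2.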
2. Left Recursion
A grammar G (V, T, P, S) is left recursive if it has a production of the form
A → Aα | β
If left recursion is present in the grammar, a top-down parser can enter an infinite
loop, because expanding A leads straight back to A without consuming any input.
Example:
i) E → E+T|T
ii) T → T*F|F
iii) F → (E)|id
In productions (i) and (ii) the left-hand non-terminal (E and T respectively) also
appears as the first symbol on the right-hand side. To eliminate the left recursion,
each production of the form A → Aα | β is rewritten as A → βA′, A′ → αA′ | ϵ.
i) E → E+T|T α=+T and β=T
E → TE′
E′→ +TE′|ϵ
ii) T → T*F|F α=*F and β=F
T → FT′
T′→ *FT′|ϵ
After eliminating the left recursion, the final production rules are as follows:
E → TE′
E′→ +TE′|ϵ
T → FT′
T′→ *FT′|ϵ
F → (E)|id
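The rewrite A → Aα | β into A → βA′, A′ → αA′ | ϵ can be sketched as a small Python function. This is a minimal sketch that handles only immediate left recursion; the function name, the list-of-symbols representation, and the use of "'" and "ε" for the prime and the empty string are assumptions made for illustration.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite head -> head alpha | beta as head -> beta head', head' -> alpha head' | ε.

    `bodies` is a list of productions, each a list of symbols.
    Returns a dict mapping each non-terminal to its new list of productions.
    """
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: bodies}  # nothing to do: no immediate left recursion
    new_head = head + "'"
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [["ε"]],
    }


print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
print(eliminate_immediate_left_recursion("T", [["T", "*", "F"], ["F"]]))
```

Applied to E → E+T | T and T → T*F | F, this produces the same E, E′, T and T′ productions listed above. F → (E) | id is returned unchanged because it is not left recursive.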
3. Left factoring
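Left factoring is used to avoid backtracking when it is not clear which production
should be used to expand a non-terminal because two or more of its productions begin
with the same symbols; it applies even when the grammar has no left recursion.
Productions of the form A → αβ1 | αβ2 are rewritten as A → αA′, A′ → β1 | β2.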
Rules for calculating the FIRST set:
Rule-01:
For a production rule X → ∈, FIRST (X) = { ∈ }
For any terminal symbol ‘a’, FIRST (a) = { a }
Rule-02:
For a production rule X → aY, add a to FIRST (X)
Rule-03:
For a production rule X → Y1Y2Y3,
Calculating FIRST(X)
If ∈ ∉ FIRST(Y1), then FIRST(X) = FIRST(Y1)
If ∈ ∈ FIRST(Y1) (that is, Y1 can derive ∈), then FIRST(X) = { FIRST(Y1) – ∈ } ∪ FIRST(Y2Y3)
Calculating FIRST(Y2Y3)
If ∈ ∉ FIRST(Y2), then FIRST(Y2Y3) = FIRST(Y2)
If ∈ ∈ FIRST(Y2), then FIRST(Y2Y3) = { FIRST(Y2) – ∈ } ∪ FIRST(Y3)
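The FIRST rules above can be sketched as a fixed-point computation in Python. This is a minimal sketch: the dictionary representation, the names `first_of_sequence` and `compute_first`, and the choice of an empty list for an ∈-production are assumptions. The grammar used is the expression grammar obtained after removing left recursion.

```python
EPS = "ε"  # stands for the empty string (written ∈ in the rules)

# E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> ( E ) | id
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],  # [] is the ε-production
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_sequence(symbols, first):
    """FIRST of Y1 Y2 ... Yn, following Rule-03."""
    result = set()
    for symbol in symbols:
        # FIRST(a) = {a} for a terminal; look up the table for a non-terminal.
        symbol_first = first[symbol] if symbol in first else {symbol}
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result  # this symbol cannot vanish, so stop here
    result.add(EPS)  # every symbol can derive ε (or the sequence is empty)
    return result


def compute_first(grammar):
    first = {non_terminal: set() for non_terminal in grammar}
    changed = True
    while changed:  # repeat until no set grows any further
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_sequence(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


for non_terminal, symbols in compute_first(GRAMMAR).items():
    print(non_terminal, sorted(symbols))
```

The loop keeps applying the rules until nothing changes, which handles productions that refer to non-terminals whose FIRST sets are not yet complete.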
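Rules for calculating the FOLLOW set:
Rule-01:
For the start symbol S, place $ in FOLLOW(S).
Rule-02:
For any production rule A → αB, FOLLOW(B) = FOLLOW(A)
Rule-03:
For any production rule A → αBβ,
If ∈ ∉ FIRST(β), then FOLLOW(B) = FIRST(β)
If ∈ ∈ FIRST(β) (that is, β can derive ∈), then FOLLOW(B) = { FIRST(β) – ∈ } ∪ FOLLOW(A)
When B occurs on the right-hand side of several productions, FOLLOW(B) is the union
of the sets obtained from each occurrence.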
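The FOLLOW rules can be sketched in the same fixed-point style. This is a minimal Python sketch under assumptions: the grammar is the left-recursion-free expression grammar, an empty list stands for an ∈-production, and the helper names are illustrative. FIRST is recomputed here so that the block stands on its own.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_sequence(symbols, first):
    result = set()
    for symbol in symbols:
        symbol_first = first[symbol] if symbol in first else {symbol}
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)
    return result


def compute_first(grammar):
    first = {non_terminal: set() for non_terminal in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_sequence(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {non_terminal: set() for non_terminal in grammar}
    follow[start].add("$")  # Rule-01
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, symbol in enumerate(body):
                    if symbol not in grammar:
                        continue  # FOLLOW is defined for non-terminals only
                    beta_first = first_of_sequence(body[i + 1:], first)
                    new = beta_first - {EPS}  # Rule-03, first case
                    if EPS in beta_first:  # Rule-02 and Rule-03, second case
                        new |= follow[head]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "E", compute_first(GRAMMAR))
for non_terminal, symbols in FOLLOW.items():
    print(non_terminal, sorted(symbols))
```

Because β is empty in a production A → αB, `first_of_sequence` returns { ε } for it, so Rule-02 falls out as a special case of Rule-03.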
FOLLOW Set
FOLLOW(S) = { $ }
FOLLOW(B) = { FIRST(D) – Є } U FIRST(h) = { g , f , h }
FOLLOW(C) = FOLLOW(B) = { g , f , h }
FOLLOW(D) = FIRST(h) = { h }
FOLLOW(E) = { FIRST(F) – Є } U FOLLOW(D) = { f , h }
FOLLOW(F) = FOLLOW(D) = { h }
FOLLOW Set
FOLLOW(S) = { $ }
FOLLOW(A) = { h, g, $ }
FOLLOW(B) = { a, $, h, g }
FOLLOW(C) = { b, g, $, h }
Parsing
Shift: The parser shifts zero or more input symbols onto the stack until a handle is
on top of the stack.
Reduce: The parser reduces the handle on top of the stack to the left side of the
production, i.e., the R.H.S. of the production is popped and the L.H.S. is pushed.
Accept: The shift and reduce steps are repeated until an error is detected or until
the stack contains only the start symbol (S) and the input buffer is empty, i.e., it
contains only $.
Error: The parser signals that a syntax error has been discovered and calls an error
recovery routine.
Stack            Input Buffer   Parsing Action
$E + E           * id3 $        Shift
$E + E *         id3 $          Shift
$E + E * id3     $              Reduce by E → id
$E + E * E       $              Reduce by E → E * E
$E + E           $              Reduce by E → E + E
$E               $              Accept
Augmented Grammar:
If G is a grammar with starting symbol S, then G` (the augmented grammar for G) is a
grammar with a new starting symbol S` and the added production S` → S.
The purpose of this new starting production is to indicate to the parser when it should
stop parsing and announce acceptance of the input.
Example
Given grammar
S → AA
A → aA | b
The augmented grammar G` is represented by
S`→ S
S → AA
A → aA | b
Closure:
If I is a set of items for a grammar G, then closure(I) is the set of items constructed
from I by the two rules:
1. Initially every item in I is added to closure(I).
2. If A → α•Bβ is in closure(I) and B → γ is a production, then add the item B → •γ to
closure(I) if it is not already there. We apply this rule until no more items can be
added to closure(I).
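The two closure rules can be sketched in Python for the augmented grammar S` → S, S → AA, A → aA | b. This is a minimal sketch: an item is represented as a tuple (head, body, dot position), S` is written `S'`, and the function name is an assumption.

```python
# Augmented grammar: S' -> S, S -> A A, A -> a A | b
GRAMMAR = {
    "S'": [("S",)],
    "S": [("A", "A")],
    "A": [("a", "A"), ("b",)],
}


def closure(items):
    """items: iterable of (head, body, dot). Returns the closed set of items."""
    result = set(items)          # rule 1: every item of I is in closure(I)
    worklist = list(result)
    while worklist:
        head, body, dot = worklist.pop()
        # Rule 2: the dot stands in front of a non-terminal B.
        if dot < len(body) and body[dot] in GRAMMAR:
            for production in GRAMMAR[body[dot]]:
                item = (body[dot], production, 0)   # B -> •γ
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


I0 = closure({("S'", ("S",), 0)})
for head, body, dot in sorted(I0):
    print(head, "->", " ".join(body[:dot]) + "•" + " ".join(body[dot:]))
```

Starting from S` → •S, the result contains S → •AA, A → •aA and A → •b as well, four items in total.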
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = S` → •S
Add all productions starting with S in to I0 State because "•" is followed by the non-
terminal. So, the I0 State becomes
I0 = S` → •S
S → •AA
Add all productions starting with "A" in modified I0 State because "•" is followed by the
non-terminal. So, the I0 State becomes.
I0= S` → •S
S → •AA
A → •aA
A → •b
I1 = Goto(I0, S) = S` → S•
Here the dot is at the end of the production, so the item is final and closure adds
nothing more to the state.
I1 = S` → S•
Drawing DFA:
The DFA contains the 7 states I0 to I6.
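The construction of the states can be sketched by applying Goto repeatedly, starting from I0. The Python below is a minimal sketch for S` → S, S → AA, A → aA | b. The item representation, the function names and the symbol order S, A, a, b (chosen so that the state numbers come out in the order used in these notes) are assumptions.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("A", "A")],
    "A": [("a", "A"), ("b",)],
}
SYMBOLS = ["S", "A", "a", "b"]  # order in which transitions are explored


def closure(items):
    result = set(items)
    worklist = list(result)
    while worklist:
        head, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for production in GRAMMAR[body[dot]]:
                item = (body[dot], production, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(head, body, dot + 1) for head, body, dot in items
             if dot < len(body) and body[dot] == symbol}
    return closure(moved)


def canonical_collection():
    states = [closure({("S'", ("S",), 0)})]
    transitions = {}
    index = 0
    while index < len(states):  # states are processed in order of discovery
        for symbol in SYMBOLS:
            target = goto(states[index], symbol)
            if not target:
                continue  # no item has the dot in front of this symbol
            if target not in states:
                states.append(target)
            transitions[(index, symbol)] = states.index(target)
        index += 1
    return states, transitions


STATES, TRANSITIONS = canonical_collection()
print(len(STATES))
print(TRANSITIONS)
```

Each entry of `TRANSITIONS` maps a (state number, symbol) pair to the target state number, which is the information carried by the arrows of the DFA.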
Explanation:
o I0 on S goes to I1, so write it as 1.
o I0 on A goes to I2, so write it as 2.
o I2 on A goes to I5, so write it as 5.
o I3 on A goes to I6, so write it as 6.
o I0, I2 and I3 on a go to I3, so write it as S3, which means shift 3.
o I0, I2 and I3 on b go to I4, so write it as S4, which means shift 4.
o I4, I5 and I6 all contain a final item because the • is at the rightmost end. So
write the reduce entry using the production number.
Productions are numbered as follows:
S → AA ... (1)
A → aA ... (2)
A→b ... (3)
I1 contains the final item which drives(S` → S•), so action {I1, $} = Accept.
I4 contains the final item which drives A → b• and that production corresponds
to the production number 3 so write it as r3 in the entire row.
I5 contains the final item which drives S → AA• and that production corresponds
to the production number 1 so write it as r1 in the entire row.
I6 contains the final item which drives A → aA• and that production corresponds
to the production number 2 so write it as r2 in the entire row.
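The steps used to construct the SLR (1) table are given below:
If a state (Ii) goes to some other state (Ij) on a terminal, then it corresponds to a
shift move in the action part.
If a state (Ii) goes to some other state (Ij) on a variable, then it corresponds to a
goto move in the Go to part.
If a state (Ii) contains a final item such as A → αβ•, then write the reduce entry for
that production only in the columns of the terminals in FOLLOW(A).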
Example
S -> •Aa
A->αβ•
Follow(S) = {$}
Follow (A) = {a}
Explanation:
I1 contains the final item S` → S• and FOLLOW(S`) = {$}, so action {I1, $} = Accept.
I0 gives A in I2, so 2 is added to (row 0, column A).
I0 gives S in I1, so 1 is added to (row 0, column S).
Similarly, 5 is written in (row 2, column A) and 6 is written in (row 3, column A).
I0 gives a in I3, so S3 (shift 3) is added to (row 0, column a).
I0 gives b in I4, so S4 (shift 4) is added to (row 0, column b).
Similarly, S3 (shift 3) is added to (rows 2 and 3, column a) and S4 (shift 4) is added
to (rows 2 and 3, column b).
Productions are numbered as follows:
S → AA ... (1)
A → aA ... (2)
A→b ... (3)
I4 is a reduce state as ‘•‘ is at the end. I4 holds the 3rd production of the grammar
(A → b•). The LHS of this production is A. FOLLOW(A) = {a, b, $}. Write r3 (reduce 3)
in row 4 under columns a, b, $.
I5 is a reduce state as ‘•‘ is at the end. I5 holds the 1st production of the grammar
(S → AA•). The LHS of this production is S. FOLLOW(S) = {$}. Write r1 (reduce 1) in
row 5 under column $.
I6 is a reduce state as ‘•‘ is at the end. I6 holds the 2nd production of the grammar
(A → aA•). The LHS of this production is A. FOLLOW(A) = {a, b, $}. Write r2 (reduce 2)
in row 6 under columns a, b, $.
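Once the table is filled in, a table-driven parser only has to look up entries. The Python below is a minimal sketch of such a driver with the SLR (1) entries described above typed in by hand; the dictionary layout and the name `lr_parse` are assumptions made for illustration.

```python
# Production number -> (left-hand side, length of right-hand side)
PRODUCTIONS = {1: ("S", 2), 2: ("A", 2), 3: ("A", 1)}

# Action part: shifts from the DFA, reduces placed under FOLLOW of the LHS.
ACTION = {
    (0, "a"): "s3", (0, "b"): "s4",
    (1, "$"): "acc",
    (2, "a"): "s3", (2, "b"): "s4",
    (3, "a"): "s3", (3, "b"): "s4",
    (4, "a"): "r3", (4, "b"): "r3", (4, "$"): "r3",
    (5, "$"): "r1",
    (6, "a"): "r2", (6, "b"): "r2", (6, "$"): "r2",
}
# Go to part: transitions on the variables S and A.
GOTO = {(0, "S"): 1, (0, "A"): 2, (2, "A"): 5, (3, "A"): 6}


def lr_parse(text):
    """Return the list of production numbers used, or None on a syntax error."""
    tokens = list(text) + ["$"]
    stack = [0]          # stack of state numbers
    position = 0
    reductions = []
    while True:
        action = ACTION.get((stack[-1], tokens[position]))
        if action is None:
            return None  # blank table entry: error
        if action == "acc":
            return reductions
        if action[0] == "s":
            stack.append(int(action[1:]))   # shift: push the target state
            position += 1
        else:
            number = int(action[1:])
            head, length = PRODUCTIONS[number]
            del stack[len(stack) - length:]           # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], head)])     # then follow the goto entry
            reductions.append(number)


print(lr_parse("abb"))
print(lr_parse("ab"))
```

The same driver works for CLR (1) and LALR (1) tables; only the contents of the two dictionaries change.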
Goto (I7, *)
T → T * •F
F → •id
(same as I6)
Drawing DFA:
o I1 contains the final item which drives S → E• and follow (S) = {$}, so action {I1, $} =
Accept
o I2 contains the final item which drives E → T• and follow (E) = {+, $}, so action {I2, +} =
R2, action {I2, $} = R2
o I3 contains the final item which drives T → F• and follow (T) = {+, *, $}, so action {I3, +} =
R4, action {I3, *} = R4, action {I3, $} = R4
o I4 contains the final item which drives F → id• and follow (F) = {+, *, $}, so action {I4, +} =
R5, action {I4, *} = R5, action {I4, $} = R5
o I7 contains the final item which drives E → E + T• and follow (E) = {+, $}, so action {I7, +}
= R1, action {I7, $} = R1
o I8 contains the final item which drives T → T * F• and follow (T) = {+, *, $}, so action {I8,
+} = R3, action {I8, *} = R3, action {I8, $} = R3.
Example:
Construct the canonical collection and SLR table for:
S → iSeS
S → iS
S → a
(Practice exercise)
In CLR (1) parsing, we place the reduce entries only under the lookahead symbols.
LR (1) item
An LR (1) item is an LR (0) item together with a lookahead symbol.
The lookahead is used to determine where we place the final item.
The lookahead for the augmented production is always $.
Step 1:
For the grammar G, initially add S` → •S, $ to the set of items.
Step 2:
CLOSURE FUNCTION
For each item A → α•Xβ, a in the set and each production X → γ, add the item
X → •γ, b for every terminal b in FIRST(βa).
Step 3:
GOTO FUNCTION
For each item A → α•Xβ, a in a set I, GOTO(I, X) contains the item A → αX•β, a,
where the lookahead a stays the same as before the move. The closure function is then
applied to the result.
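The LR (1) closure step can be sketched in Python for S` → S, S → AA, A → aA | b. This is a minimal sketch under a stated assumption: the grammar has no ∈-productions, so FIRST(βa) is simply FIRST of the first symbol of β, or { a } when β is empty. Items are tuples (head, body, dot, lookahead), and all names are illustrative.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("A", "A")],
    "A": [("a", "A"), ("b",)],
}


def first_of_symbol(symbol, seen=None):
    """FIRST of one symbol, assuming the grammar has no ε-productions."""
    if symbol not in GRAMMAR:
        return {symbol}              # terminal
    if seen is None:
        seen = set()
    if symbol in seen:
        return set()                 # already being expanded (left recursion)
    seen.add(symbol)
    result = set()
    for body in GRAMMAR[symbol]:
        result |= first_of_symbol(body[0], seen)
    return result


def closure_lr1(items):
    result = set(items)
    worklist = list(result)
    while worklist:
        head, body, dot, lookahead = worklist.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            beta = body[dot + 1:]
            # b ranges over FIRST(βa); with no ε this is FIRST(β[0]) or {a}.
            lookaheads = first_of_symbol(beta[0]) if beta else {lookahead}
            for production in GRAMMAR[body[dot]]:
                for b in lookaheads:
                    item = (body[dot], production, 0, b)
                    if item not in result:
                        result.add(item)
                        worklist.append(item)
    return frozenset(result)


I0 = closure_lr1({("S'", ("S",), 0, "$")})
for head, body, dot, lookahead in sorted(I0):
    print(head, "->", " ".join(body[:dot]) + "•" + " ".join(body[dot:]), ",", lookahead)
```

Here an item written with lookahead a/b is stored as two tuples, one per lookahead, so I0 contains six tuples.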
Add the augmented production, insert the '•' symbol at the first position of every
production in G, and add the lookaheads.
S` → •S, $
S → •AA, $ (lookahead b = FIRST(βa) = FIRST($) = { $ })
A → •aA, a/b
A → •b, a/b (lookahead b = FIRST(βa) = FIRST(A$) = FIRST(A) = { a, b })
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = S` → •S, $
Add all productions starting with S in to I0 State because "•" is followed by the non-
terminal. So, the I0 State becomes
I0 = S` → •S, $
S → •AA, $
Add all productions starting with A in modified I0 State because "•" is followed by the
non-terminal. So, the I0 State becomes.
I0= S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
Drawing DFA:
NOTE: The placement of shift entries in the CLR (1) parsing table is the same as in the
SLR (1) parsing table. The only difference is in the placement of the reduce entries.
I4 contains the final item which drives ( A → b•, a/b), so action {I4, a} = R3, action {I4, b}
= R3.
I5 contains the final item which drives ( S → AA•, $), so action {I5, $} = R1.
I7 contains the final item which drives ( A → b•,$), so action {I7, $} = R3.
I8 contains the final item which drives ( A → aA•, a/b), so action {I8, a} = R2, action {I8,
b} = R2.
I9 contains the final item which drives ( A → aA•, $), so action {I9, $} = R2.
In LALR (1) parsing, the LR (1) item sets that have the same productions but different
lookaheads are combined to form a single set of items.
LALR (1) parsing is the same as CLR (1) parsing; the only difference is in the parsing table.
Example
Question: Construct LALR( 1 ) parsing table for the Grammar.
1. S → AA
2. A → aA
3. A → b
Add the augmented production, insert the '•' symbol at the first position of every
production in G, and add the lookaheads.
S` → •S, $
S → •AA, $ (lookahead b = FIRST(βa) = FIRST($) = { $ })
A → •aA, a/b
A → •b, a/b (lookahead b = FIRST(βa) = FIRST(A$) = FIRST(A) = { a, b })
Same as CLR(1)
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = S` → •S, $
Add all productions starting with S in to I0 State because "•" is followed by the non-
terminal. So, the I0 State becomes
I0 = S` → •S, $
S → •AA, $
Add all productions starting with A in modified I0 State because "•" is followed by the
non-terminal. So, the I0 State becomes.
I0= S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
On analysis, the LR (0) items of I3 and I6 are the same; they differ only in their
lookaheads.
I3 = { A → a•A, a/b
A → •aA, a/b
A → •b, a/b }
I6= { A → a•A, $
A → •aA, $
A → •b, $ }
Clearly I3 and I6 have the same LR (0) items but differ in their lookaheads, so we can
combine them into a single state called I36.
I36 = { A → a•A, a/b/$
A → •aA, a/b/$
A → •b, a/b/$ }
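The merging step can be sketched in Python by grouping LR (1) states on their LR (0) core, that is, on their items with the lookahead dropped. This is a minimal sketch: the tuple representation (head, body, dot, lookahead), the helper `make_state` and the naming scheme that joins the state numbers are assumptions.

```python
def make_state(lookaheads):
    """Build the item set {A -> a•A, A -> •aA, A -> •b} for the given lookaheads."""
    cores = [("A", ("a", "A"), 1), ("A", ("a", "A"), 0), ("A", ("b",), 0)]
    return {core + (lookahead,) for core in cores for lookahead in lookaheads}


def merge_by_core(states):
    """states: dict of name -> set of LR(1) items. Merge states with equal cores."""
    groups = {}
    for name, items in states.items():
        core = frozenset((head, body, dot) for head, body, dot, _ in items)
        names, merged_items = groups.setdefault(core, ([], set()))
        names.append(name)
        merged_items.update(items)   # union of the lookaheads
    # "I3" and "I6" become "I36": keep the numbers, drop the repeated "I".
    return {"I" + "".join(name[1:] for name in names): items
            for names, items in groups.values()}


merged = merge_by_core({"I3": make_state("ab"), "I6": make_state("$")})
for name, items in merged.items():
    print(name, sorted(items))
```

Applying the same grouping to I4 and I7, and to I8 and I9, gives the states I47 and I89 that appear in the LALR (1) table.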
Drawing DFA:
States   a     b     $        S    A
I0       S36   S47            1    2
I1                   Accept
I2       S36   S47                 5
I36      S36   S47                 89
I47      r3    r3    r3
I5                   r1
I89      r2    r2    r2
Examples –
This is an example of operator grammar:
E->E+E/E*E/id
Operator precedence can only be established between the terminals of the grammar. It
ignores the non-terminals.
However, the grammar given below is not an operator grammar because two non-
terminals are adjacent to each other:
S->SAS/a
A->bSb/b
We can convert it into an operator grammar, though:
S->SbSbS/SbS/a
A->bSb/b
EXAMPLE:
Grammar:
1. E → E+T/T
2. T → T*F/F
3. F → id
Given string: w = id + id * id
Precedence table:
Parsing Action
o Add the $ symbol at both ends of the given input string.
o Scan the input string from left to right until the first ⋗ is encountered.
o Scan backwards (to the left) over all the equal precedences until the first ⋖ is
encountered.
o Everything between the leftmost ⋖ and the rightmost ⋗ is a handle.
o Reaching $ START SYMBOL $ means parsing is successful.
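The parsing action can be sketched in Python with a stack of terminals. The precedence relations in `PREC` are an assumption, since the table itself is not reproduced in these notes: they encode the usual choice that id has the highest precedence, * binds tighter than +, both operators are left associative, and $ has the lowest precedence. Because only terminals are tracked, this sketch finds the handles but does not check that every operator has its operands.

```python
# PREC[x][y] is the relation between stack terminal x and input terminal y.
PREC = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+": {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*": {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$": {"id": "<", "+": "<", "*": "<"},
}


def operator_precedence_parse(tokens):
    """Return the terminals of the handles in the order they are reduced."""
    stack = ["$"]
    buffer = list(tokens) + ["$"]   # $ is added at the end; the stack holds the other $
    position = 0
    handles = []
    while True:
        top, lookahead = stack[-1], buffer[position]
        if top == "$" and lookahead == "$":
            return handles                      # $ START SYMBOL $: success
        relation = PREC.get(top, {}).get(lookahead)
        if relation == "<":
            stack.append(lookahead)             # keep scanning to the right
            position += 1
        elif relation == ">":
            # Scan back to the nearest ⋖: pop until the new top ⋖ the popped terminal.
            while True:
                popped = stack.pop()
                if PREC[stack[-1]].get(popped) == "<":
                    break
            handles.append(popped)
        else:
            raise SyntaxError(f"no relation between {top!r} and {lookahead!r}")


print(operator_precedence_parse(["id", "+", "id", "*", "id"]))
```

For id + id * id the three id handles are reduced first, then the handle containing *, and finally the handle containing +, so the multiplication is grouped before the addition.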
Advantages –
1. It can easily be constructed by hand.
2. It is simple to implement this type of parsing.
Disadvantages –
1. It is hard to handle tokens like the minus sign (-), which has two different
precedences (depending on whether it is unary or binary).
2. It is applicable only to a small class of grammars.
YACC was discussed in the first unit; revisit those concepts to make this material
clearer.