You are on page 1of 29

Top-Down Parsing

v A parser is top-down if it discovers a parse tree top to bottom


- A top-down parse corresponds to a preorder traversal of the parse tree
- A leftmost derivation is applied at each derivation step
v Top-down parsers come in two forms
- Predictive Parsers
² Predict the production rule to be applied using lookahead tokens
- Backtracking Parsers
² Will try different productions, backing up when a parse fails

v Predictive parsers are much faster than backtracking ones


- Predictive parsers operate in linear time – will be our focus
- Backtracking parsers operate in exponential time – will not be considered
v Two kinds of top-down parsing techniques will be studied
- Recursive-descent parsing
- LL parsing
LL Parsing
Uses an explicit stack rather than recursive calls to perform a
parseLL(k) parsing means thatktokens of lookahead are
used
- The first L means that token sequence is read from left to right
- The second L means a leftmost derivation is applied at each step
v An LL parser consists of
- Parser stack that holds grammar symbols: non-terminals and tokens
- Parsing table that specifies the parser action
- Driver function that interacts with parser stack, parsing table and scanner
next token Parser Output
Scanner
Driver

Parsing Stack
Parsing
Table
LL Parsing Actions
The LL parsing actions are:
- Match: to match top of parser stack with next input token
- Predict: to predict a production and apply it in a derivation step
- Accept: to accept and terminate the parsing of a sequence of tokens
- Error: to report an error message when matching or prediction fails
Consider the following grammar:S(S)S|
Parser Stack Input Parser Action
S (())$ Predict S  ( S ) S
(S)S (())$ Match (
Parsing of ( ( ) ) S)S ())$ Predict S  ( S ) S
(S)S)S ())$ Match (
Stack grows backward S)S)S ))$ Predict S 
from right to left )S)S ))$ Match )
S)S )$ Predict S 
)S )$ Match )
S $ Predict S 
Empty $ Accept
Grammar Analysis: Nonterminals that Derive 
v Grammar analysis is necessary to
- Determine whether a grammar can be used in LL parsing
- Construct the LL parsing table that defines the actions of an LL parser
v A common analysis is to determine which nonterminals derive 
- Nonterminals that derive  are called nullable
v To determine which nonterminals derive  …
- We use an iterative marking algorithm
- First, nonterminals that derive  directly in one step are marked
- Nonterminals that derive  in two, three, … steps are found and marked
- Continue until no more nonterminals can be marked as deriving 
v Consider the following grammar
ABCD
A, B, C, and D are all nullable
BbC|
B, C, and D derive  directly
CcD|
Dd| A derives  indirectly: A  B C D  C D  D 
Grammar Analysis: The Follow Set
v Suppose we have the following grammar
- We follow derivations of S as shown below …

S  AcB aAcB…
A  aA
A  SAcB cbBaAcB…
B  bBS cbBScbBAcB
B   cB cbBcB…
c
We predictAaAwhen
- Next token is a because First(aA) = {a}
We predictAwhen
- Next token is c because Follow(A) = {c}
Similarly, we predictBbB Swhen
- Next token is b because First(b B S) = {b}
We predictBwhen
- Next token is a, c, or $ (end-of-file token) because Follow(B) = {a, c, $}
Top-Down Parsing – 15 Compiler Design – © Muhammed Mudawwar
Grammar Analysis: The First Set
v Suppose we have the following grammar:
- The RHS of the productions of S do not begin with terminals
- Parser has no immediate guidance which production to apply to expand S
- We may follow all possible derivations of S as shown below
h c a
SAa| Bb Dca
ADc| CA Aa
i c a

BdA| e CAa fC Aa…


bAa…
CfC | b S
dAb…
Dh|i Bb
v We predict S  A a when e b

- First token is h, i, f, or b. First(Aa) = {h, i, f, b}


v We predict S  B b when
- First token is d or e. First(Bb) = {d, e}
v Otherwise, we have an error
Grammar Analysis: Determining the First Set
v Formally, First() = {a  T | * a}
- First() is the set of all terminals that can begin a sentential form of 
v If *  then  First()
v To calculate First() we apply the following rules
First() = {} =
First(a) = {a}  = a
First(A) = First(1)  First(2)  …  = A and A 1 | 2 | …
First(A) = First(A)  = A and A is NOT nullable
First(A) = (First(A) – {})  First()  = A and A + 
Consider the following grammar:
S ABCd First(A) = {e, f, }
A e|f| First(B) = {g, h, }
B g|h| First(C) = {p, q}
C p|q First(S) = First(ABCd) = (First(A)–{})  (First(B)–{})  First(Cd)
= {e, f}  {g, h}  {p, q} = {e, f, g, h, p, q}
Grammar Analysis: Determining the Follow Set
+
v Formally, Follow(A) = {a  T | S   A a }
- Follow(A) is the set of terminals that may follow A in any sentential form
+
v If S   A then $  Follow(A)
- If A is not followed by any terminal then it is followed by the end of file
- The $ represents the end-of-file token
v We compute Follow(A) using the following rules:
- If A is the start symbol then $  Follow(A)
- Inspect RHS of productions for all occurrences of A Let a typical
production be B   A 
Ifis NOT nullable then add First() to Follow(A)
l Any token that can begin a sentential form of  can follow A
² If  is  or derives  then add (First() – {})  Follow(B) to Follow(A)
l If  vanishes in a given derivation then what follows A is what follows B
l …  B  A  A  … what follows A is what follows B is First()
Top-Down Parsing – 16 Compiler Design – © Muhammed Mudawwar
Examples on the First and Follow Sets
Example 1: Nonterminals that derive  are A and B
First(S) = First(AcB) = (First(A)–{})  First(cB) = {a, c}
S AcB First(A) = First(aA)  First() = {a, }
First(B) = First(bBS)  First() = {b, }
A aA
Follow(S) = {$}  Follow(B)
A  Follow(A) = {c}
B bBS Follow(B) = Follow(S)  First(S) = {$, a, c}
B  Follow(S) = {$, a, c}

Example 2: Nonterminals that derive  are Q and R


First(E) = First(TQ) = First(T) = First(FR) = First(F) = {( , id} First(Q) = {+ , – ,
E TQ }
First(R) = {* , / , }
Q+TQ|–TQ|TF
Follow(E) = {$ , )}
R Follow(Q) = Follow(E) = {$ , )}
R *FR|/ FR| Follow(T) = (First(Q)–{})  Follow(E)  Follow(Q) = {+, –, $, )}
F  ( E ) | id Follow(R) = Follow(T) = {+, –, $, )}
Follow(F) = (First(R)–{})  Follow(T)  Follow(R) = {*, /, +, –, $, )}
Top-Down Parsing – 17 Compiler Design – © Muhammed Mudawwar
Grammar Analysis: Determining the Predict Set
The predict set of a productionAis defined as follows:
IfisNOT nullablethen Predict(A) = First()
IfisNullablethen Predict(A ) = (First() – {})  Follow(A)
- This is the set of lookahead tokens that will cause the selection of A 
Example on determining the predict set:
ETQ Predict E  T Q = First(TQ) = First(T) = {( , id}
Q+TQ Predict Q  + T Q = First(+TQ) = { + }
Q–TQ Predict Q  – T Q = First(–TQ) = { – }
Q  Predict Q  = Follow(Q) = {$ , )}
T FR Predict T  F R = First(FR) = First(F) = {( , id}
R*FR Predict R  * F R = First(*FR) = { * }
R/ FR Predict R  / F R = First(/FR) = { / }
R  Predict R  = Follow(R) = {+ , – , $ , )}
F(E ) Predict F  ( E ) = { ( }
F  id Predict F  id = { id }
Top-Down Parsing – 18 Compiler Design – © Muhammed Mudawwar
LL(1) Grammars
v Not all context-free grammars are suitable for LL parsing
v CFGs suitable for LL(1) parsing are called LL(1) Grammars
v A grammar is LL(1) if for productions with the same LHS A
A 1 | 2 | … | n
Predict(A i)  Predict(A j) =  for all i  j
The predict sets of productions with same LHS are pairwise disjoint
The following grammar is LL(1)
SAcB
AaA Predict(A  a A) = {a}
Disjoint
A  Predict(A ) = Follow(A) = {c}
BbBS Predict(B  b B S) = {b}
B  Predict(B ) = Follow(B) = Follow(S)  First(S) = Disjoint
= {$}  Follow(B)  {a, c} = {$, a, c}
Top-Down Parsing – 19 Compiler Design – © Muhammed Mudawwar
Constructing the LL(1) Parsing Table
v The predict sets can be represented in an LL(1) parse table
- The rows are indexed by the nonterminals
- The columns are indexed by the tokens
v If A is a nonterminal and tok is the lookahead token then
- Table[A][tok] indicates which production to predict
- If no production can be used Table[A][tok] gives an error value

v Table[A][tok] = A  iff tok  predict(A )


v Example on constructing the LL(1) parsing table:
1: S  A c B Predict(1) = {a, c}
2: A  a A Predict(2) = {a} a b c $
3: A  Predict(3) = {c} S 1 1 Empty slots
4: B  b B S Predict(4) = {b} indicate error
5: B  Predict(5) = {$, a, c} A 2 3 conditions
B 5 4 5 5
Constructing the LL(1) Parsing Table – cont'd
Here is a second example on constructing the parsing table
1: E  T Q Predict(1) = { ( , id }
2: Q  + T Q Predict(2) ={+}
3: Q  – T Q Predict(3) ={–} + – * / ( ) id $

4: Q  Predict(4) ={$,)} E 1 1


5: T  F R Predict(5) = { ( , id } Q 2 3 4 4
6: R  * F R Predict(6) ={*}
T 5 5
7: R  / F R Predict(7) ={/}
8: R  Predict(8) = {+, –, $, )} R 8 8 6 7 8 8
9: F  ( E ) Predict(9) ={(} F 9 10
10: F  id Predict(10) = { id }
v Because the above grammar is LL(1)
- A unique production number is stored in a table entry
v Blank entries correspond to error conditions
- In practice, special error numbers are used to indicate error situations
LL(1) Parser Driver Algorithm
The LL(1) parser driver algorithm can be described as follows:
Token := scan( )
Stack.push(StartSymbol) while not Parser
Stack.empty( ) do Driver
X := Stack.pop( ) if
terminal(X)
if X = Token then Token := scan( )
Parsing
else process a syntax error at Token end if else (* X is Output
Table
a nonterminal *)
Rule := Table[X][Token]
if Rule = X  Y1 Y2 … Yn then

Parsing Stack
for i from n downto 1 do Stack.push(Yi) end for else process
a syntax error at Token end if
end if
end while
if Token = $ then accept parsing
else report a syntax error at Token end if

Scanner

next token
Tracing an LL(1) Parser Parser Stack Input Parser Action
E id*(id+id)$ Predict E  T Q
Consider the parsing of id * (id + id) $ TQ id*(id+id)$ Predict T  F R
FRQ id*(id+id)$ Predict F  id
id R Q id*(id+id)$ Match id

to left
RQ *(id+id)$ Predict R  * F R
1: E  T Q 6: R  * F R
*FRQ *(id+id)$ Match *
2: Q  + T Q 7: R  / F R
FRQ (id+id)$ Predict F  ( E )
3: Q  – T Q 8: R 

from right
(E)RQ (id+id)$ Match (
4: Q  9: F  ( E ) E)RQ id+id)$ Predict E  T Q
TQ)RQ id+id)$ Predict T  F R
5: T  FR 10: F  id
FRQ)RQ id+id)$ Predict F  id
id R Q ) R Q id+id)$ Match id
RQ)RQ +id)$ Predict R 
+ – * / ( ) id $ Q)RQ +id)$ Predict Q  + T Q

Stack grows backwards


+TQ)RQ +id)$ Match +
E 1 1
TQ)RQ id)$ Predict T  F R
Q 2 3 4 4 FRQ)RQ id)$ Predict F  id
id R Q ) R Q id)$ Match id
T 5 5 RQ)RQ )$ Predict R 
Q)RQ )$ Predict Q 
R 8 8 6 7 8 8
)RQ )$ Match )
F 9 10 RQ $ Predict R 
Q $ Predict Q 
Empty $ Accept

You might also like