
TOP-DOWN PARSING
Lecture Six
Recursive Descent
1. Start with the start symbol E (top down).
2. Try the rules in order (from left to right).
Grammar:
E → T | T + E
T → int | int * T | ( E )
Input: ( 5 )
Parse tree so far: E → T → int
Mismatch: int does not match "(", so backtrack.

Recursive Descent
Input: ( 5 )
Next choice for T: E → T → int * T
Mismatch: int does not match "(", so backtrack.

Recursive Descent
Input: ( 5 )
Next choice for T: E → T → ( E )
Match: "(" matches, so advance to the next input token.

Recursive Descent
Input: ( 5 )
Parse tree: E → T → ( E ), with the inner E → T → int
Each remaining token matches the next input.
The input is accepted.
Recursive Descent Algorithm
Define Boolean functions that check for a match of:
– a given token (terminal):
bool term(TOKEN tok) { return *next++ == tok; }
– the nth production of S:
bool Sn() { … }
– all productions of S, tried in order:
bool S() { … }

(Alex Aiken, Stanford)


Recursive Descent Algorithm
Easy to implement by hand.
E → T | T + E
T → int | int * T | ( E )

Parser function()
{
– Initialize next to point to the first token
  (call the scanner to return one token at a time)
– Invoke StartSymbol(), in this example E()
}
Recursive Descent Algorithm
E → T | T + E
1. For production E → T
bool E1() { return T(); }
2. For production E → T + E
bool E2() { return T() && term(PLUS) && E(); }
3. For all productions of E (E → T | T + E)
bool E() {
  TOKEN *save = next;               /* remember where this attempt started */
  return (next = save, E1())        /* restore next before each alternative */
      || (next = save, E2());
}
Recursive Descent Algorithm
T → int | int * T | ( E )
1. For production T → int
bool T1() { return term(INT); }
2. For production T → int * T
bool T2() { return term(INT) && term(TIMES) && T(); }
3. For production T → ( E )
bool T3() {
  return term(LPAREN) && E()
      && term(RPAREN);
}
Recursive Descent Algorithm
T → int | int * T | ( E )
For all productions of T (T → int | int * T | ( E ))
bool T()
{
  TOKEN *save = next;
  return (next = save, T1())
      || (next = save, T2())
      || (next = save, T3());
}

bool Parse()
{
  next = getToken();   /* point next at the first token from the scanner */
  return E();          /* invoke the start symbol */
}
bool term(TOKEN tok) { return *next++ == tok; }
bool E1() { return T(); }
bool E2() { return T() && term(PLUS) && E(); }
bool E() { TOKEN *save = next; return (next = save, E1())
                                   || (next = save, E2()); }
bool T1() { return term(INT); }
bool T2() { return term(INT) && term(TIMES) && T(); }
bool T3() { return term(LPAREN) && E() && term(RPAREN); }
bool T() { TOKEN *save = next; return (next = save, T1())
                                   || (next = save, T2())
                                   || (next = save, T3()); }
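The functions above can be assembled into a compilable sketch. The TOKEN enum, the END marker, the token-array input (in place of a scanner) and the end-of-input check in parse() are assumptions added for illustration; they are not part of the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token type; END marks the end of the token array. */
typedef enum { INT, PLUS, TIMES, LPAREN, RPAREN, END } TOKEN;

static const TOKEN *next;            /* cursor into the token array */

static bool E(void);
static bool T(void);

/* Compare the current token with tok and advance the cursor. */
static bool term(TOKEN tok) { return *next++ == tok; }

static bool E1(void) { return T(); }                          /* E -> T     */
static bool E2(void) { return T() && term(PLUS) && E(); }     /* E -> T + E */
static bool E(void) {
    const TOKEN *save = next;        /* restore point for backtracking */
    return (next = save, E1()) || (next = save, E2());
}

static bool T1(void) { return term(INT); }                         /* T -> int     */
static bool T2(void) { return term(INT) && term(TIMES) && T(); }   /* T -> int * T */
static bool T3(void) { return term(LPAREN) && E() && term(RPAREN); } /* T -> ( E ) */
static bool T(void) {
    const TOKEN *save = next;
    return (next = save, T1()) || (next = save, T2()) || (next = save, T3());
}

/* Accept only if E() succeeds and every token was consumed. */
static bool parse(const TOKEN *tokens) {
    assert(tokens != NULL);
    next = tokens;
    return E() && *next == END;
}
```

Calling parse() on the tokens LPAREN INT RPAREN END reproduces the ( 5 ) walkthrough and returns true. Because E() and T() stop at the first alternative that succeeds, this sketch rejects int * int: T1 matches int and T2 is never retried. For this grammar, listing the longer alternatives first avoids that.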
PREDICTIVE PARSING
Lecture Seven
Recursive Descent vs. Predictive Parsers
Recursive Descent
• At each step, there are many choices of production to use.
• Backtracking is used to undo bad choices.

Predictive Parsers
• The parser can "predict" which production to use by looking at the next few tokens (lookahead).
• It accepts a restricted form of grammar (LL(k) grammars).
• LL(k) stands for Left-to-right scan and Leftmost derivation with k lookahead tokens.
• Usually k = 1, so it is usually called an LL(1) parser.
• At each step, there is only one choice of production.
• No backtracking (the grammar is deterministic).

Left Recursion
Left recursion is used to make operations left associative.
Simple immediate left recursion:
S → S α | β
To remove the left recursion, we rewrite this grammar rule into two rules:
1. one that generates the base case β first
2. one that generates the repetitions of α using right recursion

S → β S'
S' → α S' | ε
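A recursive-descent function written directly for S → S α would call itself before consuming any input and never return. The rewritten rules do not have this problem. Here is a minimal sketch, assuming the hypothetical single-character terminals β = 'b' and α = 'a' and a string as input:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static const char *in;               /* cursor into the input string */

/* S' -> a S' | eps */
static void S_prime(void) {
    if (*in == 'a') {                /* one more repetition of alpha */
        in++;
        S_prime();
    }
    /* otherwise take the eps alternative and consume nothing */
}

/* S -> b S' : the base case is consumed first, so recursion always makes progress */
static bool S(void) {
    if (*in != 'b') return false;
    in++;
    S_prime();
    return true;
}

/* Accept strings of the form b a a ... a, the language of S -> S a | b. */
static bool accepts(const char *s) {
    assert(s != NULL);
    in = s;
    return S() && *in == '\0';
}
```

Both grammars describe the same strings (β followed by zero or more α), so accepts("baaa") holds while accepts("ab") does not.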
Left Factoring
Left factoring is required when two or more grammar rule choices share a common prefix string.
E → T + E | T
T → int | int * T | ( E )
This grammar is not acceptable for LL(1) parsing because the parser cannot predict the right production from one lookahead token:
• For T: two productions start with int
• For E: two productions start with T
We need to left-factor the grammar (turn the nondeterministic grammar into a deterministic one):
E → T X
X → + E | ε
T → int Y | ( E )
Y → * T | ε
LL(1) Parsing Table
E → T X          X → + E | ε
T → ( E ) | int Y    Y → * T | ε

Rows: leftmost nonterminal. Columns: input tokens (terminals). Entries: the production whose RHS to use.

      int          *          +          (            )        $
E     E → T X                            E → T X
X                             X → + E                 X → ε    X → ε
T     T → int Y                          T → ( E )
Y                  Y → * T    Y → ε                   Y → ε    Y → ε

[E, int] entry: the current nonterminal is E and the next input token is int, so use production E → T X.
[Y, +] entry: the current nonterminal is Y and the current token is +, so get rid of Y using Y → ε.
[E, *] entry: blank, because there is no way to derive a string starting with * from nonterminal E; this is an error.
LL(1) Parsing Table
Use a stack instead of the recursive functions of recursive descent.
$ marks the end of the input and the bottom of the stack.
Push $, then push the start symbol.
while (!emptyStack) {
  case stack of
    <nonterminal X, rest> : if M[X, *input] holds a production
                              replace X with the RHS of the production
                              (leftmost RHS symbol on top)
                            else error
    <terminal t, rest>    : if t == *input++, match the terminal (pop t)
                            else error
}
Reject on reaching an error state.
Accept if the input string and the stack both become empty.
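The stack loop above can be sketched in C for the left-factored grammar. The symbol encoding, the production numbering, the fixed stack size and the integer token array are assumptions made for this sketch; the table entries are the ones from the LL(1) parsing table slide.

```c
#include <assert.h>
#include <stdbool.h>

/* Terminals first, then nonterminals, so "sym < NUM_T" tests for a terminal. */
enum { T_INT, T_TIMES, T_PLUS, T_LP, T_RP, T_END, NUM_T,
       N_E = NUM_T, N_X, N_T, N_Y };

typedef struct { int len; int rhs[3]; } Prod;

static const Prod prods[] = {
    {2, {N_T, N_X}},          /* 0: E -> T X   */
    {2, {T_PLUS, N_E}},       /* 1: X -> + E   */
    {0, {0}},                 /* 2: X -> eps   */
    {2, {T_INT, N_Y}},        /* 3: T -> int Y */
    {3, {T_LP, N_E, T_RP}},   /* 4: T -> ( E ) */
    {2, {T_TIMES, N_T}},      /* 5: Y -> * T   */
    {0, {0}},                 /* 6: Y -> eps   */
};

/* M[nonterminal][terminal] = production index, or -1 for an error entry. */
static const int M[4][NUM_T] = {
    /*        int   *   +   (   )   $  */
    /* E */ {  0,  -1, -1,  0, -1, -1 },
    /* X */ { -1,  -1,  1, -1,  2,  2 },
    /* T */ {  3,  -1, -1,  4, -1, -1 },
    /* Y */ { -1,   5,  6, -1,  6,  6 },
};

/* input must be an array of terminals terminated by T_END ($). */
static bool ll1_parse(const int *input) {
    int stack[256];
    int top = 0;
    stack[top++] = T_END;                 /* $ marks the bottom of the stack */
    stack[top++] = N_E;                   /* push the start symbol */
    while (top > 0) {
        int sym = stack[--top];
        if (sym < NUM_T) {                /* terminal: must match the input */
            if (sym != *input) return false;
            if (sym == T_END) return top == 0;   /* stack and input both empty */
            input++;
        } else {                          /* nonterminal: consult the table */
            int p = M[sym - N_E][*input];
            if (p < 0) return false;
            for (int i = prods[p].len - 1; i >= 0; i--) {
                assert(top < 256);
                stack[top++] = prods[p].rhs[i];  /* reversed: leftmost symbol on top */
            }
        }
    }
    return false;
}
```

Unlike the backtracking parser, each step here is decided by a single table lookup, so no input position ever has to be restored.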
LL(1) Parsing Table
Input: int * int $
E → T X          X → + E | ε
T → ( E ) | int Y    Y → * T | ε

Stack        Input          Action
$ E          int * int $    Replace E → T X
$ X T        int * int $    Replace T → int Y
$ X Y int    int * int $    Match (pop int)
$ X Y        * int $        Replace Y → * T
$ X T *      * int $        Match
$ X T        int $          Replace T → int Y
$ X Y int    int $          Match
$ X Y        $              Replace Y → ε
$ X          $              Replace X → ε
$            $              Accept

LL(1) Parsing Table
The LL(1) parsing table is a two-dimensional array M[N, T].
N: nonterminals, T: terminals (tokens)
We add production choices to this table according to the following rules, for each production rule A → α:
1. If there is a derivation α ⇒* a β, where a is a token, add A → α to the table entry M[A, a].
2. If there are derivations α ⇒* ε and S $ ⇒* β A a γ, where S is the start symbol and a is a token (or $), add A → α to the table entry M[A, a].
These rules are difficult to implement directly, so the First and Follow sets are used.

First Sets
If X is a terminal or ε, then First(X) = {X}.
If X is a nonterminal, then for each production X → X1 X2 ... Xn:
• First(X1) − {ε} ⊆ First(X)
• i = 1
  while (ε ∈ First(Xi) and i < n)
  {
    First(Xi+1) − {ε} ⊆ First(X)
    i = i + 1
  }
• If ε ∈ First(Xi) for every i = 1, ..., n, then ε ∈ First(X).

First Sets
E →T X
X→+E|ε
T →( E ) | int Y
Y→*T|ε

First of terminals        First of nonterminals
First(+) = {+}            First(X) = {+, ε}
First(*) = {*}            First(Y) = {*, ε}
First(() = {(}            First(T) = {(, int}
First()) = {)}            First(E) = First(T) = {(, int}
First(int) = {int}

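The First-set rules can be sketched as a fixed-point computation: the rules are applied to every production again and again until no set grows. The bitmask representation, the symbol encoding and the names below are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { INT, TIMES, PLUS, LP, RP, NUM_T };     /* terminals */
enum { E, X, T, Y, NUM_N };                   /* nonterminals */
#define NT(n) (NUM_T + (n))                   /* nonterminal encoded as a symbol */
#define EPS   (1u << NUM_T)                   /* bit that stands for eps in a set */

typedef struct { int lhs; int len; int rhs[3]; } Prod;

static const Prod prods[] = {
    {E, 2, {NT(T), NT(X)}},                   /* E -> T X   */
    {X, 2, {PLUS, NT(E)}},                    /* X -> + E   */
    {X, 0, {0}},                              /* X -> eps   */
    {T, 3, {LP, NT(E), RP}},                  /* T -> ( E ) */
    {T, 2, {INT, NT(Y)}},                     /* T -> int Y */
    {Y, 2, {TIMES, NT(T)}},                   /* Y -> * T   */
    {Y, 0, {0}},                              /* Y -> eps   */
};
#define NUM_P (sizeof prods / sizeof prods[0])

static unsigned first[NUM_N];                 /* one bit per terminal, plus EPS */

/* First of a single symbol: {t} for a terminal, the current set for a nonterminal. */
static unsigned first_of_symbol(int sym) {
    assert(sym >= 0 && sym < NUM_T + NUM_N);
    return sym < NUM_T ? 1u << sym : first[sym - NUM_T];
}

static void compute_first(void) {
    for (int n = 0; n < NUM_N; n++) first[n] = 0;
    bool changed = true;
    while (changed) {                         /* repeat until no set grows */
        changed = false;
        for (size_t p = 0; p < NUM_P; p++) {
            unsigned add = 0;
            int i = 0;
            for (; i < prods[p].len; i++) {
                unsigned f = first_of_symbol(prods[p].rhs[i]);
                add |= f & ~EPS;              /* First(Xi) - {eps} */
                if (!(f & EPS)) break;        /* Xi cannot vanish: stop here */
            }
            if (i == prods[p].len) add |= EPS;   /* every Xi can derive eps */
            unsigned old = first[prods[p].lhs];
            first[prods[p].lhs] = old | add;
            if (first[prods[p].lhs] != old) changed = true;
        }
    }
}
```

After compute_first(), the four entries of first hold the sets listed in the "First of nonterminals" column.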
Follow Sets
For a nonterminal A, the set Follow(A) is defined as follows:
1. If A is the start symbol, then $ is in Follow(A).
2. If there is a production B → α A γ, then First(γ) − {ε} is in Follow(A).
3. If there is a production B → α A γ such that ε is in First(γ), then Follow(A) contains Follow(B).
CS510 Spring 2015 Shahira Azazy
Follow Sets
E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε
Follow of terminals               Follow of nonterminals
Follow(+) = {(, int}              Follow(E) = {$, )}
Follow(*) = {(, int}              Follow(X) = {$, )}
Follow(() = {(, int}              Follow(T) = {+, ), $}
Follow()) = {+, ), $}             Follow(Y) = {+, ), $}
Follow(int) = {*, +, ), $}
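The three Follow rules can also be run as a fixed-point computation. This sketch takes the First sets of the nonterminals from the First Sets slide as given and computes Follow for nonterminals only. The bitmask encoding and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { INT, TIMES, PLUS, LP, RP, END, NUM_T };   /* END stands for $ */
enum { E, X, T, Y, NUM_N };
#define NT(n)  (NUM_T + (n))
#define BIT(t) (1u << (t))
#define EPS    BIT(NUM_T)

typedef struct { int lhs; int len; int rhs[3]; } Prod;

static const Prod prods[] = {
    {E, 2, {NT(T), NT(X)}},   {X, 2, {PLUS, NT(E)}},  {X, 0, {0}},
    {T, 3, {LP, NT(E), RP}},  {T, 2, {INT, NT(Y)}},
    {Y, 2, {TIMES, NT(T)}},   {Y, 0, {0}},
};
#define NUM_P (sizeof prods / sizeof prods[0])

/* First sets of the nonterminals, copied from the First Sets slide. */
static const unsigned first_nt[NUM_N] = {
    [E] = BIT(LP) | BIT(INT),  [X] = BIT(PLUS) | EPS,
    [T] = BIT(LP) | BIT(INT),  [Y] = BIT(TIMES) | EPS,
};

static unsigned follow[NUM_N];

static unsigned first_of_symbol(int sym) {
    assert(sym >= 0 && sym < NUM_T + NUM_N);
    return sym < NUM_T ? BIT(sym) : first_nt[sym - NUM_T];
}

static void compute_follow(void) {
    for (int n = 0; n < NUM_N; n++) follow[n] = 0;
    follow[E] = BIT(END);                        /* rule 1: $ follows the start symbol */
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t p = 0; p < NUM_P; p++) {
            const Prod *pr = &prods[p];
            for (int i = 0; i < pr->len; i++) {
                if (pr->rhs[i] < NUM_T) continue;    /* skip terminals */
                int a = pr->rhs[i] - NUM_T;          /* B -> alpha A gamma */
                unsigned add = 0;
                int j = i + 1;
                for (; j < pr->len; j++) {           /* rule 2: First(gamma) - {eps} */
                    unsigned f = first_of_symbol(pr->rhs[j]);
                    add |= f & ~EPS;
                    if (!(f & EPS)) break;
                }
                if (j == pr->len) add |= follow[pr->lhs];   /* rule 3: gamma derives eps */
                if ((follow[a] | add) != follow[a]) {
                    follow[a] |= add;
                    changed = true;
                }
            }
        }
    }
}
```

After compute_follow(), follow[E] and follow[X] hold {$, )} and follow[T] and follow[Y] hold {+, ), $}, matching the nonterminal column of the example.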
LL(1) Parsing Table
For each production choice A → α:
• For every token a in First(α), add A → α to the entry M[A, a].
• If ε is in First(α), then for every element x in Follow(A), add A → α to the entry M[A, x].

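The two construction rules translate almost line for line into code. This sketch takes the First and Follow sets from the earlier slides as given. The bitmask encoding, the production numbering and the conflict flag returned by build_table() are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum { INT, TIMES, PLUS, LP, RP, END, NUM_T };   /* END stands for $ */
enum { E, X, T, Y, NUM_N };
#define NT(n)  (NUM_T + (n))
#define BIT(t) (1u << (t))
#define EPS    BIT(NUM_T)

typedef struct { int lhs; int len; int rhs[3]; } Prod;

static const Prod prods[] = {
    {E, 2, {NT(T), NT(X)}},      /* 0: E -> T X   */
    {X, 2, {PLUS, NT(E)}},       /* 1: X -> + E   */
    {X, 0, {0}},                 /* 2: X -> eps   */
    {T, 3, {LP, NT(E), RP}},     /* 3: T -> ( E ) */
    {T, 2, {INT, NT(Y)}},        /* 4: T -> int Y */
    {Y, 2, {TIMES, NT(T)}},      /* 5: Y -> * T   */
    {Y, 0, {0}},                 /* 6: Y -> eps   */
};
#define NUM_P ((int)(sizeof prods / sizeof prods[0]))

/* First and Follow sets of the nonterminals, copied from the slides. */
static const unsigned first_nt[NUM_N] = {
    [E] = BIT(LP) | BIT(INT),  [X] = BIT(PLUS) | EPS,
    [T] = BIT(LP) | BIT(INT),  [Y] = BIT(TIMES) | EPS,
};
static const unsigned follow[NUM_N] = {
    [E] = BIT(END) | BIT(RP),              [X] = BIT(END) | BIT(RP),
    [T] = BIT(PLUS) | BIT(RP) | BIT(END),  [Y] = BIT(PLUS) | BIT(RP) | BIT(END),
};

static int M[NUM_N][NUM_T];      /* production index, or -1 for an error entry */

/* First(alpha) for the right-hand side of a production. */
static unsigned first_of_rhs(const Prod *pr) {
    unsigned set = 0;
    for (int i = 0; i < pr->len; i++) {
        int s = pr->rhs[i];
        unsigned f = s < NUM_T ? BIT(s) : first_nt[s - NUM_T];
        set |= f & ~EPS;
        if (!(f & EPS)) return set;      /* this symbol cannot vanish */
    }
    return set | EPS;                    /* the whole right-hand side can derive eps */
}

/* Fills M; returns false if some entry receives two productions. */
static bool build_table(void) {
    bool ok = true;
    for (int n = 0; n < NUM_N; n++)
        for (int t = 0; t < NUM_T; t++) M[n][t] = -1;
    for (int p = 0; p < NUM_P; p++) {
        int a = prods[p].lhs;
        assert(a >= 0 && a < NUM_N);
        unsigned f = first_of_rhs(&prods[p]);
        unsigned cols = f & ~EPS;                /* every token in First(alpha) */
        if (f & EPS) cols |= follow[a];          /* plus Follow(A) if eps in First(alpha) */
        for (int t = 0; t < NUM_T; t++) {
            if (!(cols & BIT(t))) continue;
            if (M[a][t] != -1) ok = false;       /* multiple entries: not LL(1) */
            M[a][t] = p;
        }
    }
    return ok;
}
```

For this grammar build_table() reports no conflict, and M[X][RP] and M[X][END] both select X → ε, as in the parsing table slide.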
LL(1) Parsing Table
E → T X          X → + E | ε
T → ( E ) | int Y    Y → * T | ε

First of nonterminals:
First(X) = {+, ε}
First(Y) = {*, ε}
First(T) = {(, int}
First(E) = First(T) = {(, int}

      int          *          +          (            )        $
E     E → T X                            E → T X
X                             X → + E                 X → ε    X → ε
T     T → int Y                          T → ( E )
Y                  Y → * T    Y → ε                   Y → ε    Y → ε
LL(1) Grammar
A grammar is not LL(1) when:
• the grammar is not left factored
• the grammar is left recursive
• the grammar is ambiguous
• an entry of the LL(1) parsing table holds more than one production

