Top Down Parsing

Top-Down Parsing
Top-down parsing methods

Recursive descent Predictive parsing
Implementation of parsers Two approaches

Top-down easier to understand and program manually Bottom-up more powerful, used by most parser generators
Reading: Section 4.4
Intro to Top-Down Parsing

The parse tree is constructed
From the top From left to right t2 4 t5 t6 1 3 7 t8 t9
Terminals are seen in order of appearance in the token stream: t2 t5 t6 t 8 t9
Recursive Descent Parsing

Consider the grammar
ET+E|T T int | int * T | ( E )
Token stream is: int5 * int2 Start with top-level non-terminal E Try the rules for E in order
Recursive Descent Parsing Example

Try E0 T1 + E2 Then try a rule for T1 ( E3 )
But ( does not match input token int5
Try T1 int - Token matches.

But + after T1 does not match input token *
Try T1 int * T2
This will match but + after T1 will be unmatched
Has exhausted the choices for T1

Backtrack to choice for E0
Recursive Descent Parsing Example

Try E0 T1 Follow same steps as before for T1
And succeed with T1 int * T2 and T2 int With the following parse tree E0 T1 int5 * T2 int2
Recursive Descent Parser Preliminaries

Let TOKEN be the type of tokens
Special tokens INT, OPEN, CLOSE, PLUS, TIMES
Let the global next point to the next token
Recursive Descent Parser Implementing Productions

Define boolean functions that check the token string for a match of
A given token terminal bool term(TOKEN tok) { return *next++ == tok; } A given production of S (the nth) bool Sn() { } Any production of S: bool S() { }
These functions advance next

For production E T
bool E1() { return T(); }
For production E T + E
bool E2() { return T() && term(PLUS) && E(); }
For all productions of E (with backtracking)

bool E() { TOKEN *save = next; return (next = save, E1()) || (next = save, E2()); }

Functions for non-terminal T
bool T1() { return term(OPEN) && E() && term(CLOSE); } bool T2() { return term(INT) && term(TIMES) && T(); } bool T3() { return term(INT); } bool T() { TOKEN *save = next; return (next = save, T1()) || (next = save, T2()) || (next = save, T3()); }
Recursive Descent Parsing Notes

To start the parser
Initialize next to point to first token Invoke E()
Notice how this simulates our previous example Easy to implement by hand But does not always work
When Recursive Descent Does Not Work

Consider a production S S a
bool S1() { return S() && term(a); } bool S() { return S1(); }
S() will get into an infinite loop left-recursive grammar has a non-terminal S
S + S for some
Recursive descent does not work in such cases
Elimination of Left Recursion

Consider the left-recursive grammar
SS|
S generates all strings starting with a and followed by a number of Can rewrite using right-recursion
S S S S |
More Elimination of LeftRecursion

In general
S S 1 | | S n | 1 | | m
All strings derived from S start with one of 1,,m and continue with several instances of 1,,n Rewrite as
S 1 S | | m S S 1 S | | n S |
General Left Recursion

The grammar
SA| AS is also left-recursive because
S + S This left-recursion can also be eliminated See book, Section 4.3 for general algorithm
Summary of Recursive Descent

Simple and general parsing strategy
Left-recursion must be eliminated first but that can be done automatically
Unpopular because of backtracking

Thought to be too inefficient
In practice, backtracking is eliminated by restricting the grammar
Predictive Parsers
Like recursive-descent but parser can predict which production to use
By looking at the next few tokens No backtracking
Predictive parsers accept LL(k) grammars

L means left-to-right scan of input L means leftmost derivation k means predict based on k tokens of lookahead
In practice, LL(1) is used
LL(1) Languages
In recursive-descent, for each non-terminal and input token, may be a choice of production LL(1) means that for each non-terminal and token there is only one production Can be specified via 2D tables
One dimension for current non-terminal to expand One dimension for next token A table entry contains one production
Predictive Parsing and Left Factoring

Recall the grammar
ET+E|T T int | int * T | ( E )
Hard to predict because

For T two productions start with int For E it is not clear how to predict
A grammar must be left-factored before use for predictive parsing
Left-Factoring Example
Recall the grammar
ET+E|T T int | int * T | ( E ) Factor out common prefixes of productions
ETX X+E| T ( E ) | int Y Y*T|
LL(1) Parsing Table Example

Left-factored grammar
ETX T ( E ) | int Y X+E| Y*T|
+ +E int Y *T (E) ( TX ) $
LL(1) parsing table:

E X T Y int TX *
LL(1) Parsing Table Example

Consider the [E, int] entry
When current non-terminal is E and next input is int, use production E T X This production can generate an int in the first place
Consider the [Y,+] entry

When current non-terminal is Y and current token is +, get rid of Y Y can be followed by + only in a derivation in which Y
LL(1) Parsing Tables - Errors

Blank entries indicate error situations
Consider the [E,*] entry There is no way to derive a string starting with * from non-terminal E
Using Parsing Tables

Method similar to recursive descent, except
For each non-terminal S We look at the next token a And chose the production shown at [S,a]
We use a stack to keep track of pending nonterminals We reject when we encounter an error state We accept when we encounter end-of-input
LL(1) Parsing Algorithm

initialize stack = <S $> and next repeat case stack of <X, rest> : if T[X,*next] = Y1Yn then stack <Y1 Yn rest>; else error (); <t, rest> : if t == *next ++ then stack <rest>; else error (); until stack == < >
LL(1) Parsing Example

Stack E $ T X $ int Y X $ Y X $ * T X $ T X $ int Y X $ Y X $ X $ $ Input int * int * int * * int * int int $ int $ $ $ $ int $ int $ int $ $ $ Action T X int Y terminal * T terminal int Y terminal ACCEPT
Constructing Parsing Tables

LL(1) languages are those defined by a parsing table for the LL(1) algorithm No table entry can be multiply defined We want to generate parsing tables from CFG
Constructing Parsing Tables

If A , where in the line of A we place ? In the column of t where t can start a string derived from
* t We say that t First()
In column of t if is and t can follow an A

S * A t We say t Follow(A)
Computing First Sets

Definition: First(X) = { t | X * t} { | X * } Algorithm sketch (see book for details): 1. for all terminals t do First(t) {t} 2. for each production X do First(X) {} 3. if X A1 An and First(Ai), 1 i n do
add First() to First(X)
4.
for each X A1 An s.t. First(Ai), 1 i n do

add to First(X)
5.
repeat steps 4 & 5 until no First set can be grown
First Sets - Example

Recall the grammar
ETX T ( E ) | int Y X+E| Y*T| First( T ) = {int, ( } First( E ) = {int, ( } First( X ) = {+, } First( Y ) = {*, }
First sets
First( First( First( First( First( ()={(} ))={)} int ) = { int } +)={+} *)={*}
Computing Follow Sets

Definition: Follow(X) = { t | S * X t } Intuition
If S is the start symbol then $ Follow(S) If X A B then First(B) Follow(A) and Follow(X) Follow(B) Also if B * then Follow(X) Follow(A)
Computing Follow Sets (Cont.)

Algorithm sketch:
1. 2.
Follow(S) {$} For each production A X
add First() - {} to Follow(X) add Follow(A) to Follow(X)
3.
For each A X where First()
repeat step(s) 2-3 until no Follow set grows
Follow Sets. Example

Recall the grammar
ETX T ( E ) | int Y X+E| Y*T| Follow( * ) = { int, ( } Follow( E ) = {), $} Follow( T ) = {+, ) , $} Follow( Y ) = {+, ) , $}
Follow sets
Follow( Follow( Follow( Follow( + ) = { int, ( } ( ) = { int, ( } X ) = {$, ) } ) ) = {+, ) , $}
Follow( int) = {*, +, ) , $}
Constructing LL(1) Parsing Tables

Construct a parsing table T for CFG G For each production A in G do:
For each terminal t First() do T[A, t] = If First(), for each t Follow(A) do T[A, t] = If First() and $ Follow(A) do T[A, $] =
Notes on LL(1) Parsing Tables

If any entry is multiply defined then G is not LL(1)
If G is ambiguous If G is left recursive If G is not left-factored And in other cases as well
Most programming language grammars are not LL(1) There are tools that build LL(1) tables
Predictive Parsing Summary

First and Follow sets are used to construct predictive tables
For non-terminal A and input t, use a production A where t First() For non-terminal A and input t, if First(A) and t Follow(), then use a production A where First()
Well see First and Follow sets again . . .

Top Down Parsing

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Top Down Parsing

Uploaded by

Copyright:

Available Formats

Top-Down Parsing

Top-down parsing methods

Implementation of parsers Two approaches

Reading: Section 4.4

Intro to Top-Down Parsing

Terminals are seen in order of appearance in the token stream: t2 t5 t6 t 8 t9

Recursive Descent Parsing

Recursive Descent Parsing Example

Try T1 int - Token matches.

Has exhausted the choices for T1

Recursive Descent Parsing Example

Recursive Descent Parser Preliminaries

Let the global next point to the next token

Recursive Descent Parser Implementing Productions

These functions advance next

Recursive Descent Parser Implementing Productions

For all productions of E (with backtracking)

Recursive Descent Parser Implementing Productions

Recursive Descent Parsing Notes

When Recursive Descent Does Not Work

Recursive descent does not work in such cases

Elimination of Left Recursion

More Elimination of LeftRecursion

General Left Recursion

Summary of Recursive Descent

Unpopular because of backtracking

In practice, backtracking is eliminated by restricting the grammar

Predictive parsers accept LL(k) grammars

In practice, LL(1) is used

Predictive Parsing and Left Factoring

Hard to predict because

A grammar must be left-factored before use for predictive parsing

LL(1) Parsing Table Example

LL(1) parsing table:

LL(1) Parsing Table Example

Consider the [Y,+] entry

LL(1) Parsing Tables - Errors

Using Parsing Tables

LL(1) Parsing Algorithm

LL(1) Parsing Example

Constructing Parsing Tables

Constructing Parsing Tables

In column of t if is and t can follow an A

Computing First Sets

add First() to First(X)

for each X A1 An s.t. First(Ai), 1 i n do

repeat steps 4 & 5 until no First set can be grown

First Sets - Example

Computing Follow Sets

Computing Follow Sets (Cont.)

Follow(S) {$} For each production A X

add First() - {} to Follow(X) add Follow(A) to Follow(X)

For each A X where First()

repeat step(s) 2-3 until no Follow set grows

Follow Sets. Example

Follow( int) = {*, +, ) , $}

Constructing LL(1) Parsing Tables

Notes on LL(1) Parsing Tables

Predictive Parsing Summary

Well see First and Follow sets again . . .

You might also like