Professional Documents
Culture Documents
Top Down Parsing
Top Down Parsing
Token stream is: int5 * int2 Start with top-level non-terminal E Try the rules for E in order
Try T1 int * T2
This will match but + after T1 will be unmatched
For production E T + E
bool E2() { return T() && term(PLUS) && E(); }
Notice how this simulates our previous example Easy to implement by hand But does not always work
S() will get into an infinite loop left-recursive grammar has a non-terminal S
S + S for some
S generates all strings starting with a and followed by a number of Can rewrite using right-recursion
S S S S |
All strings derived from S start with one of 1,,m and continue with several instances of 1,,n Rewrite as
S 1 S | | m S S 1 S | | n S |
S + S This left-recursion can also be eliminated See book, Section 4.3 for general algorithm
Predictive Parsers
Like recursive-descent but parser can predict which production to use
By looking at the next few tokens No backtracking
LL(1) Languages
In recursive-descent, for each non-terminal and input token, may be a choice of production LL(1) means that for each non-terminal and token there is only one production Can be specified via 2D tables
One dimension for current non-terminal to expand One dimension for next token A table entry contains one production
Left-Factoring Example
Recall the grammar
ET+E|T T int | int * T | ( E ) Factor out common prefixes of productions
ETX X+E| T ( E ) | int Y Y*T|
We use a stack to keep track of pending nonterminals We reject when we encounter an error state We accept when we encounter end-of-input
4.
5.
First sets
First( First( First( First( First( ()={(} ))={)} int ) = { int } +)={+} *)={*}
3.
Follow sets
Follow( Follow( Follow( Follow( + ) = { int, ( } ( ) = { int, ( } X ) = {$, ) } ) ) = {+, ) , $}
Most programming language grammars are not LL(1) There are tools that build LL(1) tables