
Non-recursive predictive parsing

Observation:
Our recursive descent parser encodes state
information in its run-time stack, or call
stack.
Using recursive procedure calls to implement a
stack abstraction may not be particularly efficient.

This suggests other implementation methods:

 • explicit stack, hand-coded parser
 • stack-based, table-driven parser

CPSC 434 Lecture 7, Page 1


Non-recursive predictive parsing
Now, a predictive parser looks like:

                          stack
                            ^
                            |
                            v
source code -> scanner -> parser -> IL
                            ^
                            |
                            v
                      parsing tables

[The figure also carries the label "optional".]

Rather than writing code, we build tables.


Building tables can be automated!



Table-driven parsers
A parser generator system often looks like:

                                 stack
                                   ^
                                   |
                                   v
source code -> scanner -> table-driven parser -> IL
                                   ^
                                   |
grammar -> parser generator -> parsing tables

This is true for both top-down and bottom-up parsers

LL(1): left to right, leftmost derivation,
lookahead(1)
LR(1): left to right, reverse rightmost derivation,
lookahead(1)

Table-driven parsing algorithm
Input: a string w and a parsing table M for G

    tos ← 0
    Stack[++tos] ← eof
    Stack[++tos] ← Start Symbol
    token ← next_token()
    X ← Stack[tos]
    repeat
        if X is a terminal or eof then
            if X = token then
                pop X
                token ← next_token()
            else error()
        else /* X is a non-terminal */
            if M[X, token] = X → Y1 Y2 ... Yk then
                pop X
                push Yk, Yk-1, ..., Y1    /* Y1 ends up on top */
            else error()
        X ← Stack[tos]
    until X = eof
    if token ≠ eof then error()    /* all input must be consumed */

Aho, Sethi, and Ullman, Algorithm 4.3
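
The loop above can be sketched in Python. This is a minimal sketch and not the lecture's code: the dictionary `M`, the names `parse` and `NONTERMINALS`, and the convention that a missing key means "error" are assumptions made for illustration. The entries encode the LL(1) table for the lecture's expression grammar, with an empty list standing for an ε right-hand side.

```python
EOF = "eof"
NONTERMINALS = {"goal", "expr", "expr'", "term", "term'", "factor"}

# M[(non-terminal, lookahead token)] -> right-hand side of the production to apply
M = {}
for tok in ("id", "num"):
    M[("goal", tok)] = ["expr"]
    M[("expr", tok)] = ["term", "expr'"]
    M[("term", tok)] = ["factor", "term'"]
    M[("factor", tok)] = [tok]
M[("expr'", "+")] = ["+", "expr"]
M[("expr'", "-")] = ["-", "expr"]
M[("expr'", EOF)] = []                     # expr' -> epsilon
M[("term'", "*")] = ["*", "term"]
M[("term'", "/")] = ["/", "term"]
for tok in ("+", "-", EOF):
    M[("term'", tok)] = []                 # term' -> epsilon


def parse(tokens, start="goal"):
    """Return True if the token list is accepted, False on a syntax error."""
    stream = iter(list(tokens) + [EOF])
    token = next(stream)
    stack = [EOF, start]                   # the top of the stack is the end of the list
    while stack[-1] != EOF:
        X = stack[-1]
        if X not in NONTERMINALS:          # X is a terminal
            if X != token:
                return False
            stack.pop()
            token = next(stream)
        else:                              # X is a non-terminal
            rhs = M.get((X, token))
            if rhs is None:                # empty table entry: error
                return False
            stack.pop()
            stack.extend(reversed(rhs))    # push Yk ... Y1 so that Y1 ends up on top
    return token == EOF                    # all input must have been consumed
```

With this table, `parse(["id", "+", "num", "*", "id"])` accepts, while `parse(["id", "+"])` is rejected because there is no entry for `("expr", "eof")`.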



The grammar and its table
Our long-suffering expression grammar

    <goal>   ::= <expr>
    <expr>   ::= <term> <expr'>
    <expr'>  ::= + <expr>
              |  - <expr>
              |  ε
    <term>   ::= <factor> <term'>
    <term'>  ::= * <term>
              |  / <term>
              |  ε
    <factor> ::= num
              |  id

LL(1) parse table

            id        num       +         -         *         /         eof
<goal>      g → e     g → e     –         –         –         –         –
<expr>      e → te'   e → te'   –         –         –         –         –
<expr'>     –         –         e' → +e   e' → -e   –         –         e' → ε
<term>      t → ft'   t → ft'   –         –         –         –         –
<term'>     –         –         t' → ε    t' → ε    t' → *t   t' → /t   t' → ε
<factor>    f → id    f → num   –         –         –         –         –



The FIRST set
For a string of grammar symbols α, define FIRST(α)
as
 • the set of terminal symbols that begin strings
   derived from α
 • if α ⇒* ε, then ε ∈ FIRST(α)
FIRST(α) contains the set of tokens valid in the first
position of α
To build FIRST(X):
1. if X is a terminal, FIRST(X) is {X}
2. if X ::= ε, then ε ∈ FIRST(X)
3. if X ::= Y1 Y2 ... Yk then put FIRST(Y1) − {ε} in
   FIRST(X)
4. if X is a non-terminal and X ::= Y1 Y2 ... Yk,
   then a ∈ FIRST(X) if a ∈ FIRST(Yi) and
   ε ∈ FIRST(Yj) for all 1 ≤ j < i;
   if ε ∈ FIRST(Yj) for all 1 ≤ j ≤ k, then ε ∈ FIRST(X)
(If ε ∉ FIRST(Y1), then FIRST(Yi) is irrelevant, for
1 < i)
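
One way to apply rules 1 to 4 is to repeat them until no FIRST set changes. The sketch below does that for the lecture's expression grammar. The names `GRAMMAR`, `first_of_string`, and `compute_first`, and the encoding of an ε-production as an empty list, are illustrative assumptions, not part of the lecture.

```python
EPS = "ε"

# {non-terminal: [alternatives]}; an empty alternative is an ε-production.
GRAMMAR = {
    "goal":   [["expr"]],
    "expr":   [["term", "expr'"]],
    "expr'":  [["+", "expr"], ["-", "expr"], []],
    "term":   [["factor", "term'"]],
    "term'":  [["*", "term"], ["/", "term"], []],
    "factor": [["num"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each symbol."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:      # later symbols are irrelevant
            return result
    result.add(EPS)                    # every symbol (or none at all) derives ε
    return result


def compute_first(grammar):
    symbols = {s for alts in grammar.values() for rhs in alts for s in rhs}
    terminals = symbols - set(grammar)
    first = {t: {t} for t in terminals}              # rule 1
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:                                    # iterate to a fixed point
        changed = False
        for nt, alts in grammar.items():
            for rhs in alts:
                new = first_of_string(rhs, first)     # rules 2 to 4
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first
```

Calling `compute_first(GRAMMAR)` yields one set per grammar symbol, terminals included, so the result can be compared row by row with a hand-built FIRST table.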



Our example grammar

 1  <goal>   ::= <expr>
 2  <expr>   ::= <term> <expr'>
 3  <expr'>  ::= + <expr>
 4            |  - <expr>
 5            |  ε
 6  <term>   ::= <factor> <term'>
 7  <term'>  ::= * <term>
 8            |  / <term>
 9            |  ε
10  <factor> ::= num
11            |  id



The FIRST construction

rule      1      2      3         4      FIRST

goal      –      –      num,id    –      {num,id}
expr      –      –      num,id    –      {num,id}
expr'     –      ε      +,-       –      {ε,+,-}
term      –      –      num,id    –      {num,id}
term'     –      ε      *,/       –      {ε,*,/}
factor    –      –      num,id    –      {num,id}
num       num    –      –         –      {num}
id        id     –      –         –      {id}
+         +      –      –         –      {+}
-         -      –      –         –      {-}
*         *      –      –         –      {*}
/         /      –      –         –      {/}



The FOLLOW set
For a non-terminal A, define FOLLOW(A) as
the set of terminals that can appear immediately
to the right of A in some sentential form
Thus, a non-terminal's FOLLOW set specifies the
tokens that can legally appear after it
A terminal symbol has no FOLLOW set

To build FOLLOW(X):
1. place eof in FOLLOW(<goal>)
2. if A ::= αBβ, then put FIRST(β) − {ε} in
   FOLLOW(B)
3. if A ::= αB then put FOLLOW(A) in FOLLOW(B)
4. if A ::= αBβ and ε ∈ FIRST(β), then put
   FOLLOW(A) in FOLLOW(B)
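
The four FOLLOW rules can also be run to a fixed point. The sketch below takes the FIRST sets of the lecture's expression grammar as given data and derives FOLLOW from them. The names `GRAMMAR`, `FIRST`, `first_of_string`, and `compute_follow` are hypothetical, chosen for this illustration only.

```python
EPS, EOF = "ε", "eof"

# {non-terminal: [alternatives]}; an empty alternative is an ε-production.
GRAMMAR = {
    "goal":   [["expr"]],
    "expr":   [["term", "expr'"]],
    "expr'":  [["+", "expr"], ["-", "expr"], []],
    "term":   [["factor", "term'"]],
    "term'":  [["*", "term"], ["/", "term"], []],
    "factor": [["num"], ["id"]],
}

# FIRST sets of the non-terminals, taken as input here.
FIRST = {
    "goal": {"num", "id"}, "expr": {"num", "id"}, "expr'": {EPS, "+", "-"},
    "term": {"num", "id"}, "term'": {EPS, "*", "/"}, "factor": {"num", "id"},
}


def first_of_string(symbols):
    """FIRST of a string of grammar symbols (ε alone for the empty string)."""
    result = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})          # a terminal's FIRST set is itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, start="goal"):
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)                              # rule 1
    changed = True
    while changed:                                       # iterate to a fixed point
        changed = False
        for A, alts in grammar.items():
            for rhs in alts:
                for i, B in enumerate(rhs):
                    if B not in grammar:                 # terminals have no FOLLOW set
                        continue
                    beta = first_of_string(rhs[i + 1:])
                    new = beta - {EPS}                   # rule 2
                    if EPS in beta:                      # rules 3 and 4: β empty or nullable
                        new |= follow[A]
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow
```

Rules 3 and 4 collapse into one test here because `first_of_string` returns {ε} for an empty β, so "B is last" and "β can vanish" are handled the same way.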



The FOLLOW construction

rule      1      2      3          4          FOLLOW

goal      eof    –      –          –          {eof}
expr      –      –      eof        –          {eof}
expr'     –      –      eof        –          {eof}
term      –      +,-    –          eof        {eof,+,-}
term'     –      –      eof,+,-    –          {eof,+,-}
factor    –      *,/    –          eof,+,-    {eof,+,-,*,/}



LL(1) parse table construction
Input: a grammar G

Method
1. ∀ production A ::= α, perform steps 2–4
2. ∀ terminal a in FIRST(α), add A ::= α to M[A, a]
3. if ε ∈ FIRST(α), add A ::= α to M[A, b]
   ∀ terminal b ∈ FOLLOW(A)
4. if ε ∈ FIRST(α) and eof ∈ FOLLOW(A), add
   A ::= α to M[A, eof]
5. set each undefined entry of M to error
If any entry of M receives more than one production,
the grammar is not LL(1)

Aho, Sethi, and Ullman, Algorithm 4.4
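
A minimal sketch of this method in Python, assuming FIRST and FOLLOW for the lecture's expression grammar are already known and supplied as data. The names `GRAMMAR`, `FIRST`, `FOLLOW`, and `build_table`, and the choice to signal a conflict with `ValueError`, are assumptions for illustration. Missing dictionary keys play the role of the error entries from step 5.

```python
EPS, EOF = "ε", "eof"

# {non-terminal: [alternatives]}; an empty alternative is an ε-production.
GRAMMAR = {
    "goal":   [["expr"]],
    "expr":   [["term", "expr'"]],
    "expr'":  [["+", "expr"], ["-", "expr"], []],
    "term":   [["factor", "term'"]],
    "term'":  [["*", "term"], ["/", "term"], []],
    "factor": [["num"], ["id"]],
}
FIRST = {
    "goal": {"num", "id"}, "expr": {"num", "id"}, "expr'": {EPS, "+", "-"},
    "term": {"num", "id"}, "term'": {EPS, "*", "/"}, "factor": {"num", "id"},
}
FOLLOW = {
    "goal": {EOF}, "expr": {EOF}, "expr'": {EOF},
    "term": {EOF, "+", "-"}, "term'": {EOF, "+", "-"},
    "factor": {EOF, "+", "-", "*", "/"},
}


def first_of_string(symbols):
    """FIRST of a right-hand side (ε alone for the empty string)."""
    result = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})              # a terminal's FIRST set is itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result


def build_table(grammar):
    table = {}
    for A, alts in grammar.items():            # step 1: every production A ::= α
        for rhs in alts:
            f = first_of_string(rhs)
            lookaheads = f - {EPS}             # step 2
            if EPS in f:
                lookaheads |= FOLLOW[A]        # steps 3 and 4 (FOLLOW may hold eof)
            for a in lookaheads:
                if (A, a) in table:            # two productions in one entry
                    raise ValueError(f"not LL(1): conflict at M[{A}, {a}]")
                table[(A, a)] = rhs
    return table                               # absent keys stand for error (step 5)
```

Because eof is stored in the FOLLOW sets like any other terminal, steps 3 and 4 need only one line in this encoding.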



LL(1) parse table for example
            id        num       +         -         *         /         eof
<goal>      g → e     g → e     –         –         –         –         –
<expr>      e → te'   e → te'   –         –         –         –         –
<expr'>     –         –         e' → +e   e' → -e   –         –         e' → ε
<term>      t → ft'   t → ft'   –         –         –         –         –
<term'>     –         –         t' → ε    t' → ε    t' → *t   t' → /t   t' → ε
<factor>    f → id    f → num   –         –         –         –         –

Symbol      FIRST        FOLLOW

<goal>      {id,num}     {eof}
<expr>      {id,num}     {eof}
<expr'>     {ε,+,-}      {eof}
<term>      {id,num}     {eof,+,-}
<term'>     {ε,*,/}      {eof,+,-}
<factor>    {id,num}     {eof,+,-,*,/}
+           {+}          –
-           {-}          –
*           {*}          –
/           {/}          –
id          {id}         –
num         {num}        –



Building the tree
Insert some code at the appropriate points

    tos ← 0
    Stack[++tos] ← eof
    Stack[++tos] ← root node
    Stack[++tos] ← Start Symbol
    token ← next_token()
    X ← Stack[tos]
    repeat
        if X is a terminal or eof then
            if X = token then
                pop X
                pop and fill in node
                token ← next_token()
            else error()
        else /* X is a non-terminal */
            if M[X, token] = X → Y1 Y2 ... Yk then
                pop X
                pop node for X
                build node for each child and
                    make it a child of node for X
                push nk, Yk, nk-1, Yk-1, ..., n1, Y1
            else error()
        X ← Stack[tos]
    until X = eof
    if token ≠ eof then error()    /* all input must be consumed */
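
A sketch of the tree-building variant in Python, under stated assumptions: tokens arrive as `(kind, lexeme)` pairs, each stack entry pairs a grammar symbol with its tree node instead of pushing the two separately, and the names `Node`, `parse_tree`, and `leaves` are hypothetical. The table `M` encodes the lecture's expression grammar, with a missing key meaning "error".

```python
EOF = "eof"
NONTERMINALS = {"goal", "expr", "expr'", "term", "term'", "factor"}

M = {}
for tok in ("id", "num"):
    M[("goal", tok)] = ["expr"]
    M[("expr", tok)] = ["term", "expr'"]
    M[("term", tok)] = ["factor", "term'"]
    M[("factor", tok)] = [tok]
M[("expr'", "+")] = ["+", "expr"]
M[("expr'", "-")] = ["-", "expr"]
M[("term'", "*")] = ["*", "term"]
M[("term'", "/")] = ["/", "term"]
M[("expr'", EOF)] = []                          # expr' -> epsilon
for tok in ("+", "-", EOF):
    M[("term'", tok)] = []                      # term' -> epsilon


class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.children = []
        self.lexeme = None                      # filled in for matched terminals


def parse_tree(tokens, start="goal"):
    """Return the root Node of the parse tree, or raise SyntaxError."""
    stream = iter(list(tokens) + [(EOF, None)])
    kind, lexeme = next(stream)
    root = Node(start)
    stack = [(EOF, None), (start, root)]        # each symbol travels with its node
    while stack[-1][0] != EOF:
        X, node = stack.pop()                   # pop X and the node for X
        if X not in NONTERMINALS:
            if X != kind:
                raise SyntaxError(f"expected {X}, found {kind}")
            node.lexeme = lexeme                # fill in the node, then advance
            kind, lexeme = next(stream)
        else:
            rhs = M.get((X, kind))
            if rhs is None:
                raise SyntaxError(f"unexpected {kind} while expanding {X}")
            node.children = [Node(Y) for Y in rhs]
            # push (Yk, nk) ... (Y1, n1) so that Y1 and its node end up on top
            stack.extend(zip(reversed(rhs), reversed(node.children)))
    if kind != EOF:
        raise SyntaxError("trailing input")
    return root


def leaves(node):
    """Lexemes of the matched terminals, left to right."""
    if not node.children:
        return [node.lexeme] if node.lexeme is not None else []
    return [x for child in node.children for x in leaves(child)]
```

Reading the leaves of the tree from left to right gives back the input lexemes, which is a quick sanity check that the children were attached in the right order.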



LL(1) grammars
Features
 • input parsed from left to right
 • leftmost derivation
 • one token lookahead
Definition
A grammar G with no ε-productions is LL(1) if and only if,
for all non-terminals A, each distinct pair of
productions A ::= β and A ::= γ satisfies the
condition FIRST(β) ∩ FIRST(γ) = ∅
When ε-productions are allowed, G is LL(1) if and only if
for each set of productions A ::= α1 | α2 | ... | αn
1. FIRST(α1), FIRST(α2), ..., FIRST(αn) are all
   pairwise disjoint
2. if αi ⇒* ε, then FIRST(αj) ∩ FOLLOW(A) = ∅, for
   all 1 ≤ j ≤ n, i ≠ j.

If G is ε-free, condition 1 is sufficient.



LL(1) grammars
Provable facts about LL(1) grammars:
 • no left recursive grammar is LL(1)
 • no ambiguous grammar is LL(1)
 • LL(1) parsers operate in linear time
 • an ε-free grammar where each alternative
   expansion for A begins with a distinct terminal
   is a simple LL(1) grammar

Not all grammars are LL(1)

 • S ::= aS | a
   is not LL(1)
   FIRST(aS) = FIRST(a) = {a}

 • S  ::= aS'
   S' ::= aS' | ε
   accepts the same language and is LL(1)
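
The two grammars above can be checked mechanically against the pairwise-disjoint FIRST condition and the FIRST/FOLLOW condition for alternatives that derive ε. The following is a sketch, not a definitive implementation: the names `first_of`, `first_follow`, `is_ll1`, `G1`, and `G2` are assumptions, and FIRST and FOLLOW are computed together in one fixed-point loop purely to keep the example short.

```python
from itertools import combinations

EPS, EOF = "ε", "eof"


def first_of(symbols, first):
    """FIRST of a string of symbols; symbols without an entry are terminals."""
    out = set()
    for s in symbols:
        f = first.get(s, {s})
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def first_follow(grammar, start):
    first = {A: set() for A in grammar}
    follow = {A: set() for A in grammar}
    follow[start].add(EOF)
    changed = True
    while changed:                                   # iterate to a fixed point
        changed = False
        for A, alts in grammar.items():
            for rhs in alts:
                updates = [(first[A], first_of(rhs, first))]
                for i, B in enumerate(rhs):
                    if B in grammar:                 # only non-terminals have FOLLOW
                        beta = first_of(rhs[i + 1:], first)
                        extra = follow[A] if EPS in beta else set()
                        updates.append((follow[B], (beta - {EPS}) | extra))
                for target, new in updates:
                    if not new <= target:
                        target |= new                # grows the set stored in the dict
                        changed = True
    return first, follow


def is_ll1(grammar, start):
    first, follow = first_follow(grammar, start)
    for A, alts in grammar.items():
        firsts = [first_of(rhs, first) for rhs in alts]
        for fi, fj in combinations(firsts, 2):
            if fi & fj:                              # condition 1 violated
                return False
        for i, fi in enumerate(firsts):
            if EPS in fi:
                for j, fj in enumerate(firsts):
                    if j != i and fj & follow[A]:    # condition 2 violated
                        return False
    return True


G1 = {"S": [["a", "S"], ["a"]]}                      # S ::= aS | a
G2 = {"S": [["a", "S'"]], "S'": [["a", "S'"], []]}   # S ::= aS',  S' ::= aS' | ε
```

Here `is_ll1(G1, "S")` is false because both alternatives of S start with a, while `is_ll1(G2, "S")` is true: the alternatives of S' have FIRST sets {a} and {ε}, and FOLLOW(S') contains only eof.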



LL grammars
LL(1) grammars
 • may need to rewrite grammar
   (left recursion, left factoring)
 • resulting grammar larger, less maintainable
LL(k) grammars
 • k-token lookahead
 • more powerful than LL(1) grammars
 • example:
   S ::= ac | abc is LL(2)
Not all grammars are LL(k)
 • example:
   S ::= a^i b^j
   where i ≥ j
 • equivalent to dangling else problem
 • problem: must choose production after k tokens
   of lookahead
Bottom-up parsers avoid this problem