Compiler Design

YANG YANG 1
Chap 5 LL(1) Parsing

LL(1): left-to-right scanning,
       leftmost derivation,
       1-token lookahead.
With a parser generator, parsing becomes the easiest part!
Modifying parsers is also convenient.

Given the productions
    A → α_1
    A → α_2
    .....
    A → α_n
During a (leftmost) derivation,
    ... A ...  ⇒  ... α_1 ...  or
                  ... α_2 ...  or
                  ... α_n ...
Which route should we choose?
(Trial and error is not a good idea.)
Use the lookahead symbols.

Consider the situation:
We are about to expand a nonterminal A, and there are several
productions whose LHS is A:
    A → α_1
    A → α_2
    .....
    A → α_n
We choose one of the productions based on the lookahead token.
Which one should we choose?
Consider First(α_1), First(α_2), ......, First(α_n),
and, if α_i ⇒* λ, consider also Follow(A).

Define
    predict(A → α) = First(α) ∪ (if λ ∈ First(α) then Follow(A))
(λ itself is never a lookahead token, so it is dropped from the predict set.)
If the lookahead token a ∈ predict(A → α), then we use the
production A → α to expand A.
What if a ∈ predict(A → α_1) and a ∈ predict(A → α_2)?
What if a ∉ predict(A → α) for all productions A → α whose LHS is A?
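
The predict definition above can be sketched in code. This is a minimal sketch, not the course's implementation: the expression grammar, the `EPS` and `END` markers, and all function names are assumptions chosen for illustration.

```python
EPS = "λ"   # marker for the empty string (assumed convention)
END = "$"   # end-of-input marker (assumed convention)

# Hypothetical example grammar: a right-recursive expression grammar.
GRAMMAR = [
    ("E", ["T", "A"]),
    ("A", ["+", "T", "A"]),
    ("A", []),
    ("T", ["P", "B"]),
    ("B", ["*", "P", "B"]),
    ("B", []),
    ("P", ["ID"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(symbols, first):
    """First set of a symbol string; contains EPS if the whole string can vanish."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in NONTERMINALS else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_first():
    """Iterate to a fixed point over all productions."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first_of(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def compute_follow(first, start="E"):
    """Follow sets: what may appear right after each nonterminal."""
    follow = {nt: set() for nt in NONTERMINALS}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                tail = first_of(rhs[i + 1:], first)
                new = tail - {EPS}
                if EPS in tail:
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


def predict(lhs, rhs, first, follow):
    """First(rhs) without λ, plus Follow(lhs) when rhs can derive λ."""
    f = first_of(rhs, first)
    return (f - {EPS}) | (follow[lhs] if EPS in f else set())


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
PREDICT = {(lhs, tuple(rhs)): predict(lhs, rhs, FIRST, FOLLOW)
           for lhs, rhs in GRAMMAR}
```

For this grammar the predict sets of the two A-productions are disjoint, and likewise for B, so the lookahead token always selects a single production.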

Property of LL(1) grammars:
If a grammar is LL(1), then for any two productions
    A → α
    A → β
we have
    First(α Follow(A)) ∩ First(β Follow(A)) = ∅
Figure 5.1
A Micro grammar in standard form
Given the FIRST and FOLLOW sets in Fig. 5-2
and 5-3, calculate the predict
set for each production.

5.2 LL(1) Parse Table
The predict() function may be represented as an LL(1) parse table.
    T: V_n × V_t → P ∪ {error}
            a        b       ......
    A       3
    B       error
    ....
    T[A, a] = A → α    if a ∈ predict(A → α)
            = error    otherwise
A grammar is LL(1) iff every entry in the parse table contains
either a unique production or the error flag.
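
The table construction and the LL(1) test can be sketched as follows. The predict sets here are hypothetical inputs written out by hand (they are not the Micro grammar's sets), and `build_table` and `conflicts` are illustrative names.

```python
from collections import defaultdict


def build_table(predict_sets):
    """Map (nonterminal, lookahead) to the list of productions predicted there.

    predict_sets maps (production number, LHS) to that production's predict set.
    A missing entry plays the role of the error flag.
    """
    table = defaultdict(list)
    for (number, lhs), lookaheads in predict_sets.items():
        for a in lookaheads:
            table[(lhs, a)].append(number)
    return dict(table)


def conflicts(table):
    """Entries holding more than one production; empty iff the grammar is LL(1)."""
    return {key: prods for key, prods in table.items() if len(prods) > 1}


# Hypothetical predict sets for:
#   1. E -> T A   2. A -> + T A   3. A -> λ   4. T -> ID
EXPR = {(1, "E"): {"ID"}, (2, "A"): {"+"}, (3, "A"): {"$"}, (4, "T"): {"ID"}}

# Hypothetical predict sets for two productions that both start with "if":
#   1. S -> if ...   2. S -> if ... else ...   3. S -> other
IFS = {(1, "S"): {"if"}, (2, "S"): {"if"}, (3, "S"): {"other"}}

expr_table = build_table(EXPR)
if_table = build_table(IFS)
```

Here `conflicts(expr_table)` is empty, while `conflicts(if_table)` reports the entry for S on lookahead `if`.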
Figure 5.5
The LL(1) table for Micro
5.3 LL(1) parsers
Similar to scanners, there are two
kinds of parsers:
1. built-in: recursive descent
2. table-driven

1. built-in
void stmt(void)
{
    /* next_token() is assumed to peek at the lookahead token
       without consuming it; match() consumes it. */
    token = next_token();
    switch (token) {
    case ID:        /* production 5: <stmt> --> ID := <exp> ; */
        match(ID);
        match(ASSIGN);
        exp();
        match(SEMICOLON);
        break;
    case READ:      /* production 6 */
        ...
    case WRITE:     /* production 7 */
        ...
    default:
        syntax_error(....);
    }
}
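
A runnable version of the same idea, as a minimal sketch. The grammar below is a hypothetical cut-down statement grammar, not the Micro grammar of Figure 5.1, and tokens are plain strings naming their kind.

```python
class ParseError(Exception):
    """Raised where the slide's pseudocode calls syntax_error()."""


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input
        self.pos = 0

    def next_token(self):
        """Peek at the lookahead token without consuming it."""
        return self.tokens[self.pos]

    def match(self, expected):
        """Consume the lookahead token if it is the expected one."""
        if self.next_token() != expected:
            raise ParseError(f"expected {expected}, found {self.next_token()}")
        self.pos += 1

    def stmt(self):
        # <stmt> -> ID := <exp> ;
        if self.next_token() == "ID":
            self.match("ID")
            self.match("ASSIGN")
            self.exp()
            self.match("SEMICOLON")
        else:
            raise ParseError(f"unexpected {self.next_token()} at start of statement")

    def exp(self):
        # <exp> -> <primary> { (PLUS | MINUS) <primary> }
        self.primary()
        while self.next_token() in ("PLUS", "MINUS"):
            self.match(self.next_token())
            self.primary()

    def primary(self):
        # <primary> -> ID | INTLIT
        if self.next_token() in ("ID", "INTLIT"):
            self.match(self.next_token())
        else:
            raise ParseError(f"unexpected {self.next_token()} in expression")


def parses(tokens):
    """True if the token list is exactly one statement."""
    parser = Parser(tokens)
    try:
        parser.stmt()
        parser.match("$")
        return True
    except ParseError:
        return False
```

Each nonterminal becomes one procedure, and the lookahead token alone decides which branch to take.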

It is obvious that these recursive descent parsing procedures
can be generated automatically from the grammar.
    grammar → parser generator → LL(1) table
                               → recursive descent parser
However, it is difficult for the parser generator to integrate
the semantic routines into the (generated) recursive descent
parser automatically.
2. table-driven parser
(+) generic driver
Only the LL(1) table needs
to be changed when the
grammar is modified.
(+) non-recursive (faster)
Parser maintains a stack itself.
No recursive calls.

lldriver()
{
    push( START_SYMBOL );
    a := next_token();
    while stack is not empty do
    {
        X := symbol on stack top;
        if ( X is a nonterminal && T[X, a] == X → Y_1 Y_2 ... Y_m )
        {
            pop(1);
            push Y_m, Y_{m-1}, ..., Y_1;    /* Y_1 ends up on top */
        }
        else if ( X == a )
        {
            pop(1);
            a := next_token();
        }
        else if ( X is an action symbol )
        {
            pop(1);
            call the corresponding semantic routine;
        }
        else syntax_error();
    }
}
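
The driver can be made runnable in a few lines. This is a sketch under stated assumptions: the parse table is written out by hand for a hypothetical expression grammar (E → T A, A → + T A | λ, T → P B, B → * P B | λ, P → ID), action symbols are left out, and `$` is pushed below the start symbol to mark the stack bottom.

```python
# Hand-written LL(1) table for the hypothetical expression grammar.
# A missing entry means "error".
TABLE = {
    ("E", "ID"): ["T", "A"],
    ("A", "+"): ["+", "T", "A"],
    ("A", "$"): [],
    ("T", "ID"): ["P", "B"],
    ("B", "*"): ["*", "P", "B"],
    ("B", "+"): [],
    ("B", "$"): [],
    ("P", "ID"): ["ID"],
}
NONTERMINALS = {"E", "A", "T", "B", "P"}


def ll_driver(tokens, start="E"):
    """Return True if the list of terminals can be derived from start."""
    tokens = list(tokens) + ["$"]
    pos = 0
    stack = ["$", start]                  # top of stack is the end of the list
    while stack:
        top = stack[-1]
        a = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, a))
            if rhs is None:
                return False              # error entry
            stack.pop()
            stack.extend(reversed(rhs))   # push Y_m ... Y_1, so Y_1 is on top
        elif top == a:
            stack.pop()                   # terminal matches the lookahead
            pos += 1
        else:
            return False
    return pos == len(tokens)
```

Only `TABLE` and `NONTERMINALS` depend on the grammar; the loop itself is generic, which is the first advantage listed for table-driven parsers.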

Ex.
    input:        begin A := B - 3 + A; end $
    lookahead:    a = begin
    stack top:    X = <GOAL>
    parse stack:  <GOAL>
Trace the action of the parser on this example.

5.5 Action symbols
Action symbols may be processed by the parser in a similar way.
1. in recursive descent parsers
Ex. gen_action( ID := <exp> #assign ; )
will generate the following code:
    match(ID);
    match(ASSIGN);
    exp();
    assign();
    match(SEMICOLON);
Parameters are transmitted through a semantic stack.
The semantic stack is a stack of semantic records.
The parser stack is a stack of grammar (and action) symbols.
2. in LL(1) driver
Action symbols are pushed
into the parse stack in the
same way as
grammar symbols.
When action symbols are
on stack top, the driver calls
corresponding semantic
routines.
See previous slide for
lldriver.
Parameters are transmitted
through semantic stack.
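
A small sketch of action symbols in a table-driven driver. Everything specific here is an assumption for illustration: the grammar (E → T A, A → + T #add A | λ, T → NUM #push), the `#`-prefix convention for action symbols, and the two semantic routines. Tokens are (kind, value) pairs.

```python
NONTERMINALS = {"E", "A", "T"}

# Right-hand sides include action symbols, which are pushed like grammar symbols.
TABLE = {
    ("E", "NUM"): ["T", "A"],
    ("A", "+"): ["+", "T", "#add", "A"],
    ("A", "$"): [],
    ("T", "NUM"): ["NUM", "#push"],
}


def push_literal(semantic_stack, last_value):
    """#push: push the value of the token matched most recently."""
    semantic_stack.append(last_value)


def add(semantic_stack, last_value):
    """#add: replace the two topmost semantic records by their sum."""
    right = semantic_stack.pop()
    left = semantic_stack.pop()
    semantic_stack.append(left + right)


ACTIONS = {"#push": push_literal, "#add": add}


def evaluate(tokens):
    tokens = list(tokens) + [("$", None)]
    pos = 0
    last_value = None
    semantic_stack = []
    parse_stack = ["$", "E"]
    while parse_stack:
        top = parse_stack.pop()
        kind, value = tokens[pos]
        if top.startswith("#"):
            ACTIONS[top](semantic_stack, last_value)   # action symbol on top
        elif top in NONTERMINALS:
            rhs = TABLE.get((top, kind))
            if rhs is None:
                raise ValueError(f"syntax error at token {pos}")
            parse_stack.extend(reversed(rhs))
        elif top == kind:
            last_value = value                         # terminal matched
            pos += 1
        else:
            raise ValueError(f"syntax error at token {pos}")
    return semantic_stack.pop()
```

The parse stack holds grammar and action symbols, while the values travel separately on the semantic stack, as the slide describes.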

5.6 Making grammars LL(1)
Not all grammars are LL(1). However, some non-LL(1) grammars
can be made LL(1) by simple modifications.
When is a grammar not LL(1)?
When there is an entry in the parse table that contains more
than one production.
Ex.            ......   ID    ......
    ....
    <stmt>              2,5
    ....
This is called a conflict, which means we do not know which
production to use when <stmt> is on stack top and ID is the
next input token.

Conflicts are classified into two categories:
    1. common prefix
    2. left recursion
Common prefix
Ex.
    <stmt> → if <exp> then <stmt>
    <stmt> → if <exp> then <stmt> else <stmt>
Consider when <stmt> is on stack top and if is the next input
token. We cannot choose which production to use at this time.
In general, if we have two productions
    A → α
    A → β
and First(α) ∩ First(β) ≠ ∅, then we have a conflict.

Solution: factor out the common prefix
Ex.
    <stmt> → if <exp> then <stmt> <tail>
    <tail> → λ
    <tail> → else <stmt>
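
Factoring can be sketched as a small function. This is a simplified illustration that assumes all alternatives passed in belong to one nonterminal and share a prefix; `left_factor` and `common_prefix` are hypothetical names, and an empty list stands for λ.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor(lhs, alternatives, tail):
    """Factor the prefix shared by all alternatives of lhs into a new nonterminal."""
    prefix = alternatives[0]
    for alt in alternatives[1:]:
        prefix = common_prefix(prefix, alt)
    if not prefix:
        return {lhs: alternatives}        # nothing to factor
    return {
        lhs: [prefix + [tail]],
        tail: [alt[len(prefix):] for alt in alternatives],
    }


FACTORED = left_factor(
    "<stmt>",
    [
        ["if", "<exp>", "then", "<stmt>"],
        ["if", "<exp>", "then", "<stmt>", "else", "<stmt>"],
    ],
    "<tail>",
)
```

`FACTORED` holds the three productions shown above, with `[]` as the λ alternative of `<tail>`.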

2. left recursion:
productions of the form
    A → A α
Grammars with left-recursive productions are not LL(1),
because we may have
    A ⇒ Aα ⇒ Aαα ⇒ ...
with the same lookahead at every step.

Solution: replace the productions
    A → A α
    A → β
    A → λ
Intuition: all the strings derivable from A have the form:
    β, βα, βαα, βααα, ...
    λ, α, αα, ααα, ...
So we may use the following productions instead:
    A → β T
    A → T
    T → λ
    T → α T
Left recursion becomes right recursion.

Ex. Given the left-recursive grammar:
    E → E + T
    E → T
    T → T * P
    T → P
    P → ID
After eliminating left recursion, we get
    E → T A
    A → λ
    A → + T A
    T → P B
    B → λ
    B → * P B
    P → ID
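
The transformation for immediate left recursion can be sketched as follows. The function name and the list-of-symbols representation are assumptions for illustration; an empty list stands for λ, and indirect left recursion is not handled.

```python
def eliminate_left_recursion(lhs, alternatives, tail):
    """Rewrite A -> A α | β as A -> β T, T -> λ | α T (T is the new nonterminal)."""
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == lhs]
    betas = [alt for alt in alternatives if not alt or alt[0] != lhs]
    if not alphas:
        return {lhs: alternatives}        # no immediate left recursion
    return {
        lhs: [beta + [tail] for beta in betas],
        tail: [[]] + [alpha + [tail] for alpha in alphas],
    }


E_RULES = eliminate_left_recursion("E", [["E", "+", "T"], ["T"]], "A")
T_RULES = eliminate_left_recursion("T", [["T", "*", "P"], ["P"]], "B")
```

Applied to the E and T productions of the example, this yields the same A and B productions as the slide.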

3. more general solution
Ex.
    <stmt> → <label> <unlabeled stmt>
    <label> → ID :
    <label> → λ
    <unlabeled stmt> → ID := <exp> ;
We cannot decide which production to use when <label> is on
the stack top and ID is the next token:
    <label> → ID :    is predicted by lookahead ID
    <label> → λ       is also predicted by lookahead ID, because
                      <unlabeled stmt>, which follows <label> in
                      <stmt>, starts with ID

Solution: use the following productions
(which essentially look ahead 2 tokens)
    <stmt> → ID <suffix>
    <suffix> → : <unlabeled stmt>
    <suffix> → := <exp> ;
    <unlabeled stmt> → ID := <exp> ;
Try two examples:
    A: B := C ;
    B := C ;

4. For more difficult cases, we use semantic routines to help
parsing.
Ex. In Ada, we may declare arrays as
    A: array(I .. J, BOOLEAN)
A straightforward grammar (for array bounds) is
    <bound> → <exp> .. <exp>
    <bound> → ID
    <exp> → ID
    <exp> → ...
and ID ∈ First(<exp>)
This grammar is not LL(1) because we cannot make a decision
when <bound> is on stack top and ID is the next token.

Solution:
    <bound> → <exp> <tail>
    <tail> → λ
    <tail> → .. <exp>
All grammars can be transformed into Greibach Normal Form,
in which a production has the form
    A → a α        (a is a terminal)
So given a grammar G, we can do
    G → GNF: no common prefix, no left recursion,
             but still NOT LL(1)!
Ex. S → a A a
    S → b A b a
    A → b
    A → λ
Consider the case when A is on stack top and b is the next token.

5.7 The dangling-else problem
Consider
    if a then if b then x := 1 else x := 2
Two possibilities:
    (1) else belongs to the inner if     (2) else belongs to the outer if
        a true:  test b                      a true:  test b
            b true:  x := 1                      b true:  x := 1
            b false: x := 2                  a false: x := 2
The problem is which if the else belongs to.
In essence, we are trying to find an LL(1) grammar for the set
    { [^i ]^j | i ≥ j ≥ 0 }
But is it possible?
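
1st attempt: G1
    S → [ S C
    S → λ
    C → ]
    C → λ
This grammar is ambiguous.
Consider [ [ ], which has two parse trees:
    (1) S → [ S C with the outer C → λ;
        inner S → [ S C with S → λ and the inner C → ]
    (2) S → [ S C with the outer C → ];
        inner S → [ S C with S → λ and the inner C → λ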

2nd attempt: we can make ] be associated with the nearest
unpaired [ as follows:
    S → [ S
    S → T
    T → [ T ]
    T → λ
This grammar is not ambiguous.
Consider [ [ ]:
    S ⇒ [ S ⇒ [ T ⇒ [ [ T ] ⇒ [ [ ]
However, this grammar is not LL(1) either. Consider the case
when S is on stack top and [ is the next input token.
    [ ∈ First( [ S )
    [ ∈ First( T )
This grammar can be parsed with a bottom-up parser, but not
a top-down parser.

Solution: conflicts + special rules
    1. G → S ;
    2. S → if S E
    3. S → other
    4. E → else S
    5. E → λ
The parse table
            if      else     other    ;
    G       1                1
    S       2                3
    E               4,5               5
                    (conflict)
We can enforce that T[E, else] = rule 4.
This essentially forces else to be matched with the nearest
unpaired if.
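
The effect of forcing rule 4 can be observed in a small parser for this grammar. The sketch below uses recursive descent rather than the table, purely for brevity; the test `peek() == "else"` in `parse_E` plays the role of the forced entry T[E, else]. The tuple-shaped tree is an assumption for illustration.

```python
def parse(tokens):
    """Parse G -> S ; and return S as a nested tuple ("if", then_part, else_part)."""
    tokens = list(tokens)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise ValueError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def parse_S():
        if peek() == "if":          # rule 2: S -> if S E
            expect("if")
            then_part = parse_S()
            else_part = parse_E()
            return ("if", then_part, else_part)
        expect("other")             # rule 3: S -> other
        return "other"

    def parse_E():
        if peek() == "else":        # conflict entry {4, 5}, resolved in favor of rule 4
            expect("else")
            return parse_S()
        return None                 # rule 5: E -> λ

    tree = parse_S()
    expect(";")                     # rule 1: G -> S ;
    if pos != len(tokens):
        raise ValueError("trailing input")
    return tree


NESTED = parse("if if other else other ;".split())
```

`NESTED` shows the else attached to the inner if, with the outer if left without an else part.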

Alternative solution: change the language.
Add end if at the end of every if.
    S → if S E
    S → other
    E → else S end if
    E → end if

5.9 Properties of LL(1) parsers:
    A correct leftmost parse is guaranteed.
    All LL(1) grammars are unambiguous.
    Parsing takes linear time and linear space.
llgen
Page 776 of the book
output from llgen
*define
decrtn 1
ifprocess 2

LL(k) parsing
Recall that a grammar is LL(1) only if, for any two productions
A → α and A → β,
    First(α Follow(A)) ∩ First(β Follow(A)) = ∅
To generalize, we write: for any two productions A → α and A → β,
    First_k(α Follow_k(A)) ∩ First_k(β Follow_k(A)) = ∅
if G is strong LL(k).
The word strong means that this condition is stronger than
necessary: it ignores the context in which A appears.

Consider
    G → S $
    S → a A a
    S → b A b a
    A → b
    A → λ
This grammar is not LL(1).
When A is on stack top and b is the next token, we cannot
choose between A → b and A → λ.
    stack (top first):  A ......
    input:              b .....
-- Does it help if we can look ahead two tokens?
NO! If the next two tokens are bb,
    then we should choose A → b.
    If the next two tokens are ba,
    then we cannot make a choice.

case 1. input is aba
    stack (top first)    lookahead    action
    G                    ab           expand G → S $
    S $                  ab           expand S → a A a
    a A a $              ab           match a
    A a $                ba           at this point, we should choose A → b
case 2. input is bba
    stack (top first)    lookahead    action
    G                    bb           expand G → S $
    S $                  bb           expand S → b A b a
    b A b a $            bb           match b
    A b a $              ba           at this point, we should choose A → λ
So the problem is not the limited number
of lookahead tokens.
The problem is in the context.

Therefore, the grammar is not strong LL(1).
Actually, we can verify that the grammar is not strong LL(k)
for all k ≥ 1 by verifying that
    First_k( ba$ ) ⊆ First_k( b Follow_k(A) ) ∩ First_k( Follow_k(A) )
for all k ≥ 1.
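
The claim can be checked mechanically for small k. This is a minimal sketch: the two strings that can follow A (a$ after S → a A a, ba$ after S → b A b a) are written out by hand, and the helper names are hypothetical.

```python
# Strings that can follow A in the grammar G -> S $, S -> a A a | b A b a.
FOLLOW_STRINGS = ["a$", "ba$"]


def first_k(string, k):
    """First_k of a terminal string is simply its prefix of length at most k."""
    return string[:k]


def predict_k(rhs, k):
    """First_k(rhs Follow_k(A)) for a right-hand side made of terminals only."""
    return {first_k(rhs + first_k(y, k), k) for y in FOLLOW_STRINGS}


def strong_llk_conflict(k):
    """Lookahead strings that predict both A -> b and A -> λ."""
    return predict_k("b", k) & predict_k("", k)
```

For every k tried, the intersection is non-empty and contains exactly the length-k prefix of ba$, so the two A-productions can never be separated by lookahead alone.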

However, it is possible to parse the language of the grammar
under the following conditions:
    1. look ahead two tokens
    2. from left to right
    3. using the left context
We call such grammars LL(2), rather than strong LL(2).
Note that LL(2) ≠ strong LL(2)
          LL(1) = strong LL(1)

LL(k) parsers:
Each nonterminal A is split into
    [A, L_1]
    [A, L_2]
    .......
where each L_i is a set of terminal strings of length ≤ k.
Let [A, L] be the nonterminal on top of the stack.
Let z be the lookahead (|z| = k).
At this point, we choose production A → α only if
z ∈ First_k( α y ) for some y ∈ L.
Note. If there exist a state [A, L] and two productions
A → α, A → β such that
    ( ∪_{y ∈ L} First_k( α y ) ) ∩ ( ∪_{y ∈ L} First_k( β y ) ) ≠ ∅
then the grammar is not LL(k).

When [A, L] is the state on stack top, assume we choose the
production A → α.
Let α = X_0 [B_1, L_1] X_1 ... [B_m, L_m] X_m,
where the X_i are terminal strings and the B_i are nonterminals.
Pop [A, L] from the stack. Push
    X_0 [B_1, L_1] X_1 ... [B_m, L_m] X_m
onto the stack, where
    L_i = ∪_{y ∈ L} First_k( X_i B_{i+1} X_{i+1} ... B_m X_m y )
The start symbol is [S, {λ}].
That is, L_i collects what can follow B_i: the rest of the
production, X_i B_{i+1} X_{i+1} ... B_m X_m, followed by some
y that can follow [A, L].

Ex. G → S $
    S → a A a
    S → b A b a
    A → b
    A → λ
1. First_2(A) = { b, λ }
   First_2(S) = { ab, aa, bb }
   First_2(G) = First_2(S) = { ab, aa, bb }
2. [G,{λ}] is the start symbol.
3. Consider the production G → S $ in state [G,{λ}].
   z ∈ First_2( S$ ) = { ab, aa, bb }
   L_1 = First_2( $ ) = { $ }
   This means [G,{λ}] → [S,{$}] $ is predicted by ab, aa, bb.

4. Consider the production S → a A a in state [S,{$}].
   z ∈ First_2( aAa$ ) = { ab, aa }
   L_1 = First_2( a$ ) = { a$ }
   This means [S,{$}] → a [A,{a$}] a is predicted by ab, aa.
5. Consider the production S → b A b a in state [S,{$}].
   z ∈ First_2( bAba$ ) = { bb }
   L_1 = First_2( ba$ ) = { ba }
   This means [S,{$}] → b [A,{ba}] b a is predicted by bb.
6. Consider A → b in state [A,{a$}].
   z ∈ First_2( ba$ ) = { ba }
   No L_1 (the right-hand side has no nonterminal).
   This means [A,{a$}] → b is predicted by ba.

7. Consider A → b in state [A,{ba}].
   z ∈ First_2( bba ) = { bb }
   No L_1.
   This means [A,{ba}] → b is predicted by bb.
8. Consider A → λ in state [A,{a$}].
   z ∈ First_2( a$ ) = { a$ }
   No L_1.
   This means [A,{a$}] → λ is predicted by a$.
9. Consider A → λ in state [A,{ba}].
   z ∈ First_2( ba ) = { ba }
   No L_1.
   This means [A,{ba}] → λ is predicted by ba.

In summary, we have 7 productions:
    lookahead       production
    ab, aa, bb      [G,{λ}]  → [S,{$}] $
    ab, aa          [S,{$}]  → a [A,{a$}] a
    bb              [S,{$}]  → b [A,{ba}] b a
    ba              [A,{a$}] → b
    a$              [A,{a$}] → λ
    bb              [A,{ba}] → b
    ba              [A,{ba}] → λ
Note that there is no conflict on the lookaheads.
Therefore, the grammar is LL(2).
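
The seven productions can drive a two-token-lookahead parser directly. In this sketch the states are written as strings such as "[A,{a$}]", the table is keyed by (state, two-character lookahead), and the input string is assumed to end with $; the function name is hypothetical.

```python
G, S = "[G,{λ}]", "[S,{$}]"
A1, A2 = "[A,{a$}]", "[A,{ba}]"
STATES = {G, S, A1, A2}

# (state on stack top, 2-token lookahead) -> right-hand side to push
TABLE = {
    (G, "ab"): [S, "$"],
    (G, "aa"): [S, "$"],
    (G, "bb"): [S, "$"],
    (S, "ab"): ["a", A1, "a"],
    (S, "aa"): ["a", A1, "a"],
    (S, "bb"): ["b", A2, "b", "a"],
    (A1, "ba"): ["b"],
    (A1, "a$"): [],
    (A2, "bb"): ["b"],
    (A2, "ba"): [],
}


def ll2_parse(text):
    """Return True if text (ending in $) is accepted."""
    stack = [G]
    pos = 0
    while stack:
        top = stack.pop()
        if top in STATES:
            z = text[pos:pos + 2]            # two-token lookahead
            rhs = TABLE.get((top, z))
            if rhs is None:
                return False                 # no production for this lookahead
            stack.extend(reversed(rhs))
        elif pos < len(text) and text[pos] == top:
            pos += 1                         # match a terminal
        else:
            return False
    return pos == len(text)
```

With this table, `ll2_parse("bbba$")` succeeds and `ll2_parse("abba$")` fails as soon as [A,{a$}] is on top with lookahead bb.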

Now let's parse the string abba$
    [G,{λ}] ⇒ [S,{$}] $
            ⇒ a [A,{a$}] a $       match a
    fail: the lookahead is now bb, and [A,{a$}] has no
          production for bb
Parse bbba$
    [G,{λ}] ⇒ [S,{$}] $
            ⇒ b [A,{ba}] b a $     match 1st b
            ⇒ b b b a $            match 2nd b
                                   match 3rd b
                                   match a
Do you DARE to try exercise 11 on page 139?

Some results:
    LL(k) ⊂ LL(k+1)
    strong LL(k) ⊂ strong LL(k+1)
    strong LL(k) ⊂ LL(k) for all k > 1
    strong LL(1) = LL(1)
    L_k = { a^n (b, b^k d)^n | n ≥ 1 } needs k-token lookahead.
    Strong LL(1)'s table is larger.
    error detection
