# YANG YANG 1

Chap 5 LL(1) Parsing Chap 5 LL(1) Parsing
- LL(1): left-to-right scanning, leftmost derivation, 1 token of lookahead.
- Parser generator: parsing becomes the easiest part of the compiler, and modifying the parser is also convenient.
Given the productions
    A → α_1
    A → α_2
    ...
    A → α_n
during a (leftmost) derivation we may have
    ... A ... ⇒ ... α_1 ...   or
              ⇒ ... α_2 ...   or
              ⇒ ... α_n ...
Which route should we choose?
(Trial and error is not a good idea.)
- Consider the situation: we are about to expand a nonterminal A, and there are several productions whose LHS is A:
    A → α_1
    A → α_2
    ...
    A → α_n
  We must choose one of these productions. Which one should we choose?
  Consider First(α_1), First(α_2), ..., First(α_n) and, if α_i ⇒* λ, consider Follow(A) as well.
- Define
    predict(A → α) = (First(α) − {λ}) ∪ (if λ ∈ First(α) then Follow(A))
- If the lookahead token a ∈ predict(A → α), then we use the production A → α to expand A.
- What if a ∈ predict(A → α_1) and a ∈ predict(A → α_2)?
- What if a ∉ predict(A → α) for all productions A → α whose LHS is A?
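The definition of predict() can be tried out on a small grammar. The following is a minimal sketch, assuming a hypothetical toy grammar (S → A b, A → a A | λ, not the Micro grammar of the book); the names `first_of`, `compute_first`, `compute_follow` and `predict` are illustrative only.

```python
LAMBDA = "λ"   # marker for the empty string
END = "$"      # end-of-input marker

# Hypothetical toy grammar:  S -> A b     A -> a A     A -> λ
GRAMMAR = [
    ("S", ["A", "b"]),
    ("A", ["a", "A"]),
    ("A", []),            # empty right-hand side stands for λ
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(symbols, first):
    """First set of a string of symbols, given First sets of the nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in NONTERMINALS else {sym}
        result |= sym_first - {LAMBDA}
        if LAMBDA not in sym_first:
            return result
    result.add(LAMBDA)    # every symbol can vanish (or the string is empty)
    return result


def compute_first():
    """Iterate to a fixed point over all productions."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first_of(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def compute_follow(first, start="S"):
    follow = {n: set() for n in NONTERMINALS}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                tail = first_of(rhs[i + 1:], first)
                new = tail - {LAMBDA}
                if LAMBDA in tail:          # the rest of the RHS can vanish
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


def predict(lhs, rhs, first, follow):
    f = first_of(rhs, first)
    return (f - {LAMBDA}) | (follow[lhs] if LAMBDA in f else set())


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
PREDICT = {(lhs, tuple(rhs)): predict(lhs, rhs, FIRST, FOLLOW)
           for lhs, rhs in GRAMMAR}
```

For this toy grammar the two A-productions have disjoint predict sets, so one lookahead token is enough to choose between them.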
Property of LL(1) grammars:
If a grammar is LL(1), then for any two distinct productions
    A → α
    A → β
we have
    First(α Follow(A)) ∩ First(β Follow(A)) = ∅
Figure 5.1
A Micro grammar in standard form
Given the FIRST and FOLLOW sets in Fig. 5-2
and 5-3, calculate the predict
set for each production.
§5.2 LL(1) Parse Table
- The predict() function may be represented as an LL(1) parse table
    T: Vn × Vt → P ∪ {error}

          a        b        ...
    A     3
    B     error
    ...

    T[A, a] = A → α   if a ∈ predict(A → α)
            = error   otherwise
- A grammar is LL(1) iff every entry in the parse table contains a unique production or the error flag.
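As an illustration of the table construction and the LL(1) test, here is a minimal sketch. The predict sets are hard-coded for a hypothetical numbered grammar, and `build_table` and `is_ll1` are names invented for this example; a missing key plays the role of the error flag.

```python
# Predict sets of a hypothetical grammar, keyed by production number:
#   1: S -> A b     2: A -> a A     3: A -> λ
PREDICT = {
    1: ("S", {"a", "b"}),
    2: ("A", {"a"}),
    3: ("A", {"b"}),
}


def build_table(predict):
    """Return (table, conflicts).

    table maps (nonterminal, token) to the list of predicted production
    numbers; an absent key means 'error'.
    """
    table, conflicts = {}, []
    for prod, (lhs, lookaheads) in predict.items():
        for a in lookaheads:
            entry = table.setdefault((lhs, a), [])
            entry.append(prod)
            if len(entry) == 2:           # second production in one entry
                conflicts.append((lhs, a))
    return table, conflicts


def is_ll1(predict):
    """LL(1) iff no table entry holds more than one production."""
    return not build_table(predict)[1]


TABLE, CONFLICTS = build_table(PREDICT)

# Two productions predicted by the same token give a multiply-defined entry.
CLASH = {2: ("<stmt>", {"ID"}), 5: ("<stmt>", {"ID"})}
```

With `CLASH`, the entry for (`<stmt>`, ID) holds two productions, so `is_ll1(CLASH)` is false.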
Figure 5.5
The LL(1) table for Micro
§5.3 LL(1) parsers
- Similar to scanners, there are two kinds of parsers:
  1. built-in: recursive descent
  2. table-driven
1. built-in
    void stmt()
    {
        token = next_token();   /* assumed to peek at the lookahead; match() consumes it */
        switch (token) {
        case ID:
            /* production 5: <stmt> --> ID := <exp> ; */
            match(ID);
            match(ASSIGN);
            exp();
            match(SEMICOLON);
            break;
        ...
        case WRITE:             /* production 7 */
            ...
        default:
            syntax_error(....);
        }
    }
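A runnable version of the same idea may look as follows. This is a sketch only: the grammar is a hypothetical simplification (stmt → ID := exp ; | WRITE exp ; and exp → ID | INTLIT), tokens are plain strings, and `Parser`, `ParseError` and `parses` are names made up for the example.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input
        self.pos = 0

    def next_token(self):
        """Peek at the lookahead token without consuming it."""
        return self.tokens[self.pos]

    def match(self, expected):
        """Consume the lookahead if it is the expected terminal."""
        if self.next_token() != expected:
            raise ParseError(f"expected {expected}, found {self.next_token()}")
        self.pos += 1

    def stmt(self):
        token = self.next_token()
        if token == "ID":              # stmt -> ID := exp ;
            self.match("ID")
            self.match("ASSIGN")
            self.exp()
            self.match("SEMICOLON")
        elif token == "WRITE":         # stmt -> WRITE exp ;   (simplified)
            self.match("WRITE")
            self.exp()
            self.match("SEMICOLON")
        else:
            raise ParseError(f"unexpected token {token}")

    def exp(self):
        token = self.next_token()
        if token in ("ID", "INTLIT"):  # exp -> ID | INTLIT    (simplified)
            self.match(token)
        else:
            raise ParseError(f"unexpected token {token}")


def parses(tokens):
    """True iff the token list is exactly one statement."""
    parser = Parser(tokens)
    try:
        parser.stmt()
        parser.match("$")
        return True
    except ParseError:
        return False
```

Each nonterminal becomes one procedure, and the branch taken inside it is selected by the lookahead token, exactly as the predict sets dictate.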
It is obvious that these recursive descent parsing procedures can be generated automatically from the grammar.

    grammar → parser generator → LL(1) table
                               → recursive descent parser

- However, it is difficult for the parser generator to integrate the semantic routines into the (generated) recursive descent parser automatically.
2. table-driven parser
   (+) generic driver: only the LL(1) table needs to be changed when the grammar is modified.
   (+) non-recursive (faster): the parser maintains a stack itself, so there are no recursive calls.
lldriver()
{
    push(START_SYMBOL);
    a := next_token();
    while stack is not empty do
    {
        X := symbol on stack top;
        if (X is a nonterminal && T[X, a] == X → Y_1 ... Y_m) {
            pop(1);
            push Y_m, Y_{m-1}, ..., Y_1;    /* Y_1 ends up on top */
        }
        else if (X == a) {
            pop(1);
            a := next_token();
        }
        else if (X is an action symbol) {
            pop(1);
            call the corresponding semantic routine;
        }
        else syntax_error();
    }
}
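The driver can be sketched in runnable form as below. The table is hard-coded for a small expression grammar chosen for this example (E → T A, A → + T A | λ, T → P B, B → * P B | λ, P → ID), the action symbol `#saw_id` is hypothetical, and symbols beginning with `#` are treated as action symbols by assumption.

```python
class ParseError(Exception):
    pass


NONTERMINALS = {"E", "A", "T", "B", "P"}

# T[X, a] -> right-hand side of the predicted production ([] stands for λ).
TABLE = {
    ("E", "ID"): ["T", "A"],
    ("A", "+"): ["+", "T", "A"],
    ("A", "$"): [],
    ("T", "ID"): ["P", "B"],
    ("B", "*"): ["*", "P", "B"],
    ("B", "+"): [],
    ("B", "$"): [],
    ("P", "ID"): ["ID", "#saw_id"],   # hypothetical action symbol after ID
}


def lldriver(tokens, table=TABLE, start="E", actions=None):
    """Parse tokens; return the productions used, in leftmost order."""
    actions = actions or {}
    tokens = list(tokens) + ["$"]
    stack = ["$", start]              # end marker below the start symbol
    pos, used = 0, []
    while stack:
        x = stack.pop()
        a = tokens[pos]
        if x in NONTERMINALS:
            if (x, a) not in table:
                raise ParseError(f"no entry for T[{x}, {a}]")
            rhs = table[(x, a)]
            used.append((x, tuple(rhs)))
            stack.extend(reversed(rhs))        # Y_1 ends up on top
        elif x.startswith("#"):
            actions.get(x, lambda: None)()     # call the semantic routine
        elif x == a:
            pos += 1                           # match and advance
        else:
            raise ParseError(f"expected {x}, found {a}")
    return used


def accepts(tokens):
    try:
        lldriver(tokens)
        return True
    except ParseError:
        return False
```

Only `TABLE` depends on the grammar; the loop itself is the generic part that stays fixed when the grammar changes.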
Ex.
    Input: begin A := B - 3 + A; end $
    Initially a = begin, X = <GOAL>, and the parse stack contains only <GOAL>.
Trace the action of the parser on this example.
§5.5 Action symbols
- Action symbols may be processed by the parser in a similar way.
1. in recursive descent parsers
   Ex. gen_action("ID := <exp> #assign ;")
   will generate the following code:
       match(ID);
       match(ASSIGN);
       exp();
       assign();
       match(SEMICOLON);
- Parameters are transmitted through a semantic stack.
- The semantic stack is a stack of semantic records.
- The parser stack is a stack of grammar (and action) symbols.
2. in the LL(1) driver
- Action symbols are pushed onto the parse stack in the same way as grammar symbols.
- When an action symbol is on the stack top, the driver calls the corresponding semantic routine.
- See the earlier slide for lldriver.
- Parameters are transmitted through the semantic stack.
§5.6 Making grammars LL(1)
- Not all grammars are LL(1). However, some non-LL(1) grammars can be transformed into equivalent LL(1) grammars.
- When is a grammar not LL(1)? When there is an entry in the parse table that contains more than one production.
  Ex.
                ...   ID    ...
      ...
      <stmt>          2,5
      ...
  This is called a conflict: we do not know which production to use when <stmt> is on the stack top and ID is the next input token.
- Conflicts are classified into two categories:
  1. common prefix
  2. left recursion
- Common prefix
  Ex.
      <stmt> → if <exp> then <stmt>
      <stmt> → if <exp> then <stmt> else <stmt>
  Consider the case when <stmt> is on the stack top and 'if' is the next input token. We cannot choose which production to use at this time.
  In general, if we have two productions
      A → α
      A → β
  and First(α) ∩ First(β) ≠ ∅, then we have a conflict.
- Solution: factor out the common prefix.
  Ex.
      <stmt> → if <exp> then <stmt> <tail>
      <tail> → λ
      <tail> → else <stmt>
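Factoring out a common prefix is mechanical. The sketch below performs a single factoring step on the alternatives of one nonterminal; `common_prefix` and `left_factor` are illustrative names, right-hand sides are lists of symbols, and the empty list stands for λ. It only handles the simple case where all alternatives share the prefix.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor(lhs, alternatives, tail_name):
    """Factor the prefix shared by ALL alternatives of lhs (one step only).

    Returns a dict mapping each nonterminal to its list of right-hand sides.
    """
    prefix = alternatives[0]
    for alt in alternatives[1:]:
        prefix = common_prefix(prefix, alt)
    if len(alternatives) < 2 or not prefix:
        return {lhs: alternatives}             # nothing to factor
    return {
        lhs: [prefix + [tail_name]],
        tail_name: [alt[len(prefix):] for alt in alternatives],
    }


IF_STMT = [
    ["if", "<exp>", "then", "<stmt>"],
    ["if", "<exp>", "then", "<stmt>", "else", "<stmt>"],
]
FACTORED = left_factor("<stmt>", IF_STMT, "<tail>")
```

Applied to the two if-productions, this yields the `<stmt>` and `<tail>` productions of the example.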
2. left recursion: productions of the form
       A → A α
- Grammars with left-recursive productions are not LL(1), because we may have
       A ⇒ Aα ⇒ Aαα ⇒ ...
  without ever consuming an input token.
- Solution: replace the productions
      A → A α
      A → β
      A → γ
  Intuition: all the strings derivable from A have the form
      β, βα, βαα, βααα, ...
      γ, γα, γαα, γααα, ...
  So we may use the following productions instead:
      A → β T
      A → γ T
      T → λ
      T → α T
  Left recursion is replaced by right recursion.
Ex. Given the left-recursive grammar:
    E → E + T
    E → T
    T → T * P
    T → P
    P → ID
After eliminating left recursion, we get
    E → T A
    A → λ
    A → + T A
    T → P B
    B → λ
    B → * P B
    P → ID
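The transformation can be written as a short function. This is a minimal sketch for immediate left recursion only (productions A → A α), with `eliminate_left_recursion` as a made-up name; right-hand sides are lists of symbols and the empty list stands for λ.

```python
def eliminate_left_recursion(lhs, alternatives, tail_name):
    """Replace A -> A α | β | γ by A -> β T | γ T and T -> λ | α T."""
    # The α parts of the left-recursive alternatives A -> A α.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == lhs]
    # The remaining alternatives (the β, γ of the slide).
    others = [alt for alt in alternatives if not alt or alt[0] != lhs]
    if not alphas:
        return {lhs: alternatives}             # no immediate left recursion
    return {
        lhs: [other + [tail_name] for other in others],
        tail_name: [[]] + [alpha + [tail_name] for alpha in alphas],
    }


E_RULES = eliminate_left_recursion("E", [["E", "+", "T"], ["T"]], "A")
T_RULES = eliminate_left_recursion("T", [["T", "*", "P"], ["P"]], "B")
```

Applied to the E and T productions of the example, it produces the same right-recursive productions as listed above, with A and B as the new nonterminals.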
3. more general solution
Ex.
    <stmt> → <label> <unlabeled stmt>
    <label> → ID :
    <label> → λ
    <unlabeled stmt> → ID := <exp> ;
We cannot decide which production to use when <label> is on the stack top and ID is the next token: the ID may start either <label> or <unlabeled stmt>.
- Solution: use the following productions (which essentially look ahead 2 tokens)
      <stmt> → ID <suffix>
      <suffix> → : <unlabeled stmt>
      <suffix> → := <exp> ;
      <unlabeled stmt> → ID := <exp> ;
  Try two examples:
      A: B := C ;
      B := C ;
4. For more difficult cases, we use semantic routines to help parsing.
Ex. In Ada, we may declare arrays as
        A: array(I .. J, BOOLEAN)
    A straightforward grammar (for the array bound) is
        <bound> → <exp> .. <exp>
        <bound> → ID
        <exp> → ID
        <exp> → ...
    and ID ∈ First(<exp>).
This grammar is not LL(1) because we cannot make a decision when <bound> is on the stack top and ID is the next token.
- Solution:
      <bound> → <exp> <tail>
      <tail> → λ
      <tail> → .. <exp>
- All grammars can be transformed into Greibach Normal Form (GNF), in which every production has the form
      A → a α      (a is a terminal)
  So given a grammar G, we can do
      G ⇒ GNF ⇒ no common prefix, no left recursion
  but the result may still NOT be LL(1)!
  Ex.
      S → a A a
      S → b A b a
      A → b
      A → λ
  Consider the case when A is on the stack top and b is the next token.
§5.7 The dangling-else problem
- Consider
      if a then if b then x := 1 else x := 2
  Two possibilities:
      (1) the else belongs to the inner if:  if a then (if b then x := 1 else x := 2)
      (2) the else belongs to the outer if:  if a then (if b then x := 1) else x := 2
  The problem is which 'if' the 'else' belongs to.
- In essence, we are trying to find an LL(1) grammar for the set
      { [^i ]^j | i ≥ j ≥ 0 }
  But is it possible?
- 1st attempt: G1
      S → [ S C
      S → λ
      C → ]
      C → λ
  This grammar is ambiguous. Consider [ [ ]: it has two parse trees, one in which the ] is derived from the inner C and one in which it is derived from the outer C.
- 2nd attempt: we can make each ] be associated with the nearest unpaired [ as follows:
      S → [ S
      S → T
      T → [ T ]
      T → λ
  This grammar is not ambiguous. Consider [ [ ]: the only derivation is
      S ⇒ [ S ⇒ [ T ⇒ [ [ T ] ⇒ [ [ ]
  However, this grammar is not LL(1) either. Consider the case when S is on the stack top and [ is the next input token:
      [ ∈ First( [ S )
      [ ∈ First( T )
  This grammar can be parsed with a bottom-up parser, but not with a top-down parser.
- Solution: conflicts + special rules
      1. G → S ;
      2. S → if S E
      3. S → other
      4. E → else S
      5. E → λ
  The parse table:
            if     else    other    ;
      G     1              1
      S     2              3
      E            4,5              5
  The entry T[E, else] contains a conflict (4,5).
  We can enforce T[E, else] = rule 4. This essentially forces each 'else' to be matched with the nearest unpaired 'if'.
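The effect of the special rule can be checked with a small table-driven sketch. The productions and predict sets are taken from the table above; `build_table`, its `prefer` parameter and `parse` are names invented for this example.

```python
PRODUCTIONS = {
    1: ("G", ["S", ";"]),
    2: ("S", ["if", "S", "E"]),
    3: ("S", ["other"]),
    4: ("E", ["else", "S"]),
    5: ("E", []),                     # E -> λ
}
# Predict sets, as in the parse table on the slide.
PREDICT = {1: {"if", "other"}, 2: {"if"}, 3: {"other"},
           4: {"else"}, 5: {"else", ";"}}
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS.values()}


def build_table(prefer=()):
    """Build T; in a conflicting entry keep only a preferred production."""
    table = {}
    for num, lookaheads in PREDICT.items():
        lhs = PRODUCTIONS[num][0]
        for a in lookaheads:
            table.setdefault((lhs, a), []).append(num)
    for key, entries in list(table.items()):
        preferred = [n for n in entries if n in prefer]
        if len(entries) > 1 and preferred:
            table[key] = preferred[:1]
    return table


def parse(tokens, table):
    """Return the production numbers used, in leftmost-derivation order."""
    stack, pos, used = ["G"], 0, []
    while stack:
        x = stack.pop()
        a = tokens[pos] if pos < len(tokens) else None
        if x in NONTERMINALS:
            entries = table.get((x, a), [])
            if len(entries) != 1:
                raise ValueError(f"no unique production for T[{x}, {a}]")
            used.append(entries[0])
            stack.extend(reversed(PRODUCTIONS[entries[0]][1]))
        elif x == a:
            pos += 1
        else:
            raise ValueError(f"expected {x}, found {a}")
    if pos != len(tokens):
        raise ValueError("trailing input")
    return used


RAW = build_table()                   # T[E, else] holds both 4 and 5
RESOLVED = build_table(prefer={4})    # special rule: T[E, else] = 4
TRACE = parse("if if other else other ;".split(), RESOLVED)
```

In `TRACE`, the first E to be expanded is the one belonging to the inner 'if' (it is the leftmost), and it takes rule 4, so the 'else' attaches to the nearest unpaired 'if'.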
- Alternative solution: change the language.
- Add 'end if' at the end of every 'if'.
      S → if S E
      S → other
      E → else S end if
      E → end if
§5.9 Properties of LL(1) parsers:
- A correct leftmost parse is guaranteed.
- All LL(1) grammars are unambiguous.
- Parsing takes linear time and linear space.
§ llgen
Page 776 of the book
output from llgen
*define
decrtn 1
ifprocess 2
§ LL(k) parsing
- Recall that a grammar is LL(1) only if, for any two productions A → α and A → β,
      First(α Follow(A)) ∩ First(β Follow(A)) = ∅
- To generalize, we write: for any two productions A → α and A → β,
      First_k(α Follow_k(A)) ∩ First_k(β Follow_k(A)) = ∅
  if G is strong LL(k).
- The word 'strong' means that this definition imposes too strong a condition on G.
- Consider
      G → S $
      S → a A a
      S → b A b a
      A → b
      A → λ
  - This grammar is not LL(1). When A is on the stack top and b is the next token, we cannot choose between A → b and A → λ.
        stack (top first): A ...        input: b ...
  - Does it help if we can look ahead two tokens?
    NO! If the next two tokens are bb, then we should choose A → b. But if the next two tokens are ba, then we cannot make a choice.
case 1. input is aba
      stack (top first)     remaining input
      G                     aba$
      S $                   aba$
      a A a $               aba$
      A a $                 ba$      at this point, we should choose A → b
case 2. input is bba
      stack (top first)     remaining input
      G                     bba$
      S $                   bba$
      b A b a $             bba$
      A b a $               ba$      at this point, we should choose A → λ
So the problem is not the limited number of lookahead tokens.
The problem is in the 'context'.
- Therefore, the grammar is not strong LL(1).
- Actually, we can verify that the grammar is not strong LL(k) for any k ≥ 1 by verifying that
      First_k(ba$) ⊆ First_k(b Follow_k(A)) ∩ First_k(λ Follow_k(A))
  for all k ≥ 1.
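The claim can be checked mechanically for small k. In the sketch below, Follow_k(A) is written down by hand from the grammar: the only strings that can follow A are a$ (from S → a A a) and ba$ (from S → b A b a). The function names are illustrative.

```python
def follow_k_of_A(k):
    """Follow_k(A): k-prefixes of the two strings that can follow A."""
    return {"a$"[:k], "ba$"[:k]}


def strong_llk_conflict(k):
    """First_k(b Follow_k(A)) ∩ First_k(λ Follow_k(A))."""
    with_b = {("b" + y)[:k] for y in follow_k_of_A(k)}     # A -> b
    with_lambda = {y[:k] for y in follow_k_of_A(k)}        # A -> λ
    return with_b & with_lambda


CONFLICTS = {k: strong_llk_conflict(k) for k in range(1, 5)}
```

For every k tried, the intersection contains the k-prefix of ba$, so the strong LL(k) condition fails.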
- However, it is possible to parse the language of the grammar under the following conditions:
  1. looking ahead two tokens
  2. from left to right
  3. using the left context
  We call such grammars LL(2), rather than strong LL(2).
- Note that LL(2) ≠ strong LL(2), whereas LL(1) = strong LL(1).
LL(k) parsers:
- Each nonterminal A is split into states
      [A, L_1], [A, L_2], ...
  where each L_i is a set of terminal strings of length ≤ k.
- Let [A, L] be the nonterminal on top of the stack. Let z be the lookahead (|z| = k). At this point, we choose production A → α only if z ∈ First_k(αy) for some y ∈ L.
Note. If there exist a state [A, L] and two productions A → α, A → β such that
      ( ⋃_{y ∈ L} First_k(αy) ) ∩ ( ⋃_{y ∈ L} First_k(βy) ) ≠ ∅
then the grammar is not LL(k).
- When [A, L] is the state on the stack top, assume we choose the production A → α.
  Let α = X_0 B_1 X_1 ... B_m X_m, where the X_i are terminal strings and the B_i are nonterminals.
  Pop [A, L] from the stack. Push
      X_0 [B_1, L_1] X_1 ... [B_m, L_m] X_m
  onto the stack, where
      L_i = ⋃_{y ∈ L} First_k( X_i B_{i+1} X_{i+1} ... B_m X_m y )
  That is, L_i collects the k-prefixes of what can follow B_i: the rest of α after B_i, followed by some y that can follow A in state [A, L].
- The start symbol is [S, {λ}].
Ex.  G → S $
     S → a A a
     S → b A b a
     A → b
     A → λ
1. First_2(A) = { b, λ }
   First_2(S) = { ab, aa, bb }
   First_2(G) = First_2(S) = { ab, aa, bb }
2. [G, {λ}] is the start symbol.
3. Consider the production G → S $ in state [G, {λ}].
       z ∈ First_2( S $ λ ) = { ab, aa, bb }
       L_1 = First_2( $ λ ) = { $ }
   This means [G, {λ}] → [S, {$}] $ is predicted by ab, aa, bb.
4. Consider the production S → a A a in state [S, {$}].
       z ∈ First_2( a A a $ ) = { ab, aa }
       L_1 = First_2( a $ ) = { a$ }
   This means [S, {$}] → a [A, {a$}] a is predicted by ab, aa.
5. Consider the production S → b A b a in state [S, {$}].
       z ∈ First_2( b A b a $ ) = { bb }
       L_1 = First_2( b a $ ) = { ba }
   This means [S, {$}] → b [A, {ba}] b a is predicted by bb.
6. Consider A → b in state [A, {a$}].
       z ∈ First_2( b a $ ) = { ba }
       No L_1 (the right-hand side contains no nonterminal).
   This means [A, {a$}] → b is predicted by ba.
7. Consider A → b in state [A, {ba}].
       z ∈ First_2( b b a ) = { bb }
       No L_1.
   This means [A, {ba}] → b is predicted by bb.
8. Consider A → λ in state [A, {a$}].
       z ∈ First_2( a $ ) = { a$ }
       No L_1.
   This means [A, {a$}] → λ is predicted by a$.
9. Consider A → λ in state [A, {ba}].
       z ∈ First_2( b a ) = { ba }
       No L_1.
   This means [A, {ba}] → λ is predicted by ba.
In summary, we have 7 productions:
      lookahead       production
      ab, aa, bb      [G, {λ}]  → [S, {$}] $
      ab, aa          [S, {$}]  → a [A, {a$}] a
      bb              [S, {$}]  → b [A, {ba}] b a
      ba              [A, {a$}] → b
      a$              [A, {a$}] → λ
      bb              [A, {ba}] → b
      ba              [A, {ba}] → λ
Note that there is no conflict on the lookaheads. Therefore, the grammar is LL(2).
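The construction of the states [A, L] and their lookahead sets can be reproduced with a short program. This is a sketch under simplifying assumptions: every grammar symbol is a single character, the empty string stands for λ, and the function names (`concat_k`, `first_k_of`, `compute_first_k`, `ll_k_productions`, `is_ll_k`) are invented for the example.

```python
GRAMMAR = {
    "G": ["S$"],
    "S": ["aAa", "bAba"],
    "A": ["b", ""],                   # "" stands for λ
}


def concat_k(xs, ys, k):
    """All concatenations x + y, truncated to length k."""
    return {(x + y)[:k] for x in xs for y in ys}


def first_k_of(symbols, first, k):
    """First_k of a string of grammar symbols."""
    result = {""}
    for sym in symbols:
        result = concat_k(result, first[sym] if sym in GRAMMAR else {sym}, k)
    return result


def compute_first_k(k):
    """First_k of every nonterminal, by fixed-point iteration."""
    first = {n: set() for n in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                new = first_k_of(rhs, first, k)
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


def ll_k_productions(start="G", k=2):
    """Return a list of (state, lookahead set, new right-hand side)."""
    first = compute_first_k(k)
    start_state = (start, frozenset({""}))
    todo, seen, rules = [start_state], {start_state}, []
    while todo:
        nt, ctx = todo.pop()
        for rhs in GRAMMAR[nt]:
            lookahead = concat_k(first_k_of(rhs, first, k), ctx, k)
            new_rhs = []
            for i, sym in enumerate(rhs):
                if sym in GRAMMAR:
                    # L_i: k-prefixes of what can follow this occurrence.
                    rest = first_k_of(rhs[i + 1:], first, k)
                    state = (sym, frozenset(concat_k(rest, ctx, k)))
                    if state not in seen:
                        seen.add(state)
                        todo.append(state)
                    new_rhs.append(state)
                else:
                    new_rhs.append(sym)
            rules.append(((nt, ctx), lookahead, new_rhs))
    return rules


def is_ll_k(rules):
    """No two productions of one state may share a lookahead string."""
    used = {}
    for state, lookahead, _ in rules:
        if used.setdefault(state, set()) & lookahead:
            return False
        used[state] |= lookahead
    return True


RULES = ll_k_productions(k=2)
```

With k = 2 the program finds the seven productions listed above and no lookahead conflict; with k = 1 the state [A, {b}] has b predicting both A → b and A → λ.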
Now let's parse the string abba$:
      [G, {λ}] ⇒ [S, {$}] $
               ⇒ a [A, {a$}] a $        match a
               ⇒ fail                   (the lookahead bb predicts no production of [A, {a$}])
Parse bbba$:
      [G, {λ}] ⇒ [S, {$}] $
               ⇒ b [A, {ba}] b a $      match 1st b
               ⇒ b b b a $              match 2nd b, match 3rd b, match a
Do you DARE to try exercise 11 on page 139?
- Some results:
      LL(k) ⊂ LL(k+1)
      strong LL(k) ⊂ strong LL(k+1)
      strong LL(k) ⊂ LL(k) for all k > 1
      strong LL(1) = LL(1)
- L_k = { a^n (b, b^k d)^n | n ≥ 1 } needs k-token