Bottom-up parsing
Recall
Goal:
Compiler Construction 1
CA448
Bottom-Up Parsing
Example
S → aABe
A → Abc | b
B → d
Handles
Reducing the input abbcde to S (the handle at each step is shown in brackets):
Sentential Form
a [b] bcde
a [Abc] de
aA [d] e
[aABe]
S
Formally:
a handle of a right-sentential form γ is a production A → β and a position in γ where β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ,
i.e., if S ⇒*rm αAw ⇒rm αβw then A → β in the position following α is a handle of αβw.
Because γ is a right-sentential form, the substring to the right of a handle contains only terminal symbols.
Handles
Theorem: If G is unambiguous then every right-sentential form has a unique handle.
Proof sketch: in an unambiguous grammar the rightmost derivation is unique, so each step applies a unique production A → β at a unique position; hence there is a unique handle A → β.
Example
Handle-pruning
The expression grammar:
1  <goal>   ::= <expr>
2  <expr>   ::= <expr> + <term>
3            |  <expr> - <term>
4            |  <term>
5  <term>   ::= <term> * <factor>
6            |  <term> / <factor>
7            |  <factor>
8  <factor> ::= num
9            |  id

Rightmost derivation of id - num * id:
Prodn.  Sentential Form
        <goal>
1       <expr>
3       <expr> - <term>
5       <expr> - <term> * <factor>
9       <expr> - <term> * id
7       <expr> - <factor> * id
8       <expr> - num * id
4       <term> - num * id
7       <factor> - num * id
9       id - num * id
Stack implementation
Example: back to x - 2 * y
1  <goal>   ::= <expr>
2  <expr>   ::= <expr> + <term>
3            |  <expr> - <term>
4            |  <term>
5  <term>   ::= <term> * <factor>
6            |  <term> / <factor>
7            |  <factor>
8  <factor> ::= num
9            |  id

Stack                          Input            Action
$                              id - num * id    shift
$id                            - num * id       reduce 9
$<factor>                      - num * id       reduce 7
$<term>                        - num * id       reduce 4
$<expr>                        - num * id       shift
$<expr> -                      num * id         shift
$<expr> - num                  * id             reduce 8
$<expr> - <factor>             * id             reduce 7
$<expr> - <term>               * id             shift
$<expr> - <term> *             id               shift
$<expr> - <term> * id                           reduce 9
$<expr> - <term> * <factor>                     reduce 5
$<expr> - <term>                                reduce 3
$<expr>                                         reduce 1
$<goal>                                         accept

Shift-reduce parsing
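The shift and reduce steps of the x - 2 * y example can be replayed mechanically on a stack. The Python sketch below is only an illustration: the names `PRODUCTIONS`, `replay`, `TOKENS` and `ACTIONS` are made up here, and the action list is copied from the trace rather than computed by a parser. It checks that every reduction finds its handle on top of the stack.

```python
# Productions of the expression grammar, numbered as in the slides.
PRODUCTIONS = {
    1: ("<goal>", ["<expr>"]),
    2: ("<expr>", ["<expr>", "+", "<term>"]),
    3: ("<expr>", ["<expr>", "-", "<term>"]),
    4: ("<expr>", ["<term>"]),
    5: ("<term>", ["<term>", "*", "<factor>"]),
    6: ("<term>", ["<term>", "/", "<factor>"]),
    7: ("<term>", ["<factor>"]),
    8: ("<factor>", ["num"]),
    9: ("<factor>", ["id"]),
}


def replay(tokens, actions):
    """Apply a fixed action list; return final stack, input position, snapshots."""
    stack, pos, trace = [], 0, []
    for action in actions:
        trace.append(" ".join(stack))      # stack contents before the action
        if action == "shift":
            stack.append(tokens[pos])      # move the next token onto the stack
            pos += 1
        elif action == "accept":
            break
        else:                              # ("reduce", n)
            lhs, rhs = PRODUCTIONS[action[1]]
            # The handle must be the top len(rhs) symbols of the stack.
            assert stack[-len(rhs):] == rhs, (stack, rhs)
            del stack[-len(rhs):]
            stack.append(lhs)
    return stack, pos, trace


def reduce_by(n):
    return ("reduce", n)


TOKENS = ["id", "-", "num", "*", "id"]     # x - 2 * y after scanning
ACTIONS = ["shift", reduce_by(9), reduce_by(7), reduce_by(4), "shift", "shift",
           reduce_by(8), reduce_by(7), "shift", "shift", reduce_by(9),
           reduce_by(5), reduce_by(3), reduce_by(1), "accept"]

if __name__ == "__main__":
    final_stack, consumed, snapshots = replay(TOKENS, ACTIONS)
    for snap in snapshots:
        print("$" + snap)
```

A real shift-reduce parser decides between shift and reduce itself; that decision is what the LR tables encode.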
LR parsing
push s0
token = next_token()
repeat forever
    s = top of stack
    if action[s, token] == "shift si" then
        push si
        token = next_token()
    else if action[s, token] == "reduce A → β" then
        pop |β| states
        s' = top of stack
        push goto[s', A]
    else if action[s, token] == "accept" then
        return
    else error()
Example tables
The Grammar
1  <goal>   ::= <expr>
2  <expr>   ::= <term> + <expr>
3            |  <term>
4  <term>   ::= <factor> * <term>
5            |  <factor>
6  <factor> ::= id

        ACTION                   GOTO
state   id    +     *     $      <expr>  <term>  <factor>
0       s4    -     -     -      1       2       3
1       -     -     -     acc    -       -       -
2       -     s5    -     r3     -       -       -
3       -     r5    s6    r5     -       -       -
4       -     r6    r6    r6     -       -       -
5       s4    -     -     -      7       2       3
6       s4    -     -     -      -       8       3
7       -     -     -     r2     -       -       -
8       -     r4    -     r4     -       -       -

Parsing id * id + id with these tables:
Stack        Input            Action
$ 0          id * id + id $   s4
$ 0 4        * id + id $      r6
$ 0 3        * id + id $      s6
$ 0 3 6      id + id $        s4
$ 0 3 6 4    + id $           r6
$ 0 3 6 3    + id $           r5
$ 0 3 6 8    + id $           r4
$ 0 2        + id $           s5
$ 0 2 5      id $             s4
$ 0 2 5 4    $                r6
$ 0 2 5 3    $                r5
$ 0 2 5 2    $                r3
$ 0 2 5 7    $                r2
$ 0 1        $                acc
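The skeleton LR parser and the example tables can be combined into a small runnable driver. The following Python sketch is one possible encoding, not the course's implementation: the dictionary layout of `ACTION` and `GOTO`, the `PRODS` table of right-hand-side lengths, and the function name `lr_parse` are assumptions made for illustration. Missing table entries are treated as errors.

```python
# production number -> (left-hand side, length of right-hand side)
PRODS = {
    1: ("goal", 1),    # <goal>   ::= <expr>
    2: ("expr", 3),    # <expr>   ::= <term> + <expr>
    3: ("expr", 1),    # <expr>   ::= <term>
    4: ("term", 3),    # <term>   ::= <factor> * <term>
    5: ("term", 1),    # <term>   ::= <factor>
    6: ("factor", 1),  # <factor> ::= id
}

ACTION = {
    0: {"id": "s4"},
    1: {"$": "acc"},
    2: {"+": "s5", "$": "r3"},
    3: {"+": "r5", "*": "s6", "$": "r5"},
    4: {"+": "r6", "*": "r6", "$": "r6"},
    5: {"id": "s4"},
    6: {"id": "s4"},
    7: {"$": "r2"},
    8: {"+": "r4", "$": "r4"},
}

GOTO = {
    0: {"expr": 1, "term": 2, "factor": 3},
    5: {"expr": 7, "term": 2, "factor": 3},
    6: {"term": 8, "factor": 3},
}


def lr_parse(tokens):
    """Run the table-driven LR loop; return the list of actions taken."""
    tokens = list(tokens) + ["$"]          # $ marks end of input
    stack = [0]                            # stack of states, s0 on the bottom
    pos = 0
    trace = []
    while True:
        state = stack[-1]
        act = ACTION[state].get(tokens[pos])
        if act is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {state}")
        trace.append(act)
        if act == "acc":
            return trace
        if act[0] == "s":                  # shift: push state, advance input
            stack.append(int(act[1:]))
            pos += 1
        else:                              # reduce A -> beta
            lhs, length = PRODS[int(act[1:])]
            del stack[-length:]            # pop |beta| states
            stack.append(GOTO[stack[-1]][lhs])


if __name__ == "__main__":
    print(lr_parse("id * id + id".split()))
```

The returned action list for id * id + id can be compared line by line with the Action column of the trace.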
LR parsing
• used to be everyone's favourite parser (but top-down is making a comeback with JavaCC)
• virtually all context-free programming language constructs can be expressed in an LR(1) form
• LR grammars are the most general grammars parsable by a deterministic, bottom-up parser
• efficient parsers can be implemented for LR(1) grammars
• LR parsers detect an error as soon as possible in a left-to-right scan of the input
• LR grammars describe a proper superset of the languages recognized by predictive (i.e., LL) parsers
Three common algorithms for building the tables:
1. SLR(1)
   • smallest class of grammars
   • smallest tables (number of states)
   • simple, fast construction
2. LR(1)
   • full set of LR(1) grammars
   • largest tables (number of states)
   • slow, large construction
3. LALR(1)
   • intermediate sized set of grammars
   • same number of states as SLR(1)
   • canonical construction is slow and large
   • better construction techniques exist
LR(k) items
Example
The production A → XYZ generates four LR(0) items:
1. [A → •XYZ]
2. [A → X•YZ]
3. [A → XY•Z]
4. [A → XYZ•]
closure0
Given an item [A → α•Bβ], its closure contains the item and any other items that can generate legal substrings to follow α.
Thus, if the parser has viable prefix α on its stack, the input should reduce to Bβ (or γ for some other item [B → •γ] in the closure).
function closure0(I)
    repeat
        if [A → α•Bβ] ∈ I
            add [B → •γ] to I
    until no more items can be added to I
    return I
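A minimal Python sketch of closure0 follows. The item encoding (a tuple of left-hand side, right-hand side, and dot position), the `GRAMMAR` dictionary, and the use of a worklist instead of "repeat until no change" are choices made here for illustration; the grammar is the S → E$ expression grammar used in the LR(0) example of these notes.

```python
# Each nonterminal maps to its list of right-hand sides.
GRAMMAR = {
    "S": [("E", "$")],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("id",), ("(", "E", ")")],
}


def closure0(items, grammar=GRAMMAR):
    """Return the LR(0) closure of a set of (lhs, rhs, dot) items."""
    closure = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        # Only items with the dot in front of a nonterminal B add anything.
        if dot < len(rhs) and rhs[dot] in grammar:
            for gamma in grammar[rhs[dot]]:
                new_item = (rhs[dot], gamma, 0)   # [B -> . gamma]
                if new_item not in closure:
                    closure.add(new_item)
                    work.append(new_item)
    return frozenset(closure)


if __name__ == "__main__":
    for item in sorted(closure0({("S", ("E", "$"), 0)})):
        print(item)
```

Starting from [S → •E$], the closure pulls in both E productions and both T productions, giving the five items of the initial state.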
goto0
Let I be a set of LR(0) items and X be a grammar symbol. Then GOTO(I, X) is the closure of the set of all items
[A → αX•β] such that [A → α•Xβ] ∈ I.
If I is the set of valid items for some viable prefix γ, then GOTO(I, X) is the set of valid items for the viable prefix γX.
GOTO(I, X) represents the state after recognizing X in state I.
function goto0(I, X)
    let J be the set of items [A → αX•β]
        such that [A → α•Xβ] ∈ I
    return closure0(J)
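The construction starts from the item [S' → •S$], where S' is the start symbol of the augmented grammar G', S is the start symbol of G, and $ represents EOF.
To compute the collection of sets of LR(0) items:
function items(G')
    s0 = closure0({[S' → •S$]})
    S = {s0}
    repeat
        for each set of items s ∈ S
            for each grammar symbol X
                if goto0(s, X) ≠ ∅ and goto0(s, X) ∉ S
                    add goto0(s, X) to S
    until no more item sets can be added to S
    return S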
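The three functions closure0, goto0 and items fit together as in the Python sketch below. It is an illustration under assumptions: items are encoded as (lhs, rhs, dot) tuples, the names `GRAMMAR` and `SYMBOLS` are invented here, and the grammar is the S → E$ expression grammar from the LR(0) example, whose first production already plays the role of the augmented start production.

```python
GRAMMAR = {
    "S": [("E", "$")],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("id",), ("(", "E", ")")],
}
SYMBOLS = ["E", "T", "id", "(", ")", "+", "$"]   # all grammar symbols


def closure0(items):
    """LR(0) closure of a set of (lhs, rhs, dot) items."""
    closure, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new_item = (rhs[dot], gamma, 0)
                if new_item not in closure:
                    closure.add(new_item)
                    work.append(new_item)
    return frozenset(closure)


def goto0(items, x):
    """Move the dot over x in every item that allows it, then close."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == x}
    return closure0(moved)


def items(start_item):
    """Collection of sets of LR(0) items reachable from the start item."""
    states = [closure0({start_item})]
    for state in states:                 # the list grows while we iterate
        for x in SYMBOLS:
            target = goto0(state, x)
            if target and target not in states:
                states.append(target)
    return states


if __name__ == "__main__":
    collection = items(("S", ("E", "$"), 0))
    print(len(collection), "item sets")
```

The states come out in discovery order, so their numbering need not match the I0 to I9 numbering used on the slides.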
LR(0) example
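1  S → E$
2  E → E + T
3    | T
4  T → id
5    | (E)

I0 : S → •E$
     E → •E + T
     E → •T
     T → •id
     T → •(E)
I1 : S → E•$
     E → E• + T
I2 : S → E$•
I3 : E → E + •T
     T → •id
     T → •(E)
I4 : E → E + T•
I5 : T → id•
I6 : T → (•E)
     E → •E + T
     E → •T
     T → •id
     T → •(E)
I7 : T → (E•)
     E → E• + T
I8 : T → (E)•
I9 : E → T•

The corresponding CFSM:
[Figure: state transition diagram over states 0 to 9. Transitions: 0 –E→ 1, 0 –T→ 9, 0 –id→ 5, 0 –(→ 6, 1 –+→ 3, 1 –$→ 2, 3 –T→ 4, 3 –id→ 5, 3 –(→ 6, 6 –E→ 7, 6 –T→ 9, 6 –id→ 5, 6 –(→ 6, 7 –)→ 8, 7 –+→ 3]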
LR(0) example
If the LR(0) parsing table contains any multiply-defined ACTION entries then G is not LR(0).

        ACTION                          GOTO
state   id    (     )     +     $       E    T
0       s5    s6    -     -     -       1    9
1       -     -     -     s3    s2      -    -
2       acc   acc   acc   acc   acc     -    -
3       s5    s6    -     -     -       -    4
4       r2    r2    r2    r2    r2      -    -
5       r4    r4    r4    r4    r4      -    -
6       s5    s6    -     -     -       7    9
7       -     -     s8    s3    -       -    -
8       r5    r5    r5    r5    r5      -    -
9       r3    r3    r3    r3    r3      -    -
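The same grammar, with reductions entered only under FOLLOW symbols (SLR(1)); FOLLOW(E) = FOLLOW(T) = {+, ), $}:
1  S → E$
2  E → E + T
3    | T
4  T → id
5    | (E)

        ACTION                          GOTO
state   id    (     )     +     $       E    T
0       s5    s6    -     -     -       1    9
1       -     -     -     s3    acc     -    -
2       -     -     -     -     -       -    -
3       s5    s6    -     -     -       -    4
4       -     -     r2    r2    r2      -    -
5       -     -     r4    r4    r4      -    -
6       s5    s6    -     -     -       7    9
7       -     -     s8    s3    -       -    -
8       -     -     r5    r5    r5      -    -
9       -     -     r3    r3    r3      -    -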
1  S → E$
2  E → E + T
3    | T
4  T → T * F
5    | F
6  F → id
7    | (E)

     FOLLOW
E    {+, ), $}
T    {+, *, ), $}
F    {+, *, ), $}

I0 :  S → •E$
      E → •E + T
      E → •T
      T → •T * F
      T → •F
      F → •id
      F → •(E)
I1 :  S → E•$
      E → E• + T
I2 :  S → E$•
I3 :  E → E + •T
      T → •T * F
      T → •F
      F → •id
      F → •(E)
I4 :  T → F•
I5 :  F → id•
I6 :  F → (•E)
      E → •E + T
      E → •T
      T → •T * F
      T → •F
      F → •id
      F → •(E)
I7 :  E → T•
      T → T• * F
I8 :  T → T * •F
      F → •id
      F → •(E)
I9 :  T → T * F•
I10 : F → (E)•
I11 : E → E + T•
      T → T• * F
I12 : F → (E•)
      E → E• + T

        ACTION                                GOTO
state   +     *     id    (     )     $       E    T    F
0       -     -     s5    s6    -     -       1    7    4
1       s3    -     -     -     -     acc     -    -    -
2       -     -     -     -     -     -       -    -    -
3       -     -     s5    s6    -     -       -    11   4
4       r5    r5    -     -     r5    r5      -    -    -
5       r6    r6    -     -     r6    r6      -    -    -
6       -     -     s5    s6    -     -       12   7    4
7       r3    s8    -     -     r3    r3      -    -    -
8       -     -     s5    s6    -     -       -    -    9
9       r4    r4    -     -     r4    r4      -    -    -
10      r7    r7    -     -     r7    r7      -    -    -
11      r2    s8    -     -     r2    r2      -    -    -
12      s3    -     -     -     s10   -       -    -    -
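The FOLLOW sets used for the SLR(1) table of the S → E$, E → E + T | T, T → T * F | F, F → id | (E) grammar can be recomputed with a small fixed-point iteration. The Python sketch below is a simplification that assumes the grammar has no ε-productions (true for this grammar), so FIRST of a string is just FIRST of its first symbol; the names `GRAMMAR`, `first_sets` and `follow_sets` are illustrative.

```python
GRAMMAR = [
    ("S", ["E", "$"]),
    ("E", ["E", "+", "T"]), ("E", ["T"]),
    ("T", ["T", "*", "F"]), ("T", ["F"]),
    ("F", ["id"]), ("F", ["(", "E", ")"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST for each nonterminal (assumes no epsilon productions)."""
    first = {a: set() for a in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            x = rhs[0]
            new = first[x] if x in NONTERMINALS else {x}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    """FOLLOW for each nonterminal (assumes no epsilon productions)."""
    first = first_sets()
    follow = {a: set() for a in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, x in enumerate(rhs):
                if x not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):           # something follows x in this rhs
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                          # x ends the rhs: inherit FOLLOW(lhs)
                    new = follow[lhs]
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return follow


if __name__ == "__main__":
    for name, symbols in sorted(follow_sets().items()):
        print(name, sorted(symbols))
```

In the SLR(1) table a reduction by A → β is entered only under the symbols in FOLLOW(A). For example, state 7 (E → T•, T → T•*F) reduces by production 3 on +, ) and $ but shifts on *, which removes the conflict an LR(0) table would have there.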
LR(1) items
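An LR(1) item is one in which the lookahead is a single symbol: a pair [A → α•β, a] of an LR(0) item and one lookahead symbol a (a terminal or $).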
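Consider:
1  S → L = R
2    | R
3  L → *R
4    | id
5  R → L

The LR(0) item sets:
I0 : S' → •S$
     S → •L = R
     S → •R
     L → •*R
     L → •id
     R → •L
I1 : S' → S•$
I2 : S → L• = R
     R → L•
I3 : S → R•
I4 : L → *•R
     R → •L
     L → •*R
     L → •id
I5 : L → id•
I6 : S → L = •R
     R → •L
     L → •*R
     L → •id
I7 : L → *R•
I8 : R → L•
I9 : S → L = R•

Consider I2:
= ∈ FOLLOW(R) (S ⇒ L = R ⇒ *R = R)
so on = the SLR(1) table would contain both a shift (from S → L• = R) and a reduce (from R → L•): a shift-reduce conflict.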
closure1(I)
goto1(I)
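1  S → L = R
2    | R
3  L → *R
4    | id
5  R → L

The LR(1) item sets (item, lookahead):
I0 :  S' → •S,       $
      S → •L = R,    $
      S → •R,        $
      L → •*R,       =
      L → •id,       =
      R → •L,        $
      L → •*R,       $
      L → •id,       $
I1 :  S' → S•,       $
I2 :  S → L• = R,    $
      R → L•,        $
I3 :  S → R•,        $
I4 :  L → *•R,       =$
      R → •L,        =$
      L → •*R,       =$
      L → •id,       =$
I5 :  L → id•,       =$
I6 :  S → L = •R,    $
      R → •L,        $
      L → •*R,       $
      L → •id,       $
I7 :  L → *R•,       =$
I8 :  R → L•,        =$
I9 :  S → L = R•,    $
I10 : R → L•,        $
I11 : L → *•R,       $
      R → •L,        $
      L → •*R,       $
      L → •id,       $
I12 : L → id•,       $
I13 : L → *R•,       $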
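The LR(1) item sets for the S → L = R | R grammar can be regenerated with LR(1) versions of closure and goto. The Python sketch below is an illustration under assumptions: items are (lhs, rhs, dot, lookahead) tuples with one lookahead each (the slides group them, e.g. "=$"), the grammar has no ε-productions so FIRST(βa) is FIRST of the first symbol of β or {a} when β is empty, and all function and variable names are invented here.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}
SYMBOLS = ["S", "L", "R", "=", "*", "id"]


def first(symbol):
    """Terminals that can start a string derived from symbol (no epsilon)."""
    seen, todo, out = set(), [symbol], set()
    while todo:
        x = todo.pop()
        if x in seen:
            continue
        seen.add(x)
        if x in GRAMMAR:
            todo.extend(rhs[0] for rhs in GRAMMAR[x])
        else:
            out.add(x)
    return out


def closure1(items):
    """LR(1) closure of a set of (lhs, rhs, dot, lookahead) items."""
    closure, work = set(items), list(items)
    while work:
        lhs, rhs, dot, a = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            beta = rhs[dot + 1:]
            lookaheads = first(beta[0]) if beta else {a}   # FIRST(beta a)
            for gamma in GRAMMAR[rhs[dot]]:
                for b in lookaheads:
                    new_item = (rhs[dot], gamma, 0, b)
                    if new_item not in closure:
                        closure.add(new_item)
                        work.append(new_item)
    return frozenset(closure)


def goto1(items, x):
    """Advance the dot over x, keeping each item's lookahead, then close."""
    return closure1({(l, r, d + 1, a) for l, r, d, a in items
                     if d < len(r) and r[d] == x})


def items1():
    """Collection of LR(1) item sets reachable from [S' -> .S, $]."""
    states = [closure1({("S'", ("S",), 0, "$")})]
    for state in states:                  # the list grows while we iterate
        for x in SYMBOLS:
            target = goto1(state, x)
            if target and target not in states:
                states.append(target)
    return states


if __name__ == "__main__":
    print(len(items1()), "LR(1) item sets")
```

In the state reached from the start state on L, the item R → L• carries only the lookahead $, so the reduce no longer competes with the shift on =.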
1  S → E
2  E → E + T
3    | T
4  T → T * F
5    | F
6  F → id
7    | (E)

I0 :
      S → •E,         $
      E → •E + T,     +$
      E → •T,         +$
      T → •T * F,     *+$
      T → •F,         *+$
      F → •id,        *+$
      F → •(E),       *+$
I0′ : shifting (
      F → (•E),       *+$
      E → •E + T,     +)
      E → •T,         +)
      T → •T * F,     *+)
      T → •F,         *+)
      F → •id,        *+)
      F → •(E),       *+)
I0″ : shifting ( again
      F → (•E),       *+)
      E → •E + T,     +)
      E → •T,         +)
      T → •T * F,     *+)
      T → •F,         *+)
      F → •id,        *+)
      F → •(E),       *+)
Another example
Consider:
0  S' → S
1  S → CC
2  C → cC
3    | d

The LR(1) item sets (item, lookahead):
I0 : S' → •S,    $
     S → •CC,    $
     C → •cC,    cd
     C → •d,     cd
I1 : S' → S•,    $
I2 : S → C•C,    $
     C → •cC,    $
     C → •d,     $
I3 : C → c•C,    cd
     C → •cC,    cd
     C → •d,     cd
I4 : C → d•,     cd
I5 : S → CC•,    $
I6 : C → c•C,    $
     C → •cC,    $
     C → •d,     $
I7 : C → d•,     $
I8 : C → cC•,    cd
I9 : C → cC•,    $

        ACTION              GOTO
state   c     d     $       S    C
0       s3    s4    -       1    2
1       -     -     acc     -    -
2       s6    s7    -       -    5
3       s3    s4    -       -    8
4       r3    r3    -       -    -
5       -     -     r1      -    -
6       s6    s7    -       -    9
7       -     -     r3      -    -
8       r2    r2    -       -    -
9       -     -     r2      -    -

LALR(1) parsing
Example
Reconsider:
0  S' → S
1  S → CC
2  C → cC
3    | d

I0 : S' → •S,    $
     S → •CC,    $
     C → •cC,    cd
     C → •d,     cd
I1 : S' → S•,    $
I2 : S → C•C,    $
     C → •cC,    $
     C → •d,     $
I3 : C → c•C,    cd
     C → •cC,    cd
     C → •d,     cd
I4 : C → d•,     cd
I5 : S → CC•,    $
I6 : C → c•C,    $
     C → •cC,    $
     C → •d,     $
I7 : C → d•,     $
I8 : C → cC•,    cd
I9 : C → cC•,    $

Merged states:
I36 : C → c•C,   cd$
      C → •cC,   cd$
      C → •d,    cd$
I47 : C → d•,    cd$
I89 : C → cC•,   cd$

        ACTION                GOTO
state   c      d      $       S    C
0       s36    s47    -       1    2
1       -      -      acc     -    -
2       s36    s47    -       -    5
36      s36    s47    -       -    89
47      r3     r3     r3      -    -
5       -      -      r1      -    -
89      r2     r2     r2      -    -
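The merge of LR(1) states with identical cores can be reproduced for the S → CC, C → cC | d grammar. The Python sketch below builds the canonical LR(1) collection and then merges by core; this is the slow "canonical" route to LALR(1), shown only for illustration, and it assumes a grammar without ε-productions. All names (`core`, `merge_by_core`, and so on) are invented for this example.

```python
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}
SYMBOLS = ["S", "C", "c", "d"]


def first(symbol):
    """Terminals that can start a string derived from symbol (no epsilon)."""
    seen, todo, out = set(), [symbol], set()
    while todo:
        x = todo.pop()
        if x in seen:
            continue
        seen.add(x)
        if x in GRAMMAR:
            todo.extend(rhs[0] for rhs in GRAMMAR[x])
        else:
            out.add(x)
    return out


def closure1(items):
    """LR(1) closure of a set of (lhs, rhs, dot, lookahead) items."""
    closure, work = set(items), list(items)
    while work:
        lhs, rhs, dot, a = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            beta = rhs[dot + 1:]
            for gamma in GRAMMAR[rhs[dot]]:
                for b in (first(beta[0]) if beta else {a}):
                    new_item = (rhs[dot], gamma, 0, b)
                    if new_item not in closure:
                        closure.add(new_item)
                        work.append(new_item)
    return frozenset(closure)


def goto1(items, x):
    return closure1({(l, r, d + 1, a) for l, r, d, a in items
                     if d < len(r) and r[d] == x})


def items1():
    """Canonical collection of LR(1) item sets."""
    states = [closure1({("S'", ("S",), 0, "$")})]
    for state in states:                  # the list grows while we iterate
        for x in SYMBOLS:
            target = goto1(state, x)
            if target and target not in states:
                states.append(target)
    return states


def core(state):
    """The LR(0) items of a state, i.e. the items without lookaheads."""
    return frozenset((l, r, d) for l, r, d, _ in state)


def merge_by_core(states):
    """Union all LR(1) states that share a core (the LALR(1) states)."""
    merged = {}
    for state in states:
        merged.setdefault(core(state), set()).update(state)
    return {c: frozenset(s) for c, s in merged.items()}


if __name__ == "__main__":
    lr1 = items1()
    lalr = merge_by_core(lr1)
    print(len(lr1), "LR(1) states,", len(lalr), "LALR(1) states")
```

The three merged states correspond to I36, I47 and I89 above; each carries the union of the lookaheads of the two states it replaces.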
E → E * E
  | E / E
  | E + E
  | E - E
  | (E)
  | -E
  | id
  | num
The problem
Right Recursion:
Rule of thumb:
• right recursion for top-down parsers
• left recursion for bottom-up parsers
Parsing review
Recursive descent
A hand-coded recursive descent parser directly encodes a grammar (typically an LL(1) grammar) into a series of mutually recursive procedures. It has most of the linguistic limitations of LL(1).
LL(k)
An LL(k) parser must be able to recognize the use of a production after seeing only the first k symbols of its right-hand side.
LR(k)
An LR(k) parser must be able to recognize the occurrence of the right-hand side of a production after having seen all that is derived from that right-hand side with k symbols of lookahead.