You are on page 1of 31

1

Bottom-up parsing
Goal of parser : build a derivation
Top-down parser : build a derivation by working from the
start symbol towards the input.
Builds parse tree from root to leaves
Builds leftmost derivation
Bottom-up parser : build a derivation by working from the
input back toward the start symbol
Builds parse tree from leaves to root
Builds reverse rightmost derivation
2
Bottom-up parsing
The parser looks for a substring of the parse tree's
frontier...
...that matches the rhs of a production and
...whose reduction to the non-terminal on the lhs represents
on step along the reverse of a rightmost derivation
Such a substring is called a handle.
Important:
Not all substrings that match a rhs are handles.
3
Bottom-up parsing techniques
Shift-reduce parsing
Shift input symbols until a handle is found. Then, reduce the
substring to the non-terminal on the lhs of the
corresponding production.
Operator-precedence parsing
Based on shift-reduce parsing.
Identifies handles based on precedence rules.
4
Example: Shift-reduce parsing
1. S E
2. E E + E
3. E E * E
4. E num
5. E id
Input to parse:
id
1
+ num * id
2

Accept $ S
Reduce (rule 1) $ E
Reduce (rule 2) $ E + E
Reduce (rule 3) $ E + E * E
Reduce (rule 5) $ E + E * id
2
Shift $ E + E *
Shift $ E + E
Reduce (rule 4) $ E + num

Shift $ E +
Shift $ E
Reduce (rule 5) $ id
1
Shift $
ACTION
Grammar:
Handles:
underlined
STACK
5
Shift-Reduce parsing
A shift-reduce parser has 4 actions:
Shift -- next input symbol is shifted onto the stack
Reduce -- handle is at top of stack
pop handle
push appropriate lhs
Accept -- stop parsing & report success
Error -- call error reporting/recovery routine
6
Shift-Reduce parsing
How can we know when we have found a handle?
Analyze the grammar beforehand.
Build tables
Look ahead in the input
LR(1) parsers recognize precisely those languages in
which one symbol of look-ahead is enough to
determine whether to reduce or shift.
L : for left-to-right parse of the input
R : for reverse rightmost derivation
1: for one symbol of lookahead
7
How does it work?
Read input, one token at a time
Use stack to keep track of current state
The state at the top of the stack summarizes the information
below.
The stack contains information about what has been parsed
so far.
Use parsing table to determine action based on
current state and look-ahead symbol.
How do we build a parsing table?
8
LR parsing techniques
SLR
Simple LR parsing
Easy to implement, not strong enough
Uses LR(0) items
Canonical LR
Larger parser but powerful
Uses LR(1) items
LALR
Look Ahead LR parsing
Condensed version of canonical LR
May introduce conflicts
Uses LR(1) items
9
Class examples
id F
F T
T * F T
T E
E + T E
E E'
L R
id L
* R L
R S
L = R S
S S'
10
Finding handles
As a shift/reduce parser processes the input, it must
keep track of all potential handles.
For example, consider the usual expression grammar
and the input string x+y.
Suppose the parser has processed x and reduced it to E.
Then, the current state can be represented by E +E where
means
that an E has already been parsed and
that +E is a potential suffix, which, if found, will result in
a successful parse.
Our goal is to eventually reach state E+E, which represents
an actual handle and should result in the reduction EE+E
11
LR parsing
Typically, LR parsing works by building an automaton where
each state represents what has been parsed so far and what we
hope to parse in the future.
In other words, states contain productions with dots, as described
earlier.
Such productions are called items
States containing handles (meaning the dot is all the way to the
right end of the production) lead to actual reductions depending
on the lookahead.
12
SLR parsing
SLR parsers build automata where states contain
items (a.k.a. LR(0) items) and reductions are decided
based on FOLLOW set information.
We will build an SLR table for the augmented
grammar
S'S
S L=R
S R
L *R
L id
R L
13
SLR parsing
When parsing begins, we have not parsed any input at all and
we hope to parse an S. This is represented by S'S.
Note that in order to parse that S, we must either parse an L=R or
an R. This is represented by SL=R and SR
closure of a state:
if AaBb represents the current state and B is a
production, then add B to the state.
Justification: aBb means that we hope to see a B next. But
parsing a B is equivalent to parsing a , so we can say that
we hope to see a next
14
SLR parsing
Use the closure operation to define states containing
LR(0) items. The first state will be:





From this state, if we parse, say, an id, then we go to
state
If, after some steps we parse input that reduces to
an L, then we go to state
S' S
S L=R
S R
L *R
L id
R L
L id
S L =R
R L
15
id
SLR parsing
Continuing the same way, we define all LR(0) item
states:
S' S
S L=R
S R
L *R
L id
R L
L id
S L =R
R L
S' S
I
0

I
1

I
2

I
3

S R
I
4

L * R
R L
L id
L * R
I
5

S
L
*
id R
S L= R
R L
L *R
L id
I
6

=
R
S L=R
R L
L
L
I
7

id
I
3

*
*
L *R
R
I
8

I
9

16
SLR parsing
The automaton and the FOLLOW sets tell us how to build the
parsing table:
Shift actions
If from state i, you can go to state j when parsing a
token t, then slot [i,t] of the table should contain action
"shift and go to state j", written sj
Reduce actions
If a state i contains a handle A, then slot [i, t] of the
table should contain action "reduce using A", for all
tokens t that are in FOLLOW (A). This is written r(A)
The reasoning is that if the lookahead is a symbol that
may follow A, then a reduction A should lead closer to
a successful parse.
continued on next slide
17
SLR parsing
The automaton and the FOLLOW sets tell us how to build the
parsing table:
Reduce actions, continued
Transitions on non-terminals represent several steps
together that have resulted in a reduction.
For example, if we are in state 0 and parse a bit of input
that ends up being reduced to an L, then we should go to
state 2.
Such actions are recorded in a separate part of the
parsing table, called the GOTO part.
18
SLR parsing
Before we can build the parsing table, we need to compute the
FOLLOW sets:
S' S
S L=R
S R
L *R
L id
R L
FOLLOW(S') = {$}
FOLLOW(S) = {$}
FOLLOW(L) = {$, =}
FOLLOW(R) = {$, =}
19
SLR parsing
state action goto
id = * $ S L R
0 s3 s5 1 2 4
1 accept
2 s6/r(RL)
3 r(Lid) r(Lid)
4 r(SR)
5 s3 s5 7 8
6 s3 s5 7 9
7 r(RL) r(RL)
8 r(L*R) r(L*R)
9 r(SL=R)
Note the shift/reduce conflict on state 2 when the lookahead is an =
20
Conflicts in LR parsing
There are two types of conflicts in LR parsing:
shift/reduce
On some particular lookahead it is possible to shift or reduce
The if/else ambiguity would give rise to a shift/reduce conflict
reduce/reduce
This occurs when a state contains more than one handle that
may be reduced on the same lookahead.
21
Conflicts in SLR parsing
The parser we built has a shift/reduce conflict.
Does that mean that the original grammar was
ambiguous?
Not necessarily. Let's examine the conflict:
it seems to occur when we have parsed an L and are seeing
an =. A reduce at that point would turn the L into an R.
However, note that a reduction at that point would never
actually lead to a successful parse. In practice, L should only
be reduced to an R when the lookahead is EOF ($).
An easy way to understand this is by considering that L
represents l-values while R represents r-values.

22
Conflicts in SLR parsing
The conflict occurred because we made a decision about when
to reduce based on what token may follow a non-terminal at
any time.
However, the fact that a token t may follow a non-terminal N in
some derivation does not necessarily imply that t will follow N in
some other derivation.
SLR parsing does not make a distinction.
23
Conflicts in SLR parsing
SLR parsing is weak.
Solution : instead of using general FOLLOW
information, try to keep track of exactly what tokens
many follow a non-terminal in each possible
derivation and perform reductions based on that
knowledge.
Save this information in the states.
This gives rise to LR(1) items:
items where we also save the possible lookaheads.
24
Canonical LR(1) parsing
In the beginning, all we know is that we have not
read any input (S'S), we hope to parse an S and
after that we should expect to see a $ as lookahead.
We write this as: S'S, $
Now, consider a general item A, x. It means
that we have parsed an , we hope to parse and
after those we should expect an x. Recall that if there
is a production , we should add to the
state. What kind of lookahead should we expect to
see after we have parsed ?
We should expect to see whatever starts a . If is empty
or can vanish, then we should expect to see an x after we
have parsed (and reduced it to B)
25
Canonical LR(1) parsing
The closure function for LR(1) items is then defined
as follows:

For each item A, x in state I,
each production in the grammar,
and each terminal b in FIRST(x),
add , b to I

If a state contains core item with multiple
possible lookaheads b
1
, b
2
,..., we write , b
1
/b
2

as shorthand for , b
1
and , b
2
26
id
Canonical LR(1) parsing
L id , =/$
S L =R, $
R L , $
S' S , $
I
0

I
1

I
2

I
3

S R, =/$
I
4

I
5

S
L
*
id R
I
6

=
R
SL=R, $
R L, =/$
L
L
I
7

id
*
*
L *R , =/$
R
I
8

I
9

S' S, $
S L=R, $
S R, $
L *R, =/$
L id, =/$
R L, $
L *R, =/$
R L, =/$
L id, =/$
L *R, =/$
Lid, $ I
3
'
R L, $ I
7
'
S L= R, $
R L, $
L *R, $
L id, $
I
5
'
*
L *R , $
I
8
'
L *R, $
R L, $
L id, $
L *R, $
L
R
id
I
3
'
27
Canonical LR(1) parsing
The table is created in the same way as SLR, except
we now use the possible lookahead tokens saved in
each state, instead of the FOLLOW sets.
Note that the conflict that had appeared in the SLR
parser is now gone.
However, the LR(1) parser has many more states.
This is not very practical.
It may be possible to merge states!
28
LALR(1) parsing
This is the result of an effort to reduce the number of
states in an LR(1) parser.
We notice that some states in our LR(1) automaton have
the same core items and differ only in the possible
lookahead information. Furthermore, their transitions are
similar.
States I
3
and I
3
', I
5
and I
5
', I
7
and I
7
', I
8
and I
8
'
We shrink our parser by merging such states.
SLR : 10 states, LR(1): 14 states, LALR(1) : 10 states
29
id
LALR(1) parsing
L id , =/$
S L =R, $
R L , $
S' S , $
I
0

I
1

I
2

I
3

S R, =/$
I
4

I
5

S
L
*
id R
I
6

=
R
SL=R, $
R L, =/$
L
L
I
7

id
*
*
L *R , =/$
R
I
8

I
9

S' S, $
S L=R, $
S R, $
L *R, =/$
L id, =/$
R L, $
L *R, =/$
R L, =/$
L id, =/$
L *R, =/$
S L= R, $
R L, $
L *R, $
L id, $
I
3

30
Conflicts in LALR(1) parsing
Note that the conflict that had vanished when we
created the LR(1) parser has not reappeared.
Can LALR(1) parsers introduce conflicts that did not
exist in the LR(1) parser?
Unfortunately YES.
BUT, only reduce/reduce conflicts.
31
Conflicts in LALR(1) parsing
LALR(1) parsers cannot introduce shift/reduce conflicts.
Such conflicts are caused when a lookahead is the same as a token
on which we can shift. They depend on the core of the item. But we
only merge states that had the same core to begin with. The only
way for an LALR(1) parser to have a shift/reduce conflict is if one
existed already in the LR(1) parser.
LALR(1) parsers can introduce reduce/reduce conflicts.
Here's a situation when this might happen:
A B , x
A C , y
A B , y
A C , x
merges with
to give:
A B , x/y
A C , x/y

You might also like