You are on page 1of 62

Bottom up parsing

Bottom-Up Parsing

 Bottom-up parsing is more general than top- down.


 A bottom-up parser builds a derivation by working
from the input sentence back toward the start symbol S

 Preferred method in practice


 Also called LR parsing
 L means that tokens are read left to right

R means that it constructs a rightmost derivation


Bottom-up Parsing
• Two types of bottom-up parsing:
• Operator-Precedence parsing
• LR parsing
• covers wide range of grammars.
• SLR – simple LR parser
• LR – most general LR parser
• LALR – intermediate LR parser (Lookahead LR
parser)
Bottom-Up Parsing

LR parsing reduces a string to the start


symbol by inverting productions:
Str input string of terminals
repeat
Identify βin str such that A →β is a
production
Replace β by A in str
untilstr = S (the start symbol) OR all
possibilities are exhausted
Bottom-Up Parsing
 int + (int) + (int) E → E + ( E ) | int

 E + (int) + (int)

 E + (E) + (int)

 E + (int)

 E + (E)

 E

A rightmost derivation
in reverse
Reductions
Bottom-up parsing as the process of
"reducing" a string w to the start symbol
of the grammar.
At each reduction step, a substring that
matches the body of a production is
replaced by the non-terminal at the head
of that production.
Handle

Handle of a string: Substring that matches the


RHS of some production AND whose reduction
to the non-terminal on the LHS is a step along
the reverse of some rightmost derivation.
Handles
 Formally:
Handle of a right sentential form  is <A  ,
location of  in >
i.e. A   is a handle of  at the location
immediately after the end of , if:S => A => 
 A certain sentential form may have many different
handles.
 Right sentential forms of a non-ambiguous grammar
have one unique handle
Example
S  aABe
A  Abc | b
Bd

S  aABe  aAde  aAbcde  abbcde


S  aABe is a handle of aABe in location 1.
B  d is a handle of aAde in location 3.
A  Abc is a handle of aAbcde in location 2.
A  b is a handle of abbcde in location 2.
Handle Pruning
 A rightmost derivation in reverse can be obtained by “handle-pruning.”

abbcde
Find the handle = b at loc. 2
aAbcde
b at loc. 3 is not a handle:
aAAcde
... blocked.
Handle-pruning
 The process of discovering a handle & reducing it to the appropriate left-hand side is
called handle pruning.
 Handle pruning forms the basis for a bottom-up parsing method.
To construct a rightmost derivation

S = 0  1  2  …  n-1  n = w
Apply the following simple algorithm

for i  n to 1 by -1
Find the handle Ai i in i
Replace i with Ai to generate i-1
1. S -> 0 S1|01 indicate the handle in each of the following right-sentential forms:

1. 000111
2. 00S11
2. For the grammar S -> S S + | S S * | a indicate the handle in each of the following
right-sentential forms:

1. SSS + a * +
2. SS + a * a+
3. aaa * a + +.
3. Give bottom-up parses for the following input strings

1. The input 000111 according to the grammar of


Exercise 1
2. The input aaa * a + + according to the grammar of 2
Shift Reduce Parsing with a
Stack
 Two problems:

 locate a handle

 decide which production to use (if there are


more than two candidate productions).
Shift-reduce Parsing
A shift-reduce parser is a stack automaton with four actions
 Shift — next word is shifted onto the stack
 Reduce — right end of handle is at top of stack
Locate left end of handle within the stack
Pop handle off stack & push appropriate lhs
 Accept — stop parsing & report success
 Error — call an error reporting/recovery routine

Accept & Error are simple


Shift is just a push and a call to the scanner
Reduce takes |rhs| pops & 1 push
x-2*y
Stack Input Handle Action
$ id - num * id None shift
$ id - num * id 0 Goal  Expr
1 Expr  Expr + Term
2 | Expr - Term
3 | Term
4 Term  Term * Factor
5 | Term / Factor
6 | Factor
7 Factor  number
8 | id
9 | ( Expr )

1. Shift until the top of the stack is the right end of a handle
2. Find the left end of the handle and reduce
Back to x - 2 * y
Stack Input Handle Action
0 Goal  Expr
$ id - num * id none shift
1 Expr  Expr + Term
$ id - num * id 8,1 reduce 8 2 | Expr - Term
$ Factor - num * id 6,1 reduce 6 3 | Term

$ Term - num * id 3,1 reduce 4 4 Term  Term * Factor


5 | Term / Factor
$ Expr - num * id
6 | Factor
7 Factor  number
8 | id
9 | ( Expr )

1. Shift until the top of the stack is the right end of a handle
2. Find the left end of the handle and reduce
Back to x - 2 * y
Stack Input Handle Action
0 Goal  Expr
$ id - num * id none shift
1 Expr  Expr + Term
$ id - num * id 8,1 reduce 8 2 | Expr - Term
$ Factor - num * id 6,1 reduce 6 3 | Term

$ Term - num * id 3,1 reduce 4 4 Term  Term * Factor


5 | Term / Factor
$ Expr - num * id
6 | Factor
7 Factor  number
8 | id
9 | ( Expr )

Expr is not a handle at this point because it does not


occur at this point in the derivation.

1. Shift until the top of the stack is the right end of a handle
2. Find the left end of the handle and reduce
Back to x - 2 * y
Stack Input Handle Action
0 Goal  Expr
$ id - num * id none shift
1 Expr  Expr + Term
$ id - num * id 8,1 reduce 8 2 | Expr - Term
$ Factor - num * id 6,1 reduce 6 3 | Term

$ Term - num * id 3,1 reduce 3 4 Term  Term * Factor


5 | Term / Factor
$ Expr - num * id none shift
6 | Factor
$ Expr - num * id none shift
7 Factor  number
$ Expr - num * id 8 | id
9 | ( Expr )

1. Shift until the top of the stack is the right end of a handle
2. Find the left end of the handle and reduce
Back to x - 2 * y
Stack Input Handle Action
0 Goal  Expr
$ id - num * id none shift
1 Expr  Expr + Term
$ id - num * id 8,1 reduce 8 2 | Expr - Term
$ Factor - num * id 6,1 reduce 6 3 | Term

$ Term - num * id 3,1 reduce 3 4 Term  Term * Factor


5 | Term / Factor
$ Expr - num * id none shift
6 | Factor
$ Expr - num * id none shift
7 Factor  number
$ Expr - num * id 7,3 reduce 7 8 | id
$ Expr - Factor * id 6,3 reduce 6 9 | ( Expr )

$ Expr - Term * id

1. Shift until the top of the stack is the right end of a handle
2. Find the left end of the handle and reduce
Back to x - 2 * y
Stack Input Handle Action
0 Goal  Expr
$ id - num * id none shift
1 Expr  Expr + Term
$ id - num * id 8,1 reduce 8 2 | Expr - Term
$ Factor - num * id 6,1 reduce 6 3 | Term

$ Term - num * id 3,1 reduce 3 4 Term  Term * Factor


5 | Term / Factor
$ Expr - num * id none shift
6 | Factor
$ Expr - num * id none shift
7 Factor  number
$ Expr - num * id 7,3 reduce 7 8 | id
$ Expr - Factor * id 6,3 reduce 6 9 | ( Expr )

$ Expr - Term * id none shift


$ Expr - Term * id none shift
$ Expr - Term * id

1. Shift until the top of the stack is the right end of a handle
2. Find the left end of the handle and reduce
Back to x - 2 * y
Stack Input Handle Action
0 Goal  Expr
$ id - num * id none shift
1 Expr  Expr + Term
$ id - num * id 8,1 reduce 8 2 | Expr - Term
$ Factor - num * id 6,1 reduce 6 3 | Term

$ Term - num * id 3,1 reduce 3 4 Term  Term * Factor


5 | Term / Factor
$ Expr - num * id none shift
6 | Factor
$ Expr - num * id none shift
7 Factor  number
$ Expr - num * id 7,3 reduce 7 8 | id
$ Expr - Factor * id 6,3 reduce 6 9 | ( Expr )

$ Expr - Term * id none shift


$ Expr - Term * id none shift
$ Expr - Term * id 8,5 reduce 8 5 shifts +
$ Expr - Term * Factor 4,5 reduce 4
$ Expr - Term 2,3 reduce 2
9 reduces +
$ Expr 0,1 reduce 0 1 accept
$ Goal none accept

1. Shift until the top of the stack is the right end of a handle
2. Find the left end of the handle and reduce
Back to x - 2 * y
Stack Input Action
$ id - num * id shift
Goal
$ id - num * id reduce 8
$ Factor - num * id reduce 6
$ Term - num * id reduce 3 Expr
$ Expr - num * id shift
Expr – Term
$ Expr - num * id shift
$ Expr - num * id reduce 7
$ Expr - Factor * id reduce 6 Term Term * Fact.
$ Expr - Term * id shift
$ Expr - Term * id shift Fact. Fact. <id,y>
$ Expr - Term * id reduce 8
$ Expr - Term * Factor reduce 4 <id,x> <num,2>
$ Expr - Term reduce 2
$ Expr reduce 0 Corresponding Parse Tree
$ Goal accept
Conflicts During Shift-Reduce Parsing

Conflicts
“shift/reduce” or “reduce/reduce”
Example:
stmt  if expr then stmt
We can’t tell
| if expr then stmt else stmt
whether it is a
handle | other (any other statement)

Stack Input
if … then stmt else … Shift/ Reduce Conflict
LR Parsing

 Bottom-up parser based on a concept called


LR(k) parsing
 "L" is for left-to-right scanning of the input.
 "R" for constructing a rightmost derivation in
reverse,
 “k” for the number of input symbols of
lookahead that are used in making parsing
decisions.
Why LR Parsers?
 For a grammar to be LR it is sufficient that a
left-to-right shift-reduce parser be able to
recognize handles of right-sentential forms
when they appear on top of the stack.
Why LR Parsers?
 LR parsers can be constructed to recognize all
programming language constructs for which
context-free grammars can be written.
 The LR-parsing method is the most general non-
back-tracking shift-reduce parsing method.
 An LR parser can detect a syntactic errors.
 The class of grammars that can be parsed using LR
methods is a proper superset of the class of
grammars that can be parsed with predictive or LL
methods.
Components of LR Parser

Input buffer
A + B $

X
Parsing Program
Y
output
stack Z
$
Parse Table
Action Goto
Techniques for Creating Parse Table
 SLR: Construct parsing table for small set of grammars called
SLR(1).
 Easy to develop.
 CLR(CANONICAL LR) : Most powerful.
Generates a large parse table.
More difficult develop.
Works for all types of CFG
May detect position of error.
 LALR(LOOK AHEAD LR) :Widely used method.
 Optimizes the size of parse table, by combining some states.
 Information may get lost.
SLR Parsers
1. Formation of augmented grammar G’ for
the given grammar G
2. Construction of LR(0) collection of
items.
To find LR(0) collection of items Closure(I)
and Goto(I,X) have to be computed.
3. Finding first and follow of non-terminals
4. Construction of parse table.
Formation of Augmented
Grammar
 The
augmented grammar G’, is G with a new start
symbol S’ and an additional production S’ -> S
E->E+T|T
T->T*F|F
 The augmented grammar G’ is given by
E’->E
E->E+T|T
T->T*F|F
Items and the LR(O)
Automaton
 How does a shift-reduce parser know when
to shift and when to reduce?
 How does the parser know that symbol on the
top of the stack is not a handle?
 An LR parser makes shift-reduce decisions
by maintaining states to keep track of all the
operations.
 States represent sets of "items."
Items and the LR(O)
Automaton
 An LR(O) item of a grammar G is a production of G, with a
dot at some position in the body.
 Production A -> XYZ yields the four items
 A -> ·XYZ
 A -> X·YZ
 A -> XY·Z
 A -> XYZ·
 A ->ε generates only one item, A ->.
 An item indicates how much of a production we have seen
at a given point in the parsing process.
Items and the LR(O)
Automaton
 The item A -> ·XYZ indicates that we hope to see a string derivable from XYZ next
on the input.

 Item A -> X·YZ indicates that we have just seen on the input a string derivable from
X and that we hope next to see a string derivable from YZ.

 Item A -> XY·Z indicates that we have just seen on the input a string derivable from
XY and that we hope next to see a string derivable from Z.

 Item A -> XYZ· indicates that we have seen the body XY Z and that it may be time
to reduce XYZ to A
Items and the LR(O)
Automaton
 Collection of sets of LR(0) items, called the
canonical LR(0) collection.
 Provides the basis for constructing a deterministic
finite automaton that is used to make parsing
decisions.
 Automaton is called an LR(0) automaton.
 Each state of the LR(0) automaton represents a
set of items in the canonical LR(0) collection.
Construction of LR(0) Items

 Itemsare viewed as states in NFA.


 Grouped to form same states.
 Process of grouping together is called subset
Construction Algorithm.
 Closure and goto operations have to be
computed.
Closure of Item Sets
 If I is a set of items for a grammar G, then CLOSURE(I)
is the set of items constructed from I by the two rules:

 Initially, add every item in I to CLOSURE(I).

 If A -> α·Bγ is in CLOSURE(I) and B -> γ is a


production, then add the item B -> γ to CLOSURE(I),
if it is not already there. Apply this rule until no more
new items can be added to CLOSURE (I).
Computation of CLOSURE
I0
E’  .E
E’  E E  .E + T
EE+T | T E  .T
TT*F | F T  .T * F
F  ( E ) | id T .F
F  .( E )
F  .id
Computation of CLOSURE
SetOfltems CLOSURE (I) {
J = I;
repeat
for ( each item A -> a·Bγ in J )
for ( each production B ->γ of G )
if (B ->.γ is not in J )
add B ->.γ to J;
until no more items are added to J on one round;
return J;
}
 Kernel items : The initial item, S' ->·S, and all items whose dots are not at
the left end.
 Non-kernel items : All items with their dots at the left end, except for S' ->
·S.
The Function GOTO
 GOTO (I, X) is defined to be the closure of the set of all items [A -> αX.β] such that
[A -> α.Xβ] is in I.
 If I is the set of two items { [E' -> E·] , [E -> E· + T] } , then
 GOTO(I, +) contains the items
 E -> E + ·T
 T -> ·T * F
 T -> ·F
 F -> · (E)
 F -> ·id
The Function GOTO
1 2 +3 S  E + . S Grammar
S’  . S $ E S  E . +S SE+S|E
S.E+S
S  .E + S SE. E S.E E  num
S.E 4 num E  . num
E  .num
num E  num . 5 S
S 7 SE+S.
S’  S . $ $ S’  S $ .
num + $ E S
1 s4 g2 g6
2 s3 SE
3 s4 g2 g5
4 Enum Enum
5 SE+S
6 s7
7 accept
I1
S R
S' S S' S  I6 S  L=  R S  L=R 
I0 RL
S   L=R
SR L   *R
L   id id I9
L   *R L
S  L =R = I3
L   id
RL R L
I2
L
* *
L  * R
RL
S' S
id R I5 L
L   id R L  I7 S   L=R
L*R SR
I3 L  id 
id R
L *R  I8
L   *R
L   id
RL
I4 S  R  *
state action goto
id = * $ S L R
0 s3 s5 1 2 4
1 accept
2 s6/r(RL)
3 r(Lid) r(Lid)
4 r(SR)
5 s3 s5 7 8
6 s3 s5 7 9
7 r(RL) r(RL)
8 r(L*R) r(L*R)
9 r(SL=R)
Structure of the LR Parsing
Table
1. The ACTION function takes as arguments a state i and a terminal
a (or $, the input endmarker). The value of ACTION[i, a] can have
one of four forms:
a) Shift j , where j is a state. The action taken by the parser shifts
input a to the stack, but uses state j to represent a .
b) Reduce A ->β. The action of the parser reduces β on the top of
the stack to head A.
c) Accept. The parser accepts the input and finishes parsing;
d) Error. The parser discovers an error in its input and takes some
corrective action.
2. Extend the GOTO function, defined on sets of items, to states: if
GOTo [Ii , A] = Ij , then GOTO also maps a state i and a
nonterminal A to state j .
Construction of SLR Parse
Table
1. Construct C={I0,I1…….In} the collection sets of LR(0)
items for G’.
2. Initial state of the parser is constructed from the set of items
for [S’->S]
3. State I is constructed from Ii. The parsing actions are
determined as follows.
1. If [A->α.aβ] is in Ii and GOTO(Ii,a]=Ij, then ACTION[i,a]
is set to ‘shift j’ here ‘a’ must be a terminal.
2. If[A->α.] is in Ii, then ACTION[i,a] is set to reduce A->a
for all ‘a’ Follow(A).
3. If S’->S is in Ii, than action[i, $] is set to ‘accept’.
Construction of SLR parsing
table
4. GOTO transitions are constructed for all non-terminals. If GOTO(Ii,A) =Ij, then
goto(i,A)of the parse table is set to j.
5. All other entries are error entries.
LR-Parser Configurations

 Helps to have complete state o its stack and the remaining


input. A configuration of an LR parser is a pair:(s0s1
………… sm, aiai+1…………an$).
 where the first component is the stack contents and the
second component is the remaining input.
Behavior of the LR Parser
LR-parsing algorithm.
 INPUT: An input string w and an LR-parsing table with functions ACTION
and GOTO for a grammar G.
 OUTPUT: If w is in L ( G), the reduction steps of a bottom-up parse for W ;
otherwise, an error indication.
 METHOD: Initially, the parser has So on its stack, where So is the initial
state, and w$ in the input buffer.
LR-Parser Configurations

You might also like