
BOTTOM-UP PARSING

Bottom-Up Parsing
o Bottom-Up Parser: Constructs a parse tree for an input string
beginning at the leaves (the bottom) and working up towards the
root (the top).
o We can think of this process as one of “reducing” a string w to the start
symbol of a grammar
o Bottom-up parsing is also known as shift-reduce parsing because its
two main actions are shift and reduce.
❑ At each shift action, the current symbol in the input string is pushed onto
the stack.
❑ At each reduction step, the symbols at the top of the stack (this symbol
sequence is the right side of a production) are replaced by the
non-terminal on the left side of that production.
Shift-Reduce Parsing
o A shift-reduce parser tries to reduce the given input string into the
starting symbol.
a string  ──reduced to──➔  the starting symbol
o At each reduction step, a substring of the input matching the right
side of a production rule is replaced by the non-terminal on the left side
of that production rule.
o If the substring is chosen correctly, the rightmost derivation of that
string is created in reverse order.

Rightmost derivation: \( S \Rightarrow_{rm}^{*} \omega \)

Shift-reduce parser finds: \( \omega \Leftarrow_{rm} \cdots \Leftarrow_{rm} S \)
Shift–Reduce Parsing-Example
o Consider the grammar
    S → aABe
    A → Abc | b
    B → d
  Input string: abbcde
  Reductions: abbcde ➔ aAbcde ➔ aAde ➔ aABe ➔ S
We can scan abbcde looking for a substring that matches the right side of some
production. The substrings b and d qualify. Let us choose the leftmost b and replace it by
A, the left side of the production A→b; we thus obtain the string aAbcde. Now the
substrings Abc, b and d match the right side of some production. Although b is the
leftmost substring that matches the right side of some production, we choose to
replace the substring Abc by A, the left side of the production A→Abc. We obtain aAde.
Then we replace d by B, and finally replace the entire string aABe by S. Thus, by a sequence
of four reductions we are able to reduce abbcde to S.
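The four reductions described above can be replayed mechanically. The following is a minimal sketch, not part of the original slides: the helper names `reduce_at` and `trace` are hypothetical, and the positions in `STEPS` are the choices made in the text, not something the code discovers.

```python
# Each step: (left side, right side, index where the chosen substring starts).
# The positions are the choices made in the prose, not computed here.
STEPS = [
    ("A", "b", 1),     # abbcde -> aAbcde  (leftmost b)
    ("A", "Abc", 1),   # aAbcde -> aAde
    ("B", "d", 2),     # aAde   -> aABe
    ("S", "aABe", 0),  # aABe   -> S
]

def reduce_at(form, lhs, rhs, pos):
    """Replace the occurrence of rhs starting at index pos with lhs."""
    if form[pos:pos + len(rhs)] != rhs:
        raise ValueError(f"{rhs!r} does not occur at position {pos} in {form!r}")
    return form[:pos] + lhs + form[pos + len(rhs):]

def trace(form, steps):
    """Return every string produced while applying the reductions in order."""
    forms = [form]
    for lhs, rhs, pos in steps:
        form = reduce_at(form, lhs, rhs, pos)
        forms.append(form)
    return forms

print(" -> ".join(trace("abbcde", STEPS)))
```

Choosing a different substring at the second step (for example the b at position 2) would still be a legal replacement, but the resulting string could no longer be reduced to S, which is why the choice of substring matters.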
Shift–Reduce Parsing-Example

o These reductions in fact trace out the following rightmost derivation in
reverse:

\( S \Rightarrow_{rm} aABe \Rightarrow_{rm} aAde \Rightarrow_{rm} aAbcde \Rightarrow_{rm} abbcde \)

Each string in this derivation is a right-sentential form.

o How do we know which substring should be replaced at each reduction step?


Handle
o Informally, a “handle” of a string is a substring that matches the right side
of a production, and whose reduction to the nonterminal on the left side of
the production represents one step along the reverse of a rightmost
derivation.

o Formally, a “handle” of a right-sentential form γ (≡ αβω) is a production
rule A → β and a position of γ where the string β may be found and replaced
by A to produce the previous right-sentential form in a rightmost
derivation of γ. That is, if

\( S \Rightarrow_{rm}^{*} \alpha A \omega \Rightarrow_{rm} \alpha \beta \omega \)

then A→β in the position following α is a handle of αβω.

o The string ω to the right of the handle contains only terminal symbols.
Handle Pruning

o The process of finding the handle and replacing it by the non-terminal on
the left side of its production is called handle pruning.
o A rightmost derivation in reverse can be obtained by “handle pruning”. That
is, we start with a string of terminals ω that we wish to parse. If ω is a
sentence of the grammar at hand, then ω = γn, where γn is the nth
right-sentential form of some as yet unknown rightmost derivation

\( S = \gamma_0 \Rightarrow_{rm} \gamma_1 \Rightarrow_{rm} \gamma_2 \Rightarrow_{rm} \cdots \Rightarrow_{rm} \gamma_{n-1} \Rightarrow_{rm} \gamma_n = \omega \)

where ω is the input string.

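o Start from \( \gamma_n \): find a handle \( A_n \to \beta_n \) in \( \gamma_n \),
and replace \( \beta_n \) in \( \gamma_n \) by \( A_n \) to get \( \gamma_{n-1} \).
o Then find a handle \( A_{n-1} \to \beta_{n-1} \) in \( \gamma_{n-1} \),
and replace \( \beta_{n-1} \) in \( \gamma_{n-1} \) by \( A_{n-1} \) to get \( \gamma_{n-2} \).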
o Repeat this, until we reach S.
A Shift-Reduce Parser
Grammar:
E → E+T | T
T → T*F | F
F → (E) | id

Rightmost derivation of id+id*id:
\( E \Rightarrow_{rm} E+T \Rightarrow_{rm} E+T*F \Rightarrow_{rm} E+T*id \Rightarrow_{rm} E+F*id \)
\( \Rightarrow_{rm} E+id*id \Rightarrow_{rm} T+id*id \Rightarrow_{rm} F+id*id \Rightarrow_{rm} id+id*id \)

Right-sentential form    Handle    Reducing production
id+id*id                 id        F→id
F+id*id                  F         T→F
T+id*id                  T         E→T
E+id*id                  id        F→id
E+F*id                   F         T→F
E+T*id                   id        F→id
E+T*F                    T*F       T→T*F
E+T                      E+T       E→E+T
E
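The handle column can be recovered from the rightmost derivation itself: for each pair of neighbouring right-sentential forms, look for the production and position whose reduction turns the later form back into the earlier one. The sketch below is an illustration only; `find_handle` and `handles` are hypothetical names, and the code assumes the derivation is already known, which a real parser does not.

```python
GRAMMAR = [
    ("E", ["E", "+", "T"]), ("E", ["T"]),
    ("T", ["T", "*", "F"]), ("T", ["F"]),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]

# Rightmost derivation of id+id*id, one right-sentential form per entry.
DERIVATION = [
    "E", "E + T", "E + T * F", "E + T * id", "E + F * id",
    "E + id * id", "T + id * id", "F + id * id", "id + id * id",
]

def find_handle(prev, cur):
    """Return (lhs, rhs, pos) such that replacing rhs at pos in cur by lhs yields prev."""
    for lhs, rhs in GRAMMAR:
        n = len(rhs)
        for pos in range(len(cur) - n + 1):
            if cur[pos:pos + n] == rhs and cur[:pos] + [lhs] + cur[pos + n:] == prev:
                return lhs, rhs, pos
    raise ValueError("no production turns cur into prev")

def handles(derivation):
    """Walk the derivation backwards and list (form, handle, production) rows."""
    forms = [f.split() for f in derivation]
    rows = []
    for prev, cur in zip(reversed(forms[:-1]), reversed(forms[1:])):
        lhs, rhs, _pos = find_handle(prev, cur)
        rows.append(("".join(cur), "".join(rhs), f"{lhs}->{''.join(rhs)}"))
    return rows

for form, handle, production in handles(DERIVATION):
    print(f"{form:10} {handle:5} {production}")
```

Note that in E+T*F the substring E+T also matches a production body, but reducing it would not give the previous right-sentential form, so it is not the handle.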
A Stack Implementation of a Shift-Reduce Parser

o Shift-reduce parsing is a form of bottom-up parsing in which a stack holds
grammar symbols and an input buffer holds the rest of the string to be parsed.
o A shift-reduce parser has four possible actions:
1. Shift: The next input symbol is shifted onto the top of the stack.
2. Reduce: Replace the handle on the top of the stack by the non-terminal.
3. Accept: Successful completion of parsing.
4. Error: Parser discovers a syntax error, and calls an error recovery routine.

o Initially, the stack contains only the end-marker $.
o The end of the input string is marked by the end-marker $.
A Stack Implementation of A Shift-Reduce Parser
E → E+T | T
T → T*F | F
F → (E) | id
Stack      Input        Action
$          id+id*id$    Shift
$id        +id*id$      Reduce by F→id
$F         +id*id$      Reduce by T→F
$T         +id*id$      Reduce by E→T
$E         +id*id$      Shift
$E+        id*id$       Shift
$E+id      *id$         Reduce by F→id
$E+F       *id$         Reduce by T→F
$E+T       *id$         Shift
$E+T*      id$          Shift
$E+T*id    $            Reduce by F→id
$E+T*F     $            Reduce by T→T*F
$E+T       $            Reduce by E→E+T
$E         $            Accept

[Figure: parse tree for id+id*id. Node subscripts give the order in which the reductions create the nodes: F1, T2, E3, F4, T5, F6, T7, E8.]
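The stack/input/action trace can be reproduced with a small driver. This is a sketch under a loud assumption: the function `decide` is a hand-written rule set for this one grammar, standing in for the parsing table that an LR construction would supply. It is not taken from the slides and does not generalize to other grammars.

```python
def decide(stack, look):
    """Hand-written shift/reduce choice for E->E+T|T, T->T*F|F, F->(E)|id (assumption)."""
    top = stack[-1]
    if top == "id":
        return ("reduce", "F", 1)                     # F -> id
    if top == ")":
        return ("reduce", "F", 3) if stack[-3:] == ["(", "E", ")"] else ("error",)
    if top == "F":                                    # T -> T*F if possible, else T -> F
        return ("reduce", "T", 3) if stack[-3:-1] == ["T", "*"] else ("reduce", "T", 1)
    if top == "T":
        if look == "*":                               # T may still grow into T*F
            return ("shift",)
        return ("reduce", "E", 3) if stack[-3:-1] == ["E", "+"] else ("reduce", "E", 1)
    if top == "E" and look == "$":
        return ("accept",) if stack == ["$", "E"] else ("error",)
    return ("shift",) if look != "$" else ("error",)

def parse(tokens):
    """Return (accepted, trace); trace rows are (stack, remaining input, action)."""
    stack, rest, trace = ["$"], list(tokens) + ["$"], []
    while True:
        action = decide(stack, rest[0])
        trace.append(("".join(stack), "".join(rest), action))
        if action[0] == "shift":
            stack.append(rest.pop(0))
        elif action[0] == "reduce":
            _, lhs, n = action
            del stack[-n:]                            # pop the handle
            stack.append(lhs)                         # push the left side
        else:
            return action[0] == "accept", trace

ok, rows = parse(["id", "+", "id", "*", "id"])
for stack, rest, action in rows:
    print(f"{stack:10} {rest:12} {' '.join(map(str, action))}")
```

The interesting decision is the one with T on top of the stack and * as the next symbol: both a shift and a reduction by E→E+T look possible, and the rule shifts because * binds tighter than +.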
Bottom-Up Parsing for id * id
Handle Pruning for id * id
Bottom-up parsing during a left-to-right scan of the input constructs a
rightmost derivation in reverse.
Shift-Reduce Parsing for id * id

E → E+T | T
T → T*F | F
F → (E) | id

More examples in notes.


Conflicts During Shift-Reduce Parsing
There are two types of conflicts:

1. Shift-reduce conflict: The parser cannot decide whether to shift or to reduce.

2. Reduce-reduce conflict: The parser cannot decide which of several reductions to make.
Conflicts During Shift-Reduce Parsing
There are context-free grammars for which shift-reduce parsing cannot be used. Every
shift-reduce parser for such a grammar can reach a configuration in which the parser,
knowing the entire stack contents and the next input symbol, cannot decide whether to
shift or to reduce, or cannot decide which of several reductions to make.
Reduce-reduce conflict: arises when we know we have a handle, but the stack
contents and the next input symbol are insufficient to determine which
production should be used in a reduction.
Example: Suppose we have a lexical analyzer that returns the token name id for all
names, regardless of their type (procedure, array, etc.).
A statement beginning with p(i, j) would appear as the token stream id(id, id)
to the parser. If procedure calls and array references share this syntax, then after
shifting the first three tokens the parser has id on top of the stack and cannot tell
whether to reduce it as a procedure parameter or as an array index expression.
Types of LR Parsers
1.LR(0) Parser
2.Simple LR-Parser (SLR)
3.Canonical LR Parser (CLR)
4.LALR Parser.
Model of an LR parser
It consists of
1. Input
2. Output
3. Stack
4. Driver program
5. Parsing table which has 2 parts – ACTION and GOTO
LR-parsing algorithm.
INPUT: An input string w and an LR-parsing table with functions ACTION and
GOTO for a grammar G.
OUTPUT: If w is in L(G), the reduction steps of a bottom-up parse for w;
otherwise, an error indication.
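The driver program that consumes ACTION and GOTO can be sketched in a few lines. Everything concrete below is an assumption made for illustration: the toy grammar S → (S) | x, its production numbering, and the hand-built LR(0) table are not from the slides. States are kept on the stack in place of grammar symbols.

```python
# Productions of the assumed toy grammar: number -> (left side, body length).
PRODUCTIONS = {1: ("S", 3),   # 1: S -> ( S )
               2: ("S", 1)}   # 2: S -> x

# Hand-built LR(0) table for that grammar (an assumption, not from the source).
ACTION = {
    (0, "("): ("s", 2), (0, "x"): ("s", 3),
    (1, "$"): ("acc",),
    (2, "("): ("s", 2), (2, "x"): ("s", 3),
    (4, ")"): ("s", 5),
}
for terminal in "()x$":                # LR(0) reduce states reduce on every terminal
    ACTION[(3, terminal)] = ("r", 2)
    ACTION[(5, terminal)] = ("r", 1)
GOTO = {(0, "S"): 1, (2, "S"): 4}

def lr_parse(tokens):
    """Return the production numbers used, in the order the reductions happen."""
    stack = [0]                        # stack of states; state 0 is the start state
    rest = list(tokens) + ["$"]
    output = []
    while True:
        act = ACTION.get((stack[-1], rest[0]))
        if act is None:
            raise SyntaxError(f"unexpected {rest[0]!r} in state {stack[-1]}")
        if act[0] == "s":              # shift: push the target state, consume the token
            stack.append(act[1])
            rest.pop(0)
        elif act[0] == "r":            # reduce: pop |body| states, then follow GOTO
            lhs, length = PRODUCTIONS[act[1]]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])
            output.append(act[1])
        else:                          # accept
            return output

print(lr_parse(list("((x))")))
```

The returned list is the sequence of reductions of the bottom-up parse; read backwards it is a rightmost derivation of the input.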
LR(0) Parsing: the steps involved in LR(0) parsing:
1. Write the context-free grammar for the given input string.

2. Check the grammar for ambiguity.

3. Add the augmented production.

4. Create the canonical collection of LR(0) items.

5. Draw the DFA.

6. Construct the LR(0) parsing table.

7. Using the table, a stack, and the parsing algorithm, generate the output.


How does a shift-reduce parser know when to
shift and when to reduce?
• An LR parser makes shift-reduce decisions by maintaining states to
keep track of where we are in a parse.
• States represent sets of "items." An LR(0) item (item for short) of a
grammar G is a production of G with a dot at some position of the
body. Thus, production A ➔ XYZ yields the four items
     A ➔ ·XYZ
     A ➔ X·YZ
     A ➔ XY·Z
     A ➔ XYZ·
• An item indicates how much of a production we have seen at a given
point in the parsing process.
• The collection of sets of LR(0) items is called the canonical LR(0) collection.
• It provides the basis for constructing a deterministic finite automaton that
is used to make parsing decisions. Such an automaton is called an LR(0)
automaton. In particular, each state of the LR(0) automaton represents a
set of items in the canonical LR(0) collection.
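As a small illustration of the item definition, the sketch below enumerates the LR(0) items of one production by sliding the dot across its body. The representation of an item as a `(lhs, body, dot)` tuple and the names `lr0_items` and `show` are assumptions chosen for this example.

```python
def lr0_items(lhs, body):
    """All LR(0) items of lhs -> body: one per dot position, 0..len(body)."""
    return [(lhs, tuple(body), dot) for dot in range(len(body) + 1)]

def show(item):
    """Render an item with '.' marking the dot position."""
    lhs, body, dot = item
    symbols = list(body[:dot]) + ["."] + list(body[dot:])
    return f"{lhs} -> {' '.join(symbols)}"

for item in lr0_items("A", "XYZ"):
    print(show(item))
```

A production with n symbols in its body therefore has n + 1 items; an empty body yields the single item with only the dot.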

To construct the canonical LR(0) collection for a grammar, we define


1. An augmented grammar
2. CLOSURE of item set
3. GOTO operation
1. An augmented grammar

If G is a grammar with start symbol S, then G′, the
augmented grammar for G, is G with a new start
symbol S′ and production S′ ➔ S. The purpose of
this new starting production is to indicate to the
parser when it should stop parsing and announce
acceptance of the input. That is, acceptance occurs
when and only when the parser is about to reduce
by S′ ➔ S.
2.Closure of Item Sets
Example:
Algorithm to construct C, the canonical collection of sets of LR(0)
items for an augmented grammar G′
Fig 4.31
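A minimal sketch of CLOSURE, GOTO and the canonical collection, using the standard definitions: CLOSURE adds the item B → ·γ for every nonterminal B that appears immediately after a dot, and GOTO(I, X) moves the dot over X in every item of I that allows it and then takes the closure. The data layout (an item as a pair of production index and dot position) and the function names are assumptions of this sketch; the grammar is the augmented expression grammar used in these slides.

```python
GRAMMAR = [
    ("E'", ("E",)),                         # augmented production E' -> E
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """Add B -> .gamma for every nonterminal B found right after a dot."""
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == body[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over symbol wherever possible, then close the result."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)

def canonical_collection():
    """Return (states, transitions) of the LR(0) automaton."""
    states = [closure({(0, 0)})]            # I0 = CLOSURE({E' -> .E})
    transitions = {}
    symbols = sorted({s for _, body in GRAMMAR for s in body})
    for state in states:                    # the list grows while we iterate over it
        for sym in symbols:
            target = goto(state, sym)
            if not target:
                continue
            if target not in states:
                states.append(target)
            transitions[(states.index(state), sym)] = states.index(target)
    return states, transitions

states, transitions = canonical_collection()
print(len(states), "item sets")
```

State numbers in this sketch depend on the order in which symbols are tried, so they need not match the numbering of any particular textbook figure, although the item sets themselves are the same.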
Construction of LR (0) parsing Table:

Once we have created the canonical collection of LR(0) items, we fill in the table as follows:
If there is a transition from state Ii to state Ij on a terminal a, write the shift
entry sj in ACTION[i, a].
If there is a transition from state Ii to state Ij on a non-terminal A, write j (the
subscript of Ij) in GOTO[i, A].
If state Ii contains an item whose dot is at the right end of the production, so that
no transition leaves on it, that production is a reduce production.
Write the reduce entry rk, where k is the production number, in the ACTION part of
row i. In an LR(0) table this entry goes in every terminal column.
If the completed item is the augmented production, write accept in ACTION[i, $] instead.
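The table-filling rules can be sketched as follows. Two assumptions are worth stating loudly. First, the grammar S → AA, A → aA | b is chosen for the example and is not from the slides; the expression grammar is not LR(0), because the state containing E → T· and T → T·*F would need both a reduce and a shift on *. Second, the item representation and the names `build_table` and `set_action` are hypothetical.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = sorted({s for _, body in GRAMMAR for s in body})
TERMINALS = [s for s in SYMBOLS if s not in NONTERMINALS] + ["$"]

def closure(items):
    result, work = set(items), list(items)
    while work:
        p, d = work.pop()
        body = GRAMMAR[p][1]
        if d < len(body) and body[d] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == body[d] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)

def goto(items, sym):
    return closure({(p, d + 1) for p, d in items
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == sym})

def build_table():
    """Return (states, ACTION, GOTO); raise ValueError if the grammar is not LR(0)."""
    states = [closure({(0, 0)})]
    action, goto_table = {}, {}

    def set_action(key, value):
        if action.get(key, value) != value:
            raise ValueError(f"conflict at {key}: {action[key]} vs {value}")
        action[key] = value

    for i, state in enumerate(states):          # states grows during the loop
        for sym in SYMBOLS:
            target = goto(state, sym)
            if not target:
                continue
            if target not in states:
                states.append(target)
            j = states.index(target)
            if sym in NONTERMINALS:
                goto_table[(i, sym)] = j        # transition on a non-terminal
            else:
                set_action((i, sym), ("s", j))  # transition on a terminal: shift j
        for p, d in state:
            if d == len(GRAMMAR[p][1]):         # dot at the right end
                if p == 0:
                    set_action((i, "$"), ("acc",))
                else:
                    for t in TERMINALS:         # LR(0): reduce in every column
                        set_action((i, t), ("r", p))
    return states, action, goto_table

states, ACTION, GOTO = build_table()
for key in sorted(ACTION):
    print(key, ACTION[key])
```

Because `set_action` refuses to overwrite an entry with a different one, running the same construction on a grammar that is not LR(0) reports the shift-reduce or reduce-reduce conflict instead of silently picking one action.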
Canonical-LR
CLR stands for canonical LR. It is the most general of the LR methods
listed here, and it makes use of lookahead symbols. The method works on a larger set of
items called LR(1) items. The main difference between LR(0) and LR(1)
items is that an LR(1) item carries more information in a
state, which rules out reductions that would be invalid for the next input symbol. This
extra information is the lookahead symbol: an LR(1) item has the form [A → α·β, a],
where a is a terminal or the end-marker $.
Constructing LR(1) Sets of Items
LALR Parser

LALR stands for lookahead LR. It is a powerful parser that can
handle a large class of grammars. The CLR parsing table is
quite large compared to other parsing tables, and LALR reduces its
size. LALR works like CLR; the only difference is that it
merges CLR states that have the same core (the same items apart from
their lookaheads) into one single state.
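The merging step can be sketched on a handful of LR(1) item sets. The data below is an assumption for illustration: it lists only the six mergeable states for the grammar S → CC, C → cC | d (a grammar not used in these slides), with items written as (production, dot position, lookahead). The names `core` and `merge_by_core` are hypothetical.

```python
from collections import defaultdict

# Illustrative LR(1) item sets for S -> CC, C -> cC | d (assumed example data).
CLR_STATES = {
    3: {("C->cC", 1, "c"), ("C->cC", 1, "d"), ("C->cC", 0, "c"),
        ("C->cC", 0, "d"), ("C->d", 0, "c"), ("C->d", 0, "d")},
    6: {("C->cC", 1, "$"), ("C->cC", 0, "$"), ("C->d", 0, "$")},
    4: {("C->d", 1, "c"), ("C->d", 1, "d")},
    7: {("C->d", 1, "$")},
    8: {("C->cC", 2, "c"), ("C->cC", 2, "d")},
    9: {("C->cC", 2, "$")},
}

def core(state):
    """The LR(0) part of a state: its items with the lookaheads dropped."""
    return frozenset((prod, dot) for prod, dot, _ in state)

def merge_by_core(states):
    """Union all states that share a core; keys are the merged state numbers."""
    groups = defaultdict(lambda: ([], set()))
    for name, items in states.items():
        names, merged = groups[core(items)]
        names.append(name)
        merged |= items                 # in-place union keeps every lookahead
    return {tuple(sorted(names)): merged for names, merged in groups.values()}

for names, items in sorted(merge_by_core(CLR_STATES).items()):
    print(names, sorted(items))
```

Merging never creates a new shift-reduce conflict, since shifts depend only on the core, but it can create a reduce-reduce conflict when the united lookahead sets of two completed items overlap.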
