You are on page 1of 35

TOPIC 5 – BOTTOM UP

PARSING
1

SLIDES COMPILED BY: FADZLIN AHMADON


FACULTY OF COMPUTER & MATHEMATICAL
SCIENCES
UITM (MELAKA) JASIN CAMPUS

TEXT BOOK: COMPILER DESIGN: THEORY,


TOOLS & EXAMPLES (JAVA EDITION) BY
BERGMANN, SETH D. (CHAPTER 5)

Fadzlin Ahmadon – UiTM JASIN


Introduction
2

Top Down Parsing: Parse from top to


bottom
Bottom Up Parsing: Parse from bottom
to up
 From the bottom of derivation tree to up
 Apply grammar rules in reverse

Fadzlin Ahmadon – UiTM JASIN


Shift Reduce Parsing
3

Bottom up parsing involves TWO fundamental


operations:
 SHIFT Operation : move an input symbol to the stack
 REDUCE Operation: replace symbols on top of the stack with a
nonterminal
This two operations are doing derivation step in
reverse
Bottom up parsers are also called shift reduce
parsers because they use the two operations

Fadzlin Ahmadon – UiTM JASIN


Shift Reduce Parsing
4

Shift Reduce parsing steps:


1. Begin with an empty stack
2. Input symbols are moved to the stack (SHIFT)
3. Symbols on the stack replaced by non terminals
according to grammar rules (REDUCE)
4. When all input symbols have been read and the
symbol on top of the stack is the starting non
terminal : ACCEPT

Fadzlin Ahmadon – UiTM JASIN


Shift Reduce Parsing
5

Sequence of Stack Frames Parsing aabb


▽ aabb↵ shift
G:
▽a abb↵ shift 1. A → a B
▽aa bb↵ 2. B → A b
shift
▽aab b↵ reduce using rule 3
3. B → b
▽aaB b↵ reduce using rule 1
▽aA b↵ shift
▽aAb ↵ reduce using rule 2
▽aB ↵ reduce using rule 1
▽A ↵ Accept

String of symbols being reduced: called a handler


Fadzlin Ahmadon – UiTM JASIN
▽ caabaab↵
Shift Reduce Parsing
shift
▽c aabaab↵ reduce using rule 2
▽S aabaab↵ shift
▽Sa abaab↵ shift
▽Saa baab↵ shift
G19:
▽Saab aab↵ 1. S → S a B
reduce using rule 3
2. S → c
▽SaB aab↵ reduce using rule 1 3. B → a b
▽S aab↵ Shift
▽Sa ab↵ Shift Sequence of Stack
▽Saa b↵ Frames Parsing
Shift
▽Saab ↵ reduce using rule 3
caabaab using
▽SaB ↵ Grammar G19
reduce using rule 1
▽S ↵ Accept

Fadzlin Ahmadon – UiTM JASIN 6


LR Grammar
7

For top down parsing we have been introduced with


the concept of LL(1) grammar.
LL(1) grammar:
 Parser finds a left-most derivation when scanning the input
from left to right
 While looking ahead on only ONE input symbol at a time
 Grammar that can be parsed using top-down parser
We now will learn the concept of LR grammar:
 Parser reads input from the left while finding a right-most
derivation
 Grammar that can be parsed using shift reduce parser

Fadzlin Ahmadon – UiTM JASIN


Shift/Reduce Conflict
8

A Shift/Reduce conflict happens when the parser


does not know whether to:
Shift an input symbol OR

 Reduce the handle on the stack

When this happens, the grammar is not LR and


we must either rewrite the grammar or use a
different parsing algorithm

Fadzlin Ahmadon – UiTM JASIN


Shift/Reduce Conflict
9
An example of a Shift/Reduce Conflict for string:
aaab
▽ aaab↵ shift G20:
▽a aab↵ reduce using rule 2 1. S → S a B
▽S aab↵ shift 2. S → a
▽Sa ab↵ shift/reduce conflict
3. B → a b
reduce using rule 2
(incorrect)
▽SS ab↵ shift
▽SSa b↵ shift
▽SSab ↵ reduce using rule 3
▽SSB ↵ Syntax error (incorrect)

Fadzlin Ahmadon – UiTM JASIN


Reduce/Reduce Conflict
10

A Reduce/Reduce conflict happens when it is clear


that a reduce operation should be performed BUT:
There is more than one grammar rule whose right hand

side matches the top of the stack AND
 It is not clear which rule should be used

Grammar that causes a reduce/reduce conflict is an


ambiguous grammar, and not LR.

Fadzlin Ahmadon – UiTM JASIN


Reduce/Reduce Conflict
11
Reduce/reduce conflict for string: aa using G21
G21:
1. S → S A
2. S → a
▽ aa↵ shift 3. A → a
▽a a↵ reduce/reduce conflict (rules 2 and 3)

reduce using rule 3 (incorrect)


▽A a↵ shift
▽Aa ↵ reduce/reduce conflict (rules 2 and 3)

reduce using rule 2 (rule 3 will also yield a syntax error)


▽AS ↵ Syntax error

Fadzlin Ahmadon – UiTM JASIN


How to Resolve the Conflicts?
12

1. Make an assumption:
 Example: If all shift/reduce conflicts could be resolved by
shifting rather than reducing, no need to rewrite the grammar
2. Rewrite the grammar
3. Look ahead at additional input characters
 Example: Look ahead at k input symbols. (Grammar would be
called LR(k))

For implementing programming


languages using bottom up
parser, we usually use only LR(1)
Fadzlin Ahmadon – UiTM JASIN
LR Parsing with Tables
13

One way to implement shift reduce parsing is by


using tables that determine whether to shift or
reduce and which grammar rule to reduce.
This technique uses two tables:

GoTo
ACTION TABLE: Determines whether a shift or reduce is to be
Action

invoked
 GOTO TABLE: Indicates which stack symbol is to be pushed

Table
on the stack after a reduction
Table
Fadzlin Ahmadon – UiTM JASIN
LR Parsing with Tables
14

Shift •Implemented by a push operation


•Followed by an advance input

Action (pointer) operation

Reduce •Implemented by a replace operation


•Stack symbols on the right side of the grammar
rule are replaced by stack symbol from goto table

Action
•Input pointer is retained

Fadzlin Ahmadon – UiTM JASIN


LR Parsing with Tables
15

LR Parser operation:


1. Find the action corresponding to the current input and
the top stack symbol
2. If that action is a shift action:
a) Push the input symbol onto the stack
b) Advance the input pointer
3. If that action is a reduce action:
1. Find the grammar rule specified by the reduce action
2. Pop all the symbols on the right side of the rule
3. On the goto table, check the left side of the rule on COLUMN and
the top stack symbol on ROW – Push the symbol on the CELL
onto the stack
4. Retain input pointer
Fadzlin Ahmadon – UiTM JASIN
LR Parsing with Tables
16

LR Parser operation (continued):


4. If that action is blank, a syntax error has been detected
5. If that action is ACCEPT, terminate
6. Repeat from step 1

Fadzlin Ahmadon – UiTM JASIN


LR Parsing with Tables
17

For example, suppose we have the following stack and


input configuration:
Stack Input
▽S ab↵

The action shift in action table will result in the


following configuration:
Stack Input
▽Sa b↵

The a has been shifted from the input to the stack.


Fadzlin Ahmadon – UiTM JASIN
LR Parsing with Tables
18

Suppose, then, that in the grammar, rule 7 is:


7. B → Sa
Select the row of the goto table labeled ▽, and the
column labeled B.

If the entry in this cell is push X, then the action


reduce 7 would result in the following
configuration:
Stack Input
▽X b↵
Fadzlin Ahmadon – UiTM JASIN
LR Parsing with Tables
19

 LR Parsing tables for grammar G5 (arithmetic expressions


involving only addition and multiplication):

G 5
1. Expr → Expr + Term
2. Expr → Term
3. Term → Term * Factor
4. Term → Factor
5. Factor →( Expr ) ▽
6. Factor → var Initial Stack

Fadzlin Ahmadon – UiTM JASIN


LR Parsing with Tables
20

Action Table for grammar G5


A c t i o n T a b l e
+ * ( ) var ↵
▽ shift ( shift var
Expr1 shift + Accept
Term1 reduce 1 shift * reduce 1 reduce 1
Factor3 reduce 3 reduce 3 reduce 3 reduce 3
( shift ( shift var
Expr5 shift + shift )
) reduce 5 reduce 5 reduce 5 reduce 5
+ shift ( shift var
Term2 reduce 2 shift * reduce 2 reduce 2
* shift ( shift var
Factor4 reduce 4 reduce 4 reduce 4 reduce 4
var reduce 6 reduce 6 reduce 6 reduce 6

Fadzlin Ahmadon – UiTM JASIN


LR Parsing with Tables
21

Goto table for grammar G5


G o t o T a b l e
Expr Term Factor
▽ push Expr1 push Term2 push Factor4
Expr1
Term1
Factor3
( push Expr5 push Term2 push Factor4
Expr5
)
+ push Term1 push Factor4
Term2
* push Factor3
Factor4
var

Fadzlin Ahmadon – UiTM JASIN


LR Parsing with Tables
22
 Show the sequence of stack, input, action, and goto configurations for the input
var∗var using the parsing tables in previous slides
G5
1. Expr → Expr + Term
2. Expr → Term
3. Term → Term * Factor
4. Term → Factor
5. Factor →( Expr )
6. Factor → var
▽ var*var↵ shift var
▽var *var↵ reduce 6 push Factor4
▽Factor4 *var↵ reduce 4 push Term2
▽Term2 *var↵ Shift*
▽Term2* var↵ shift var
▽Term2*var ↵ reduce 6 push Factor3
▽Term2*Factor3 ↵ reduce 3 push Term2
▽Term2 ↵ reduce 2 push Expr1
▽Expr1 ↵ Accept

Fadzlin Ahmadon – UiTM JASIN


G5 LR Parsing with Tables
1. Expr → Expr + Term
2. Expr → Term
3. Term → Term * Factor Show the sequence of
4. Term → Factor configurations when parsing:
5. Factor →( Expr )
6. Factor → var (var+var)∗var
▽ (var+var)*var↵ shift (
▽( var+var)*var↵ shift var
▽(var +var)*var↵ reduce 6 push Factor4
▽(Factor4 +var)*var↵ reduce 4 push Term2
▽(Term2 +var)*var↵ reduce 2 push Expr5
▽(Expr5 +var)*var↵ shift +
▽(Expr5+ var)*var↵ shift var
▽(Expr5+var )*var↵ reduce 6 push Factor4
▽(Expr5+Factor4 )*var↵ reduce 4 push Term1
▽(Expr5+Term1 )*var↵ reduce 1 push Expr5
G5 LR Parsing with Tables
1. Expr → Expr + Term
2. Expr → Term
3. Term → Term * Factor Show the sequence of
4. Term → Factor configurations when parsing:
5. Factor →( Expr )
6. Factor → var (var+var)∗var

▽(Expr5 )*var↵
shift )
▽(Expr5) *var↵ reduce 5 push Factor4
▽Factor4 *var↵ reduce 4 push Term2
▽Term2 *var↵ shift *
▽Term2* var↵ shift var
▽Term2*var ↵ reduce 6 push Factor3
▽Term2*Factor3 ↵ reduce 3 push Term2
▽Term2 ↵ reduce 2 push Expr1
▽Expr1 ↵ Accept
SableCC
25

There are software systems that can generate a


parser automatically from specifications
In this example we will again take a look at SableCC,
which was developed at McGill University
We will take a look on how to use SableCC to do a
bottom up parsing

Fadzlin Ahmadon – UiTM JASIN


SableCC - Overview
26

User of SableCC prepares a grammar file as well as


two java classes:
 Translation
 Compiler
Using the grammar file as input SableCC generates
java code to compile source code and produce an
abstract syntax tree as output
Grammar New Compiler
File SableCC
(Java Codes)

New Abstract Syntax


Source Code
Compiler Tree

Fadzlin Ahmadon – UiTM JASIN


SableCC - Overview
27

Generation and Compilation of a Compiler using


SableCC
language.grammar

SableCC

parser lexer node analysis

Translation.java Compiler.java

javac javac

Translation.class Compiler.class

Fadzlin Ahmadon – UiTM JASIN


SableCC Grammar File
28

There are six sections in the grammar file:


1. Package
2. Helpers
3. States
4. Tokens
5. Ignored Tokens
6. Productions
We’ve discussed on the first four sections in Topic 3
Ignored Tokens: specify tokens ignored by parser

Fadzlin Ahmadon – UiTM JASIN


SableCC - Productions
29

Productions: contains grammar rules for the


language being defined
Each definition consists of the name of the
nonterminal being defined, an equal sign, an EBNF
definition and a semicolon to terminate the
production
Example of a production defining a while statement:

stmt = while l_par bool_expr r_par


stmt ;

Fadzlin Ahmadon – UiTM JASIN


SableCC - Productions
30

Productions may use EBNF-like constructs. For


example if x is any grammar symbol, then:

x? // An optional x (0 or 1 occurrences of x)

x* // 0 or more occurrences of x

x+ // 1 or more occurrences of x

Fadzlin Ahmadon – UiTM JASIN


SableCC - Productions
31

Alternative definitions using | are also permitted


Alternatives must be labeled with names enclosed in
braces
arg_list = {single} identifier
| {multiple} identifier ( comma identifier )
+ ;

Fadzlin Ahmadon – UiTM JASIN


SableCC - Productions
32

Labels must be used when two identical names


appear in a grammar rule
Each item label must be enclosed in brackets and
followed by a colon

for_stmt = for l_par [init]: assign_expr [s1]:semi


bool_expr
[s2]:semi [incr]: assign_expr r_par stmt ;

Fadzlin Ahmadon – UiTM JASIN


An Example using SableCC
33

Grammar to translate infix to postfix

Grammar:
1. Expr → Expr + Term
2. Expr → Expr - Term
3. Expr → Term
4. Term → Term ∗ Factor
5. Term → Term / Factor
6. Term → Factor
7. Factor → ( Expr )
8. Factor → number

Fadzlin Ahmadon – UiTM JASIN


An Example using SableCC
34

Tokens & Ignored Tokens


Tokens
number = ['0'..'9']+;
plus = '+';
minus = '-';
mult = '*';
div = '/';
l_par = '(';
r_par = ')';
blank =
(' ' | 10 | 13 | 9)+;
semi = ';' ;
Ignored Tokens
blank;

Fadzlin Ahmadon – UiTM JASIN


An Example using SableCC
35

Productions Productions
expr =
{term} term |
Alternative {plus} expr plus term |
definition
{minus} expr minus term
using ‘|’ and
have name ;
term =
{factor} factor |
{mult} term mult factor |
{div} term div factor
;
factor =
{number} number |
{paren} l_par expr r_par
;
Fadzlin Ahmadon – UiTM JASIN

You might also like