Chapter 4
Syntax Analysis
Top – Down Parsing
Outline
• Role of the parser
• Top-Down parsing:
–Predictive Parsing
(C) 2014, Prepared by Partha Sarathi Chakraborty
• Recursive, and
• Nonrecursive
Introduction
• The syntax of programming language constructs can be
described by context-free grammars or BNF (Backus-Naur Form).
• Grammars offer significant advantages to both language designers
and compiler writers.
• A grammar gives a precise, yet easy-to-understand, syntactic
specification of a programming language.
[Figure: position of the parser in the compiler. Source Program → Lexical Analyzer → (Token, tokenval) → Parser (and the rest of the front end) → Intermediate representation; the Lexical Analyzer and the Parser both use the Symbol Table.]
The Parser
• The task of the parser is to check syntax
• The syntax-directed translation stage in the
compiler’s front-end checks static semantics and
produces an intermediate representation (IR) of
Error Handling
• A good compiler should assist in identifying and
locating errors
– Lexical errors: important, compiler can easily recover and
continue
• Example: misspelling identifier, keyword or operator
– Syntax errors: most important for compiler, can almost
always recover
• Example: an arithmetic expression with unbalanced parentheses
– Static semantic errors: important, can sometimes recover
– Dynamic semantic errors: hard or impossible to detect at
compile time, runtime checks are required
• Example for semantic error: an operator applied to an incompatible
operand.
– Logical errors: hard or impossible to detect
• Example: an infinitely recursive call.
Error Handling
• The error handler in a parser has simple-to-
state goals:
– It should report the presence of errors clearly and
accurately.
– It should recover from each error quickly enough
to be able to detect subsequent errors.
– It should not significantly slow down the
processing of correct programs.
Viable-Prefix Property
• The viable-prefix property of LL/LR parsers
allows early detection of syntax errors
– Goal: detection of an error as soon as possible
Grammars
• Context-free grammar is a 4-tuple
G=(N,T,P,S) where
– T is a finite set of tokens (terminal symbols)
– N is a finite set of nonterminals
Derivations
• The one-step derivation is defined by
α A β ⇒ α γ β
where A → γ is a production in the grammar
• In addition, we define
Derivation (Example)
E → E + E
E → E * E
E → ( E )
E → - E
E → id
E ⇒ - E ⇒ - id
E ⇒rm E + E ⇒rm E + id ⇒rm id + id
E ⇒* E
E ⇒+ id * id + id
A → B w or A → w
– Context free if each production is of the form
A → α
where A ∈ N and α ∈ (N ∪ T)*
– Context sensitive if each production is of the form
α A β → α γ β
where A ∈ N, α, γ, β ∈ (N ∪ T)*, |γ| > 0
– Unrestricted
Chomsky Hierarchy
Examples:
Every finite language is regular
L1 = { a^n b^n | n ≥ 1 } is context free
L2 = { a^n b^n c^n | n ≥ 1 } is context sensitive
Ambiguity
• A grammar that produces more than one
parse tree for some sentence is said to be
ambiguous.
• Example: id + id * id
Eliminating Ambiguity
• “Dangling-else” grammar
Eliminating Ambiguity
• The general rule is, “Match each else with the closest previous
unmatched then”.
• The idea is that a statement appearing between a then and an
else must be "matched" ; that is, the interior statement must not
end with an unmatched or open then. A matched statement is
either an if-then-else statement containing no open statements
Left Recursion
• Productions of the form
A → A α | β
are left recursive
Immediate Left-Recursion
Elimination
Example
• Consider the grammar
E → E + T | T
T → T * F | F
F → ( E ) | id
Another Example
• Consider the grammar below, in which S is left recursive (S ⇒ Aa ⇒ Sda) but not immediately left recursive.
S → Aa | b
A → Ac | Sd | ε
• Using the general left-recursion algorithm:
• Substitute the S-productions into A → Sd to obtain the following
productions
A → Ac | Aad | bd | ε
• Now eliminate the immediate left recursion among the A-productions
S → Aa | b
A → bdA’ | A’
A’ → cA’ | adA’ | ε
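The last step of this example can be sketched in code. The block below is a minimal illustration, not the slides' own algorithm listing: it assumes the textbook rule that A → Aα | β becomes A → βA’ and A’ → αA’ | ε. The function name and the list-of-symbols encoding (an empty list stands for ε) are hypothetical choices made for the example.

```python
def eliminate_immediate_left_recursion(nt, alternatives):
    """Rewrite nt -> nt a | b as nt -> b nt', nt' -> a nt' | epsilon.

    `alternatives` is a list of right-hand sides, each a list of symbols;
    the empty list [] stands for epsilon. (Encoding assumed for this sketch.)
    """
    # The alpha parts: what follows the left-recursive occurrence of nt.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # The beta parts: alternatives that do not start with nt.
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: alternatives}          # nothing to eliminate
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }

# A -> Ac | Aad | bd | epsilon, as obtained after substituting the S-productions
result = eliminate_immediate_left_recursion(
    "A", [["A", "c"], ["A", "a", "d"], ["b", "d"], []]
)
```

Here `result["A"]` encodes A → bdA’ | A’ and `result["A'"]` encodes A’ → cA’ | adA’ | ε, matching the productions on the slide.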
Ai → Aj γ
with
Ai → δ1 γ | δ2 γ | … | δk γ
where
Aj → δ1 | δ2 | … | δk
enddo
eliminate the immediate left recursion in Ai
enddo
i = 1: nothing to do
i = 2, j = 1: B → C A | A b
⇒ B → C A | B C b | a b
⇒(imm) B → C A BR | a b BR
BR → C b BR | ε
i = 3, j = 1: C → A B | C C | a
⇒ C → B C B | a B | C C | a
i = 3, j = 2: C → B C B | a B | C C | a
⇒ C → C A BR C B | a b BR C B | a B | C C | a
⇒(imm) C → a b BR C B CR | a B CR | a CR
CR → A BR C B CR | C CR | ε
Left Factoring
• When a nonterminal has two or more productions whose right-
hand sides start with the same grammar symbols, the grammar
is not LL(1) and cannot be used for predictive parsing
• If A → α β1 | α β2 | γ are productions
• After left factoring,
A → α A’ | γ
A’ → β1 | β2
Example
• Consider the grammar
S → iEtS | iEtSeS | a
E → b
Parsing
• Universal (any C-F grammar)
– Cocke-Younger-Kasami
– Earley
Top-Down Parsing
• LL methods (Left-to-right, Leftmost
derivation) and recursive-descent parsing
Grammar:
E → T + T
T → ( E )
T → - E
T → id
Leftmost derivation:
E ⇒lm T + T ⇒lm id + T ⇒lm id + id
[Figure: the parse tree for id + id grown top-down, one leftmost-derivation step at a time]
Example
• Consider the grammar
S → cAd
A → ab | a
• Steps to build parse tree for string “cad”.
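The backtracking behaviour on “cad” can be sketched as follows. This is a minimal illustration under assumptions: the grammar is encoded as a dictionary of alternatives, and the names `derive` and `accepts` are hypothetical. Alternatives are tried in order, so A → ab is attempted first, fails on b, and the parser backtracks to A → a.

```python
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],      # alternatives are tried in this order
}

def derive(symbols, text, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Nonterminal: try each alternative; exhausting one is the backtrack step.
        for alternative in GRAMMAR[head]:
            for mid in derive(alternative, text, pos):
                yield from derive(rest, text, mid)
    elif pos < len(text) and text[pos] == head:
        # Terminal: it must match the current input symbol.
        yield from derive(rest, text, pos + 1)

def accepts(text):
    """True if the whole input can be derived from the start symbol S."""
    return any(end == len(text) for end in derive(["S"], text, 0))
```

With this encoding, `accepts("cad")` and `accepts("cabd")` both succeed, while `accepts("cab")` fails. On a left-recursive grammar the same procedure would recurse without consuming input, which is why left recursion must be removed first.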
• Limitations:
– It is less efficient than other parsing techniques, and it
may enter an infinite loop for some inputs.
– It is difficult to parse the string if the required lookahead is
arbitrarily long.
Predictive Parsing
• Eliminate left recursion from grammar
• Left factor the grammar
T → FT’
T’ → *FT’ | ε
F → ( E ) | id
Transition Diagrams for Predictive Parsers
[Figures: transition diagrams]
Non-Recursive Predictive
Parsing
• Given an LL(1) grammar G=(N,T,P,S)
construct a table M[A,a] for A ∈ N, a ∈ T
and use a driver program with a stack
[Figure: model of a table-driven predictive parser. Input buffer: a + b $; stack: X Y Z $ (X on top); predictive parsing program (driver); parsing table M; output.]
FIRST
• To compute FIRST(X) for all grammar symbols X, apply
the following rules until no more terminals or ε can be
added to any FIRST set.
– If X is terminal, then FIRST(X) is {X}.
– If X → ε is a production, then add ε to FIRST(X).
Example
Consider the grammar
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id
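The FIRST computation for this grammar can be sketched as a fixed-point iteration. The block is a minimal sketch, assuming the standard rule for productions X → Y1 Y2 … Yk (add FIRST(Yi) minus ε while every earlier Yj can derive ε), which is part of the textbook definition but not shown in full on the slide. The dictionary encoding and the name `first_sets` are hypothetical.

```python
EPS = "ε"

# Expression grammar from the slide; [] encodes an ε-production.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_sets(grammar):
    """Iterate the FIRST rules until no set grows any more."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                before = len(first[nt])
                all_nullable = True            # stays True if every symbol derives ε
                for sym in alt:
                    if sym not in grammar:     # terminal: FIRST is the symbol itself
                        first[nt].add(sym)
                        all_nullable = False
                        break
                    first[nt] |= first[sym] - {EPS}
                    if EPS not in first[sym]:
                        all_nullable = False
                        break
                if all_nullable:               # includes the empty right-hand side
                    first[nt].add(EPS)
                if len(first[nt]) != before:
                    changed = True
    return first

FIRST = first_sets(GRAMMAR)
```

For this grammar the sketch yields FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }, FIRST(E’) = { + , ε } and FIRST(T’) = { * , ε }.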
FOLLOW
• To compute FOLLOW(A) for all nonterminals A, apply
the following rules until nothing can be added to any
FOLLOW set.
Example
Consider the grammar
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id
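A FOLLOW computation for this grammar can be sketched in the same fixed-point style. This is a sketch under assumptions: it uses the textbook rules (put $ in FOLLOW of the start symbol; for A → αBβ add FIRST(β) minus ε to FOLLOW(B); if β is empty or derives ε, add FOLLOW(A) to FOLLOW(B)), and it takes the FIRST sets of the nonterminals as a precomputed input. All names are hypothetical.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

# FIRST sets of the nonterminals, assumed to have been computed beforehand.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}

def first_of_string(symbols):
    """FIRST of a string of grammar symbols; a terminal is its own FIRST set."""
    out = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)                      # every symbol (or none) can derive ε
    return out

def follow_sets(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)            # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue      # FOLLOW is defined for nonterminals only
                    tail = first_of_string(alt[i + 1:])
                    new = tail - {EPS}
                    if EPS in tail:   # the rest can vanish: inherit FOLLOW(lhs)
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FOLLOW = follow_sets(GRAMMAR, "E")
```

The result is FOLLOW(E) = FOLLOW(E’) = { ) , $ }, FOLLOW(T) = FOLLOW(T’) = { + , ) , $ } and FOLLOW(F) = { + , * , ) , $ }.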
whenever possible.
• Sets of tokens yielded by the FOLLOW function can
also be used as synchronizing tokens during panic-
mode error recovery.
• FIRST and FOLLOW are also useful for LR parsing, i.e.
for LR(1) items and SLR(1) tables.
C → h | ε
Construction of Predictive
Parsing Table
Algorithm: Construction of a predictive parsing table.
Input: Grammar G.
Output: Parsing Table M.
Method:
– If X = a ≠ $, the parser pops X off the stack and advances the input
pointer to the next input symbol.
– If X is a nonterminal, the program consults entry M[X, a] of the parsing
table M. This entry will be either an X-production of the grammar or an
error entry. If, for example, M[X, a] = {X → UVW}, the parser replaces
X on top of the stack by WVU (with U on top).
• As output, we shall assume that the parser just prints the production
used; any other code could be executed here.
• If M[X, a] = error, the parser calls an error recovery routine.
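The driver described above can be sketched as a short loop. This is a minimal illustration, assuming a hard-coded parsing table for the expression grammar E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → ( E ) | id; the names `TABLE`, `NONTERMINALS` and `ll1_parse` are hypothetical, and error recovery is reduced to raising an exception.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# M[X, a]: right-hand side to expand; [] encodes an ε-production.
TABLE = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}

def ll1_parse(tokens, start="E"):
    """Return the list of productions used, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]                  # top of the stack is the list's end
    pos, output = 0, []
    while True:
        top, lookahead = stack[-1], tokens[pos]
        if top == "$" and lookahead == "$":
            return output                 # X = a = $: accept
        if top not in NONTERMINALS:
            if top != lookahead:
                raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
            stack.pop()                   # X = a ≠ $: pop and advance
            pos += 1
        else:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:               # error entry
                raise SyntaxError(f"no entry M[{top}, {lookahead}]")
            stack.pop()
            stack.extend(reversed(rhs))   # push RHS so its first symbol is on top
            output.append((top, rhs))     # "print" the production used

productions = ll1_parse(["id", "+", "id", "*", "id"])
```

For id + id * id the sketch records 11 productions, starting with E → TE’. An input such as ) id * + id raises `SyntaxError` at the first table lookup, which is where a real parser would call its error recovery routine.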
Moves made by the Nonrecursive
predictive parser
LL(1) Grammar
• LL(1) means
– The first “L”: scanning the input from left to
right.
Non-LL(1) Examples
Grammar          Not LL(1) because
S → S a | a      Left recursive
S → a S | a      FIRST(a S) ∩ FIRST(a) ≠ ∅
Parsing: Panic-mode
• Erroneous input: ) id * + id
INPUT SYMBOL
Nonterminal | id | + | * | ( | ) | $
E | E → T E’ | | | E → T E’ | synch | synch
E’ | | E’ → + T E’ | | | E’ → ε | E’ → ε
T | T → F T’ | synch | | T → F T’ | synch | synch
T’ | insert * | T’ → ε | T’ → * F T’ | | T’ → ε | T’ → ε
F | F → id | synch | synch | F → ( E ) | synch | synch

INPUT SYMBOL
Nonterminal | id | + | * | ( | ) | $
E | E → T E’ | | | E → T E’ | synch | synch
E’ | | E’ → + T E’ | | | E’ → ε | E’ → ε
T | T → F T’ | synch | | T → F T’ | synch | synch
T’ | T’ → F T’ | T’ → ε | T’ → * F T’ | | T’ → ε | T’ → ε
F | F → id | synch | synch | F → ( E ) | synch | synch
Outline
• Bottom-Up Parsing
– Shift Reduce Parsing
– Operator Precedence Parsing.
– LR parsers: (next Presentation)
• Simple LR (SLR)
• Canonical LR
• Lookahead LR (LALR)
Bottom-Up Parsing
• Start at the leaves and grow toward root.
• We can think of the process as reducing the input
string to the start symbol.
• At each reduction step a particular substring matching
Bottom-Up Parsing
• A general style of bottom-up syntax analysis, known as
shift-reduce parsing.
• Main actions are shift and reduce.
• At each shift action, the current symbol in the input
string is pushed to a stack.
Shift-Reduce Parsing
Handles
A handle is a substring of grammar symbols in a right-
sentential form that matches a right-hand side
of a production
Grammar: S → a A B e
Right-sentential forms: abbcde, aAbcde
Handles
• A handle of a right-sentential form γ (≡ αβw) is a
production rule A → β and a position of γ where the string β
may be found and replaced by A to produce the previous right-
sentential form in a rightmost derivation of γ.
S ⇒*rm αAw ⇒rm αβw
i.e. A → β is a handle of αβw at the location immediately after
the end of α.
• If the grammar is unambiguous, then every right-sentential
form of the grammar has exactly one handle.
• w is a string of terminals
Handle Pruning
• The process of discovering a handle & reducing it to
the appropriate left-hand side is called handle
pruning. Handle pruning forms the basis for a
bottom-up parsing method.
non-terminal.
– Accept: Successful completion of parsing.
– Error: Parser discovers a syntax error, and calls an error
recovery routine.
• Initial State
STACK INPUT
$ w$
• Final State
STACK INPUT
$S $
Stack Implementation of
Shift-Reduce Parsing
$BxA z$ reduce
Note: The parser never had to go into the stack to find the handle. It is this
aspect of handle pruning that makes a stack a particularly
convenient data structure for implementing a shift-reduce parser.
Conflicts
• Shift-reduce and reduce-reduce conflicts are
caused by
– The limitations of the LR parsing method (even
when the grammar is unambiguous)
– Ambiguity of the grammar
Shift-Reduce Parsing:
Shift-Reduce Conflicts
S → if E then S
| if E then S else S
| other
Resolve in favor
of shift, so else
matches closest if
Shift-Reduce Parsing:
Reduce-Reduce Conflicts
C → A B
A → a
B → a
Resolve in favor
of reduce A → a,
otherwise we’re stuck!
can have:
– ε at the right side
– two adjacent non-terminals at the right side.
• Example
E → E + E | E * E | ( E ) | −E | id
Relations | Handle reduced | Sentential form
$ ⋖ id ⋗ + ⋖ id ⋗ * ⋖ id ⋗ $ | E → id | $ id + id * id $
$ ⋖ + ⋖ id ⋗ * ⋖ id ⋗ $ | E → id | $ E + id * id $
$ ⋖ + ⋖ * ⋖ id ⋗ $ | E → id | $ E + E * id $
$ ⋖ + ⋖ * ⋗ $ | E → E * E | $ E + E * E $
$ ⋖ + ⋗ $ | E → E + E | $ E + E $
$ $ | | $ E $
• TRAILING(A) Algorithm
– a is in TRAILING(A) if there is a production of the form A
→ γaδ, where δ is ε or a single nonterminal.
– If a is in TRAILING(B), and there is a production of the
form A → γB, then a is in TRAILING(A).
T: LEADING(T) = { * , ( , id }, TRAILING(T) = { * , ) , id }
F: LEADING(F) = { ( , id }, TRAILING(F) = { ) , id }
set Xi ≐ Xi+2 ;
if Xi is a terminal and Xi+1 is a nonterminal then
for all a in LEADING(Xi+1) do set Xi ⋖ a ;
if Xi is a nonterminal and Xi+1 is a terminal then
for all a in TRAILING(Xi) do set a ⋗ Xi+1 ;
end
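E → E + T
[Figure: adjacent right-side symbols Xi Xi+1 Xi+2 and Xi Xi+1 used by the rules]
Operator-precedence relations (row symbol on the left of the relation, column symbol on the right):
   | + | * | ( | ) | id | $
+  | ⋗ | ⋖ | ⋖ | ⋗ | ⋖ | ⋗
*  | ⋗ | ⋗ | ⋖ | ⋗ | ⋖ | ⋗
(  | ⋖ | ⋖ | ⋖ | ≐ | ⋖ |
)  | ⋗ | ⋗ |   | ⋗ |   | ⋗
id | ⋗ | ⋗ |   | ⋗ |   | ⋗
$  | ⋖ | ⋖ | ⋖ |   | ⋖ |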
Method: Initially, the stack contains $ and the input buffer the
string w$.
STACK | Relation | INPUT | ACTION
$ + | ⋖ | id * id$ | shift
$ + id | ⋗ | * id$ | reduce E → id
$ + | ⋖ | * id$ | shift
$ + * | ⋖ | id$ | shift
$ + * id | ⋗ | $ | reduce E → id
$ + * | ⋗ | $ | reduce E → E * E
$ + | ⋗ | $ | reduce E → E + E
$ | | $ | Accept
[Figure: Actions of Operator – Precedence Parsing on the input string id + id; panels (a) to (g) show the stack, the remaining input, and the partial parse tree]
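   | + | − | * | / | ↑ | id | ( | ) | $
↑  | ⋗ | ⋗ | ⋗ | ⋗ | ⋖ | ⋖ | ⋖ | ⋗ | ⋗
id | ⋗ | ⋗ | ⋗ | ⋗ | ⋗ |   |   | ⋗ | ⋗
(  | ⋖ | ⋖ | ⋖ | ⋖ | ⋖ | ⋖ | ⋖ | ≐ | error
)  | ⋗ | ⋗ | ⋗ | ⋗ | ⋗ |   |   | ⋗ | ⋗
$  | ⋖ | ⋖ | ⋖ | ⋖ | ⋖ | ⋖ | ⋖ |   |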
Precedence Functions
• Compilers using operator precedence parsers do not need to
store the table of precedence relations.
• The table can be encoded by two precedence functions f and g
that map terminal symbols to integers.
• For symbols a and b,
f(a) < g(b) whenever a ⋖ b
Precedence Functions
Consider the grammar
E → E + E | E − E | E * E | E / E | E ↑ E | ( E ) | −E | id
  | + | − | * | / | ↑ | ( | ) | id | $
f | 2 | 2 | 4 | 4 | 4 | 0 | 6 | 6 | 0
g | 1 | 1 | 3 | 3 | 5 | 5 | 0 | 5 | 0
Precedence Functions
For example:
* ⋖ id, and f(*) < g(id)
Note: f(id) > g(id) suggests that id ⋗ id;
In fact no precedence relation holds between id and id.
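The way f and g stand in for the relation table can be sketched with two dictionaries holding the values from the table above. This is a minimal illustration: the slide states f(a) < g(b) whenever a ⋖ b, and the sketch additionally assumes the usual convention that f(a) = g(b) stands for a ≐ b and f(a) > g(b) for a ⋗ b. The names `F`, `G` and `relation` are hypothetical.

```python
# Precedence functions for the operator grammar, copied from the table.
F = {"+": 2, "−": 2, "*": 4, "/": 4, "↑": 4, "(": 0, ")": 6, "id": 6, "$": 0}
G = {"+": 1, "−": 1, "*": 3, "/": 3, "↑": 5, "(": 5, ")": 0, "id": 5, "$": 0}

def relation(a, b):
    """Relation between a (on the stack) and b (next input) implied by f and g."""
    if F[a] < G[b]:
        return "⋖"
    if F[a] > G[b]:
        return "⋗"
    return "≐"

examples = {
    ("*", "id"): relation("*", "id"),     # * ⋖ id, since f(*) = 4 < g(id) = 5
    ("+", "*"): relation("+", "*"),       # * binds tighter than +
    ("(", ")"): relation("(", ")"),       # matching parentheses
    ("id", "id"): relation("id", "id"),   # spurious: no relation holds in the table
}
```

The last entry shows the loss of information noted above: the integers always compare, so error entries of the table disappear and must be caught elsewhere, for example when no production matches the handle.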
Method
+ * id $
f 2 4 4 0
g 1 3 5 0
input.
– If a handle has been found, but there is no
production with this handle as a right side.
Handling Shift/Reduce Errors: error handling routines
   | id | ( | ) | $
id | e3 | e3 | ⋗ | ⋗
(  | ⋖ | ⋖ | ≐ | e4
)  | e3 | e3 | ⋗ | ⋗
$  | ⋖ | ⋖ | e2 | e1

STACK | Relation | INPUT | ACTION
$ + id | ⋗ | )($ | reduce
$ + | ⋗ | )($ | reduce
$ | blank | )($ | error, e2: unbalanced right parenthesis
$ | | ($ | delete ‘)’ from the INPUT
$ | ⋖ | ($ | shift
$( | blank | $ | error, e4: missing right parenthesis
$ | | $ | pop ‘(’ from STACK
$ | | $ | accept
Unit - II
Chapter 4
Syntax Analysis
Bottom – Up Parsing : LR parsers
Outline
• Bottom-Up Parsing
– LR parsers:
• Simple LR (SLR)
• Canonical LR
• Lookahead LR (LALR)
LR Parsers
• Efficient bottom-up syntax analysis technique
that can be used to parse a large class of CFGs.
• The technique is called LR(k) parsing; ‘L’ is
for Left-to-Right scanning of the input, ‘R’ for
LR Parsers: Attractive
• LR parsing is attractive for a variety of reasons.
– LR parsers can be constructed to recognize virtually all
programming language constructs for which context-free
grammars can be written.
– The LR-parsing method is the most general nonbacktracking
shift-reduce parsing method known, yet it can be
LR Parsers: Drawback
• The principal drawback of the LR method is that it is
too much work to construct an LR parser by hand for a
typical programming-language grammar.
• A specialized tool, an LR parser generator, is needed.
• YACC: Such a generator takes a context-free grammar
expensive.
• Lookahead LR (in short LALR) – It is intermediate
in power and cost between the other two. It will work on
most programming-language grammars and, with
some effort, can be implemented efficiently.
• Power: Canonical LR > LALR > SLR
[A → X • Y Z]
[A → X Y • Z]
[A → X Y Z •]
• Note that production A → ε has one item [A → •]
FIRST(E) = FIRST(T) = FIRST(F) = { ( , id}
FOLLOW(E) = { $ , ) , + }
FOLLOW(T) = { * , $ , ) , + }
FOLLOW(F) = { * , $ , ) , + }
• If I is the set of one item {[E’ → • E]}, then closure(I) contains
the items.
Final Closure I0
The Goto Operation for LR(0) Items
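[Figure: goto transitions between the LR(0) item sets; the transitions recoverable from the diagram are listed below]
goto(I0, E) = I1, goto(I0, T) = I2, goto(I0, F) = I3, goto(I0, ( ) = I4
goto(I1, +) = I6
goto(I2, *) = I7
goto(I4, E) = I8, goto(I4, F) = I3, goto(I4, ( ) = I4, goto(I4, id) = I5
goto(I6, T) = I9, goto(I6, F) = I3, goto(I6, ( ) = I4, goto(I6, id) = I5
goto(I7, F) = I10, goto(I7, ( ) = I4, goto(I7, id) = I5
goto(I8, ) ) = I11, goto(I8, +) = I6
goto(I9, *) = I7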
Transition diagram for the grammar G,
representing the Goto operation
Parsing table SLR(1) for
expression grammar G
Model of an LR Parser
LR Parsing Algorithm
Example of LR parsing
Moves of LR parser on id * id + id
S → S(S)
S → ε
• FIRST(S) = { ( , ε }
• FOLLOW(S) = { $ , ( , ) }
Construction of LR(0) items:
Closure Set I
LL vs. LR Grammars
• For a grammar to be LR(k), we must be able to
recognize the occurrence of the right side of a
production in a right-sentential form, with k input
symbols of lookahead.
• This requirement is far less stringent than that for
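[Figure: goto transitions between the LR(0) item sets; the transitions recoverable from the diagram are listed below]
goto(I0, S) = I1, goto(I0, L) = I2, goto(I0, R) = I3, goto(I0, *) = I4
goto(I2, =) = I6
goto(I4, R) = I7, goto(I4, L) = I8, goto(I4, *) = I4, goto(I4, id) = I5
goto(I6, R) = I9, goto(I6, L) = I8, goto(I6, *) = I4, goto(I6, id) = I5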
Parsing table SLR(1) for expression
grammar X shows S-R conflict
Conflict
Grammar X,
(1) S → L = R
(2) S → R
(3) L → * R
(4) L → id
(5) R → L
Grammar X is not ambiguous. This shift/reduce conflict arises from the fact that the
SLR parser construction method is not powerful enough to remember enough left
context to decide what action the parser should take on input =, having seen a string
reducible to L.
LR(1) Grammars
R → L
LR(1) Items
• An LR(1) item
[A → α•β, a]
contains a lookahead terminal a, meaning α already
on top of the stack, expect to see βa
• For items of the form
[A → α•, a]
• Augment with S’ → S
[Figure: an LR(1) item A → α•Bβ , a with its core (first component) A → α•Bβ and the lookahead set FIRST(βa) marked]
Construct Closure I0
Core lookahead
FIRST(βa)
I0 rewrite as:
Canonical LR(1) parsing table for
grammar X
LALR(1) Grammars
• LR(1) parsing tables have many states
• LALR(1) parsing (Look-Ahead LR) combines LR(1)
states to reduce table size
• Less powerful than LR(1)
– Will not introduce shift-reduce conflicts, because shifts do
LALR(1) parsing table for
grammar X
Analysis
• The LR and LALR parsers will mimic one
another on correct inputs.
• When presented with erroneous input, the
LALR parser may proceed to do some
S’ → S
S → S(S)
S → ε
• LR(1) items (next slide)
Parsing canonical LR
LALR parsing
FOLLOW(S) = { $ , c , d }
S’ → S
S → CC
C → cC
C → d
• LR(1) items (next slide)
I36: C → c•C , c | d | $
C → •cC , c | d | $
C → •d , c | d | $
I47: C → d• , c | d | $
I89: C → cC• , c | d | $
Assignment
Consider the following grammar G,
E → E + T | T
T → TF | F
F → F* | a | b