
UE20CS353: Compiler Design

Chapter 4: Syntax Analysis

1. The role of the Parser


2. Error-Recovery Strategies
3. Introduction to different parsers.
4. Top-Down parsing
5. Bottom-Up parsing

Mr. Prakash C O
Asst. Professor,
Dept. of CSE, PESU
coprakasha@pes.edu
Syntax Analysis

➢ By design, every programming language has precise rules that prescribe the syntactic structure of well-formed programs.

➢ In C, for example, a program is made up of functions, a function out of declarations and statements, a statement out of expressions, and so on.

➢ The syntax of programming language constructs can be specified by context-free grammars or BNF notation.

➢ Grammars offer significant benefits for both language designers and compiler writers.
Syntax Analysis
➢ Grammars offer significant benefits for both language designers
and compiler writers.

1. A grammar gives a precise, yet easy-to-understand, syntactic specification of a programming language.

2. From certain classes of grammars, we can construct automatically an efficient parser that determines the syntactic structure of a source program.

As a side benefit, the parser-construction process can reveal syntactic ambiguities and trouble spots that might have slipped through the initial design phase of a language.
Syntax Analysis

➢ Grammars offer significant benefits for both language designers


and compiler writers. Cont…

3. The structure imparted to a language by a properly designed grammar is useful for translating source programs into correct object code and for detecting errors.

4. A grammar allows a language to be evolved or developed iteratively, by adding new constructs to perform new tasks. These new constructs can be integrated more easily into an implementation that follows the grammatical structure of the language.
The Role of the Parser

➢ The parser obtains a string of tokens from the lexical analyzer, as shown in
Fig. 4.1, and verifies that the string of token names can be generated by the
grammar for the source language.
The Role of the Parser

➢ The Parser should

1. Report any syntax errors in an intelligible fashion.

2. Recover from commonly occurring errors to continue


processing the remainder of the program.

➢ The parser constructs a parse tree for well-formed programs and


passes it to the rest of the compiler for further processing.
The Role of the Parser

➢ There are three general types of parsers for grammars:

1. Universal,

2. Top-down, and

3. Bottom-up.

➢ Universal parsing methods such as the Cocke-Younger-Kasami (CYK) algorithm and Earley's algorithm can parse any grammar.
These general methods are, however, too inefficient to use in production compilers.
The Role of the Parser
➢ The parsing methods commonly used in compilers can be
classified as being either top-down or bottom-up.
➢ Top-down methods build parse trees from the top (root) to the
bottom (leaves),

➢ Bottom-up methods build parse trees from the leaves and work their
way up to the root.

➢ In either case, the input to the parser is scanned from left to right, one
symbol at a time.
[Fig: Parse tree for the example grammar]
The Role of the Parser
LL grammar and LR grammar

➢ The LL and LR grammars describe most of the syntactic constructs in


modern programming languages.

LL grammar:
E → T E'
E' → + T E' | ϵ
T → F T'
T' → * F T' | ϵ
F → (E) | id

LR grammar:
E → E + T | T
T → T * F | F
F → (E) | id
The Role of the Parser
LL grammar and LL parser

➢ An LL grammar is a context-free grammar that can be parsed by an LL parser.

➢ The LL parser reads input text Left to right within each line, and top to bottom
across the lines of the full input file.
The second L in LL means that the parser produces a Leftmost derivation: it
does a top-down parse.

➢ Top-down parsers build parse trees from the top (root) to the bottom (leaves).

➢ An LL parser is also known as a top-down parser.

➢ LL parsers are often called “predictive parsers”.


The Role of the Parser
LR grammar and LR parser

➢ An LR grammar is a context-free grammar that can be parsed by an LR parser.

➢ The LR parser reads input text Left to right within each line, and top to bottom
across the lines of the full input file.
The R means that the parser produces a Rightmost derivation in reverse: it does
a bottom-up parse.

➢ Bottom-up parsers build parse trees from the leaves and work their way up to the root.

➢ An LR parser is also known as a bottom-up parser.

➢ LR parsers are often called “shift-reduce” parsers.


The Role of the Parser
LL parser vs. LR parser:

• LL parsers begin at the start symbol and try to apply productions to arrive at the target string; LR parsers begin at the target string and try to arrive back at the start symbol.

• LL parsing is also known as top-down parsing; LR parsing is also known as bottom-up parsing.

• LL starts with only the root nonterminal on the stack; LR ends with only the root nonterminal on the stack.

• LL uses grammar rules in an order which corresponds to a pre-order traversal of the parse tree; LR does a post-order traversal.

• LL continuously pops a nonterminal off the stack and pushes the corresponding right-hand side; LR tries to recognize a right-hand side on the stack, pops it, and pushes the corresponding nonterminal.

• LL reads terminals when it pops them off the stack; LR reads terminals while it pushes them on the stack.

• LL parsers are often called “predictive parsers”; LR parsers are often called “shift-reduce parsers”.
The Role of the Parser

➢ Parsers implemented by hand often use LL grammars;


for example, the predictive-parsing approach works for LL
grammars.

➢ Parsers for the larger class of LR grammars are usually


constructed using automated tools.
Representative Grammars
➢ Constructs that begin with keywords like if, for, while, or int are relatively easy to parse, because the keyword guides the choice of the grammar production that must be applied to match the input.

➢ In this chapter, we concentrate on expressions, which present more of a challenge, because of the associativity and precedence of operators.
Representative Grammars
➢ Associativity and precedence are captured in the following grammar.

➢ E represents expressions consisting of terms separated by + signs,


➢ T represents terms consisting of factors separated by * signs, and

➢ F represents factors that can be either parenthesized expressions


or identifiers:

E -> E + T | T

T -> T * F | F (4.1)

F -> (E) | id
The above expression grammar belongs to the class of LR grammars
that are suitable for bottom-up parsing.
This grammar can be adapted to handle additional operators and
additional levels of precedence.
Representative Grammars
➢ The following non-left-recursive variant of the expression grammar (4.1) will
be used for top-down parsing:

Grammar (4.2), the non-left-recursive variant:

E → T E'
E' → + T E' | ϵ
T → F T'
T' → * F T' | ϵ
F → (E) | id

Grammar (4.1), repeated for comparison:

E → E + T | T
T → T * F | F
F → (E) | id

➢ The following grammar(4.3) treats + and * alike, so it is useful for illustrating


techniques for handling ambiguities during parsing:

E → E + E | E * E | ( E ) | id (4.3)

Here, E represents expressions of all types. Grammar (4.3) permits more than
one parse tree for expressions like a + b*c.
Syntax Analysis

Syntax Error Handling


Syntax Error Handling
➢ A good compiler should assist a programmer in identifying and
locating errors.

Few languages have been designed with error handling in mind, even
though errors are so commonplace.

➢ Most programming language specifications do not describe how a


compiler should respond to errors; error handling is left to the compiler
designer.

➢ Planning the error handling right from the start can both simplify the
structure of a compiler and improve its handling of errors.
Syntax Error Handling
Common programming errors can occur at many different levels.

➢ Lexical errors: Lexical error is a sequence of characters that does not


match the pattern of any token. Lexical errors include

• Missing quotes around text intended as a string. (Unmatched string)

• Appearance of illegal characters

➢ Syntactic errors include

• misplaced semicolons,

• extra or missing braces, that is, "{" or "}",

• the appearance of a case statement without an enclosing switch in C or Java.
Syntax Error Handling
Common programming errors can occur at many different levels. Cont…

➢ Semantic errors include

➢ type mismatches between operators and operands.

➢ Example: the return of a value in a Java method with result type void.

➢ Logical errors include

➢ the use of the assignment operator = instead of the comparison operator ==

in a C program. The program containing = may be well formed; however, it

may not reflect the programmer's intent.


Syntax Error Handling
➢ The precision of parsing methods allows syntactic errors to be

detected very efficiently.

➢ The LL and LR parsing methods detect an error as soon as possible; that is, when the stream of tokens from the lexical analyzer cannot be parsed further according to the grammar for the language.

➢ The viable-prefix property of parsers allows early detection of

syntax errors.
Syntax Error Handling
Viable-prefix Property

➢ The viable-prefix property of parsers allows early detection of


syntax errors
➢ Goal: detection of an error as soon as possible without further consuming

unnecessary input

➢ How: detect an error as soon as the prefix of the input does not match a

prefix of any string in the language


Syntax Error Handling
➢ The goals of error handler in a parser are simple to state but

challenging to realize:

1. Report the presence of errors clearly and accurately.

2. Recover from each error quickly enough to detect subsequent

errors.

3. Add minimal overhead to the processing of correct programs.


Syntax Error Handling
➢ How should an error handler report the presence of an error?

➢ At the very least, it must report the place in the source program where an

error is detected, because there is a good chance that the actual error

occurred within the previous few tokens.

➢ A common strategy is to print the offending line with a pointer to the position

at which an error is detected.


Syntax Analysis

Error-Recovery Strategies
Error-Recovery Strategies
➢ Once an error is detected, how should the parser recover/react?

1. Stop immediately and signal an error.

2. Record the error but try to continue.

In the first case, the user must recompile from scratch after possibly a

trivial fix.

In the second case, the user might be overwhelmed by a whole series of

error messages, all caused by essentially the same problem.

➢ We will talk about how to do error recovery in a principled way.


Error-Recovery Strategies

➢ Error recovery:

➢ The process of adjusting input stream so that the parser can


continue after unexpected input

➢ Possible adjustments:

➢ delete tokens

➢ insert tokens

➢ substitute tokens

➢ Error recovery is possible in both top-down and bottom-up parsers


Error-Recovery Strategies

Error-Recovery Strategies

1. Panic-Mode Recovery

2. Phrase-Level Recovery

3. Error Productions

4. Global Correction
Error-Recovery Strategies
1. Panic-Mode Recovery

➢ In this method, on discovering an error, the parser discards input


symbols one at a time until one of a designated set of synchronizing
tokens is found.

➢ The synchronizing tokens are usually delimiters, such as ; or }, whose


role in the source program is clear and unambiguous.

➢ Disadvantage is that a considerable amount of input is skipped


without checking it for additional errors.

➢ Advantage is that it is easy to implement and is guaranteed not to enter an infinite loop.
Error-Recovery Strategies
1. Panic-Mode Recovery
➢ In case of an error like:
1) a = b + c // no semi-colon
d = *e + f ;
2) int =10;

The compiler will discard all subsequent tokens till a semi-colon is


encountered.

This is a crude method but often turns out to be the best method.

In situations where multiple errors in the same statements are rare, this
method may be quite adequate.
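The skipping step itself is only a few lines of code. The following Python sketch (token representation and names are illustrative, not from any particular compiler) discards tokens until a synchronizing token is reached:

# A minimal sketch of panic-mode recovery: discard input tokens until one of
# the designated synchronizing tokens (here ; and }) is found.
SYNC_TOKENS = {";", "}"}

def panic_mode_recover(tokens, pos):
    # Skip the offending tokens one at a time.
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1
    # Resume just past the synchronizing token, i.e., at the next statement.
    return pos + 1 if pos < len(tokens) else pos

# Example: error detected at 'd' in  a = b + c d = e ;
tokens = ["a", "=", "b", "+", "c", "d", "=", "e", ";"]
print(panic_mode_recover(tokens, 5))   # 9: parsing resumes after the ';'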
Error-Recovery Strategies
2. Phrase-Level Recovery (Statement Mode recovery)

➢ On discovering an error, a parser may perform local correction on the


remaining input; that is, it may replace a prefix of the remaining input by
some string that allows the parser to continue.

➢ A typical local correction is to


• replace a comma by a semicolon, E.g.: scanf(“%d”, &x), replace , with ;
• delete an extraneous semicolon or brace, E.g.: {… {…} …}}
• insert a missing semicolon, E.g.: scanf(“%d”, &x)

➢ The choice of the local correction is left to the compiler designer.


We must be careful to choose replacements that do not lead to
infinite loops.
Error-Recovery Strategies
3. Error Productions

➢ If user/designer has knowledge of common errors that can be


encountered, then these errors can be incorporated by augmenting the
grammar with error productions that generate erroneous constructs.

➢ This augmented grammar (CFG+ error productions ) detects the


anticipated errors when an error production is used during parsing and
parsing can be continued.

➢ The parser can then generate appropriate error diagnostics about the
erroneous construct that has been recognized in the input.
Error-Recovery Strategies
3. Error Productions Cont…

➢ If we have an idea of common errors that might occur, we can include the
errors in the grammar at hand.

For example if we have a production rule like:

E → +E|-E Then, a=+b; a=-b; a=*b; a=/b;

Here, the last two are error situations. Now, we change the grammar as:

E → +E | -E | *A | /A A → E

Hence, once it encounters *A, it sends an error message asking the user if
he is sure he wants to use a unary “*”.

If this is used then, during parsing appropriate error messages can be generated and
parsing can be continued.
Error-Recovery Strategies
4. Global Correction

➢ In this approach, the compiler should make as few changes as possible in


processing an incorrect input string.
There are algorithms for choosing a minimal sequence of changes to
obtain a globally least-cost correction.

➢ Given an incorrect input string x and grammar G, these algorithms will find
a parse tree for a closest error-free string y, such that the number of
insertions, deletions, and changes of tokens required to transform x into y is
as small as possible.

➢ These methods are in general too costly to implement in terms of time and
space, so these techniques are currently only of theoretical interest.
Syntax Analysis

Context-Free Grammars
Context-Free Grammars
➢ Grammars are used to specify the syntax of a language.

➢ A grammar naturally describes the hierarchical structure of most


programming language constructs.

➢ For example, an if-else statement in Java can have the form


if ( expression ) statement else statement

Using the variable expr to denote an expression and the variable


stmt to denote a statement, this structuring rule can be
expressed as

stmt → if ( expr ) stmt else stmt


Context-Free Grammars
Components of Context-Free Grammar
➢ Set of terminal symbols

➢ Set of nonterminals

➢ Set of productions

➢ The head is nonterminal

➢ The body is a sequence of terminals and/or nonterminals

➢ Designation of one nonterminal as starting symbol
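As a concrete illustration, the four components can be written down directly in Python (a minimal sketch; the representation is illustrative, using expression grammar (4.1)):

# The four components of a CFG, for grammar (4.1).
grammar = {
    "terminals":    {"+", "*", "(", ")", "id"},
    "nonterminals": {"E", "T", "F"},
    "start":        "E",                      # the designated start symbol
    # productions: head -> list of bodies; each body is a list of symbols
    "productions": {
        "E": [["E", "+", "T"], ["T"]],
        "T": [["T", "*", "F"], ["F"]],
        "F": [["(", "E", ")"], ["id"]],
    },
}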


Context-Free Grammars
➢ Production rules.
Context-Free Grammars

➢ Example 1:
Context-Free Grammars

➢ Example 2:

What does this grammar generate?


Context-Free Grammars
Some Definitions
➢ String of terminals: sequence of zero or more terminals

➢ Derivation:
➢ given the grammar (i.e. productions)
➢ begin with the start symbol
➢ repeatedly replacing nonterminal by the body
➢ We obtain the language defined by the grammar (i.e., the set of terminal strings derivable from the start symbol)
➢ Example:

How to derive: 9-5+7 from the above rules?


Context-Free Grammars
Some Definitions

➢ Parsing:
 Given a string of terminals

 Figure out how to derive it from the start symbol of the grammar

 If it cannot be derived from the start symbol of the grammar, then


reporting syntax errors within the string.

➢ Parsing is the process of determining how a string of terminals can


be generated by a grammar.
Context-Free Grammars
Parse Tree

 A parse tree pictorially shows how the start symbol of a grammar


derives a string in the language.

 A parse tree according to the grammar is a tree with the following


properties:

1. The root is labeled by the start symbol.

2. Each leaf is labeled by a terminal or by ϵ.

3. Each interior node is labeled by a nonterminal.


Context-Free Grammars

Parse Tree

➢ Example: Figure 2.5: Parse tree for 9-5+2


Context-Free Grammars
➢ Ambiguity
➢ A grammar that produces more than one parse tree for some
sentence is said to be ambiguous.

➢ E → E + E | E * E | ( E ) | id (4.3)

➢ The arithmetic expression grammar (4.3) permits two distinct


leftmost derivations for the sentence id + id * id:
Context-Free Grammars
Associativity of Operators

➢ How will you evaluate this?

9-5-2
▪ Will ‘5’ go with the ‘-’ on the left or the one on the right?

▪ If it goes with the one on the left: (9-5)-2 we say that the operator ‘-’ is
left-associative

▪ If it goes with the one on the right: 9-(5-2) we say that the operator ‘-’ is
right-associative
Context-Free Grammars
Associativity of Operators

➢ How to express associativity in production rules?
A left-associative operator is generated by a left-recursive production, and a right-associative operator by a right-recursive one: for example, list → list - digit | digit makes '-' left-associative, while right → letter = right | letter makes '=' right-associative.


Context-Free Grammars
Precedence of Operators
 Associativity applies to occurrences of the same operator

 What if operators are different?

 How will you evaluate: 9-5*2

 We say ‘*’ has higher precedence than ‘-’ if it takes its operands before
‘-’

 How to present this in productions?
By using a distinct nonterminal for each precedence level: operators of higher precedence (such as *) are generated by a nonterminal lower in the grammar, and operators of lower precedence (such as -) by one higher up.


Context-Free Grammars
Leftmost and Rightmost Derivations

➢ In leftmost derivations, the leftmost nonterminal in each sentential form is always chosen. If α ⇒ β is a step in which the leftmost nonterminal in α is replaced, we write α ⇒lm β.

➢ In rightmost derivations, the rightmost nonterminal is always chosen; we write α ⇒rm β in this case.
Context-Free Grammars
Context-Free Grammar Vs Regular Expressions

➢ Grammars are more powerful notations than regular expressions

 Every construct that can be described by a regular expression can be


described by a grammar, but not vice-versa

➢ NFA to CFG
Context-Free Grammars
(a|b)*abb
Context-Free Grammars
Question Worth Asking

➢ If grammars are more powerful than regular expressions, why not use them in lexical analysis too?

• Lexical rules are quite simple and do not need notation as


powerful as grammars

• Regular expressions are more concise and easier to understand


for tokens

• More efficient lexical analyzers can be generated from regular


expressions than from grammars
Context-Free Grammars
➢ How Can We Enhance Our Grammar?
1. Eliminating ambiguity
▪ Re-write grammar to eliminate ambiguity.

2. Eliminating left-recursion
▪ Top-down parsers cannot handle left-recursive grammars: during parsing, it is possible for a top-down parser to loop forever, so a transformation is needed to eliminate left recursion.

▪ A grammar is said to be "left-recursive" if the leftmost symbol of a production body is the same as the nonterminal at the head of the production (e.g., E → E + T).

3. Left factoring
▪ Elimination of common prefixes.
Context-Free Grammars
Eliminating Ambiguity

 Sometimes we can re-write grammar to eliminate ambiguity


Ambiguous Grammar: Input string:

Note: When there are multiple ifs and a single else, it is not clear which if the else should go with; this problem is called the dangling-else problem.

Two different parse trees for the input string:


Context-Free Grammars
Eliminating Ambiguity
Input string: (4.15)

Ambiguous Grammar:

(4.14)

Unambiguous Grammar:

This grammar associates


each else with the closest
previous unmatched then.

Fig. 4.10
Context-Free Grammars
Eliminating Ambiguity

(4.15)

Fig. 4.10
(4.14)

The idea is that a statement appearing between a then and an else must be "matched" ;
that is, the interior statement must not end with an unmatched or open then.
A matched statement is either an if-then-else statement containing no open statements
or it is any other kind of unconditional statement.
Thus, we may use the grammar in Fig. 4.10. This grammar generates the same strings as
the dangling-else grammar (4.14), but it allows only one parsing for string (4.15); namely,
the one that associates each else with the closest previous unmatched then.
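For reference, the unambiguous grammar of Fig. 4.10 has the following standard form (reconstructed here; other stands for any unconditional statement):

stmt → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
open_stmt → if expr then stmt | if expr then matched_stmt else open_stmt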
Context-Free Grammars
Eliminating Ambiguity
A grammar containing the productions.
A → AA | α
is ambiguous because the sentential form AAA has more than one parse tree.
Context-Free Grammars
Eliminating Ambiguity
A grammar containing the productions.
A → AA | α
is ambiguous because the sentential form AAA has more than one parse tree.

This ambiguity disappears if we use the productions

▪ A → AB | B
B → α

or

▪ A → BA | B
B → α
Context-Free Grammars
Eliminating Left-Recursion

➢ A grammar is left recursive if it has a nonterminal A such that there is a derivation A ⇒+ Aα for some string α.

➢ Top-down parsing methods cannot handle left-recursive grammars, so a transformation is needed to eliminate left recursion.

➢ Example:
Context-Free Grammars
Eliminating Left-Recursion
Context-Free Grammars
Eliminating Left-Recursion
Context-Free Grammars
Eliminating Left-Recursion

➢ Example 1:

➢ Example 2: (Indirect left recursion elimination)

Original grammar:
A → Bxy | x
B → CD
C → A | c
D → d

Step 1 (substitute A into C): C → Bxy | x | c
Step 2 (substitute B into C): C → CDxy | x | c
Context-Free Grammars
Eliminating Left-Recursion

➢ Example 1: E → T E'
E' → + T E'| ϵ
T → F T’
T' → * F T’| ϵ
F → (E)| id

➢ Example 2: (Indirect left recursion elimination)

Original grammar:
A → Bxy | x
B → CD
C → A | c
D → d

Step 1 (substitute A into C): C → Bxy | x | c
Step 2 (substitute B into C): C → CDxy | x | c
Step 3 (eliminate the immediate left recursion in C):
C → xC' | cC'
C' → DxyC' | ϵ
(A → Bxy | x, B → CD, and D → d are unchanged.)
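The rewriting used in these examples is mechanical. The Python sketch below (representation illustrative: bodies are tuples of symbols, () stands for ϵ) eliminates the immediate left recursion in a single nonterminal, i.e., the transformation of A → Aα | β into A → βA', A' → αA' | ϵ:

def eliminate_immediate_left_recursion(head, bodies):
    # Split A -> A a1 | ... | A am | b1 | ... | bn into the a_i and b_j parts.
    recursive = [b[1:] for b in bodies if b and b[0] == head]
    others    = [b for b in bodies if not b or b[0] != head]
    if not recursive:
        return {head: bodies}              # no immediate left recursion
    new = head + "'"
    return {
        head: [b + (new,) for b in others],            # A  -> b1 A' | ... | bn A'
        new:  [a + (new,) for a in recursive] + [()],  # A' -> a1 A' | ... | ϵ
    }

# Example: E -> E + T | T  becomes  E -> T E' with E' -> + T E' | ϵ
print(eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))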
Context-Free Grammars
Left Factoring (Elimination of common prefixes)

 If a grammar contains two productions of form S→ aα and S → aβ


it is not suitable for top-down parsing without backtracking.

 Troubles of this form can sometimes be removed from the grammar


by a technique called the left factoring.
Context-Free Grammars
Left Factoring (Elimination of common prefixes)

 Left factoring is a grammar transformation that is useful for producing a grammar


suitable for predictive, or top-down, parsing.

 When the choice between two alternative A-productions is not clear, we may be
able to rewrite the productions to defer the decision until enough of the input has
been seen that we can make the right choice.

In general, suppose A → αβ1 | αβ2 are two A-productions and the input begins with a nonempty string derived from α. We do not know whether to expand A to αβ1 or to αβ2. However, we may defer the decision by expanding A to αA'. Then, after seeing the input derived from α, we expand A' to β1 or to β2.
Context-Free Grammars
Left Factoring (Elimination of common prefixes)

 Left factoring is a process by which the grammar with common prefixes is


transformed to make it useful for Top down parsers.

 How?

 In left factoring,
▪ We make one production for each common prefix.

▪ The common prefix may be a terminal or a non-terminal or a combination of


both.

▪ Rest of the derivation is added by new productions.

 The grammar obtained after the process of left factoring is called as Left
Factored Grammar.
Context-Free Grammars
Left Factoring (Elimination of common prefixes)

 Important Note:
▪ During left most derivation, when the choice between two alternative
A-productions is not clear, the right choice of A-production selection needs k
symbols lookahead on the input.

▪ Left Factoring (Elimination of common prefixes) avoids k-symbols lookahead on


the input.

▪ Left Factoring converts k-symbols lookahead on the input to 1-symbol lookahead


(that is LL(k) to LL(1)).

▪ Left factored grammar avoids backtracking in Recursive Descent Parsing.
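One step of left factoring can likewise be written as a short routine. A sketch, assuming the same tuple representation as the left-recursion sketch above, with ("ϵ",) marking an ϵ-alternative:

def common_prefix(seqs):
    # Longest common prefix of several symbol sequences.
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)

def left_factor_once(head, bodies):
    groups = {}
    for b in bodies:                       # group alternatives by first symbol
        groups.setdefault(b[0] if b else None, []).append(b)
    rules, fresh = {head: []}, 0
    for _, alts in sorted(groups.items(), key=str):
        if len(alts) == 1:
            rules[head].extend(alts)       # no common prefix to factor
            continue
        alpha = common_prefix(alts)        # the shared prefix α
        fresh += 1
        new = head + "'" * fresh           # a fresh nonterminal A', A'', ...
        rules[head].append(alpha + (new,))
        rules[new] = [b[len(alpha):] or ("ϵ",) for b in alts]
    return rules

# One step on grammar 1 below: A -> aAB | aBc | aAc  gives  A -> aA',
# A' -> AB | Bc | Ac; repeating the step yields the left factored grammar.
print(left_factor_once("A", [("a","A","B"), ("a","B","c"), ("a","A","c")]))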


Context-Free Grammars
Left Factoring

 Example:
Context-Free Grammars
Left Factoring

 Example:
Context-Free Grammars
 Do left factoring in the following grammars-

1. A → aAB / aBc / aAc

2. S → bSSaaS | bSSaSb | bSb | a

3. S → aSSbS | aSaSb | abb | b

4. S → a | ab | abc | abcd

5. S → aAd | aB A → a | ab B → ccd | ddc


Context-Free Grammars
 Do left factoring in the following grammars-

1. A → aAB | aBc | aAc


Context-Free Grammars
 Do left factoring in the following grammars-

1. A → aAB | aBc | aAc

Step-01:

A → aA'
A' → AB | Bc | Ac

Again, this is a grammar with common prefixes.

Step-02:

A → aA'
A' → AD | Bc
D → B | c

This is a left factored grammar.


Context-Free Grammars
 Do left factoring in the following grammars-

2. S → bSSaaS | bSSaSb | bSb | a


Context-Free Grammars
 Do left factoring in the following grammars-

2. S → bSSaaS | bSSaSb | bSb | a

Step-01:

S → bSS' / a
S' → SaaS / SaSb / b

Again, this is a grammar with common prefixes.

Step-02:

S → bSS' / a
S' → SaA / b
A → aS / Sb

This is a left factored grammar.


Context-Free Grammars
 Do left factoring in the following grammars-

3. S → aSSbS / aSaSb / abb / b


Context-Free Grammars
 Do left factoring in the following grammars-

3. S → aSSbS / aSaSb / abb / b

Step-01:

S → aS' / b
S' → SSbS / SaSb / bb

Again, this is a grammar with common prefixes.

Step-02:

S → aS' / b
S' → SA / bb
A → SbS / aSb

This is a left factored grammar.


Context-Free Grammars
 Do left factoring in the following grammars-

4. S → a / ab / abc / abcd
Context-Free Grammars
 Do left factoring in the following grammars-

4. S → a / ab / abc / abcd
Step-01:
S → aS'
S' → b / bc / bcd / ϵ
Again, this is a grammar with common prefixes.

Step-02:
S → aS'
S' → bA / ϵ
A → c / cd / ϵ
Again, this is a grammar with common prefixes.

Step-03:
S → aS'
S' → bA / ϵ
A → cB / ϵ
B → d / ϵ
This is a left factored grammar.
Context-Free Grammars
 Do left factoring in the following grammars-

5. S → aAd / aB A → a / ab B → ccd / ddc


Context-Free Grammars
 Do left factoring in the following grammars-

5. S → aAd / aB A → a / ab B → ccd / ddc

The left factored grammar is

S → aS'
S' → Ad / B
A → aA'
A' → b / ϵ
B → ccd / ddc
Parsing

Top-Down Parsing
Top-Down Parsing
 Top-down parsing can be viewed as
▪ The problem of constructing a parse tree for the input string, starting from the
root and creating the nodes of the parse tree in preorder (depth-first).

 Top-down parsing can also be viewed as


▪ Finding a leftmost derivation for an input string.

 At each step of a top-down parse:

▪ The key problem is that of determining the production to be applied for a


nonterminal, say A.

▪ Once an A-production is chosen, the rest of the parsing process consists of


"matching” the terminal symbols in the production body with the input
string.
Top-Down Parsing
 Show the sequence of parse trees for the input id+id*id

Grammar:
E → T E'
E' → + T E' | ϵ
T → F T'
T' → * F T' | ϵ
F → (E) | id
Top-Down Parsing
 Show the sequence of parse trees for the input id+id*id

E⇒ TE'
⇒ FT'E'
⇒ idT'E'
⇒ idE'
⇒ id+TE'
⇒ id+FT'E'
⇒ id+idT'E'
⇒ id+id*FT'E'
⇒ id+id*idT'E'
⇒ id+id*idE'
⇒ id+id*id
Top-Down Parsing
[Diagram: classification of top-down parsers]
• Grammars should be free from left recursion and ambiguities.
• Top-down parsers with backtracking work with grammars that are not left factored.
• Nonbacktracking (predictive) parsers, including table-driven parsers, require an enhanced grammar free from ambiguity, left recursion, and common prefixes; grammars are left factored to avoid backtracking.
Top-Down Parsing
Recursive Descent Parsing

 Recursive descent is a top-down parsing technique that constructs the


parse tree from the top and the input is read from left to right.

 A recursive descent parser consists of several small functions/procedures,


one for each nonterminal in the grammar.

 This parsing technique recursively parses the input to make a parse tree,
which may or may not require back-tracking. But the grammar associated
with it (if not left factored) cannot avoid back-tracking.

 RDP can be used to parse different types of code such as XML or other
inputs.
Top-Down Parsing
Recursive-Descent Parsing

 Recursive-descent parsing(RDP) is one of the simplest parsing


techniques that is used in practice.
▪ The basic idea is to associate each non-terminal with a procedure.
(A RDP consists of several small functions, one for each nonterminal in the grammar.)

▪ The goal of each such procedure is to

1. read a sequence of input characters that can be generated by the


corresponding non-terminal, and

2. return a pointer to the root of the parse tree for the non-terminal.

▪ The structure of the procedure is dictated by the productions for the


corresponding non-terminal.
Top-Down Parsing
Recursive-Descent Parsing

 The RDP procedure attempts to "match" the right hand side of


some production for a non-terminal.
 To match a terminal symbol, the procedure compares the terminal
symbol to the input; if they agree, then the procedure is successful,
and it consumes the terminal symbol in the input (that is, moves the
input cursor over one symbol).

 To match a non-terminal symbol, the procedure simply calls the


corresponding procedure for that non-terminal symbol (which may be
a recursive call, hence the name of the technique).
Top-Down Parsing
Recursive-Descent Parsing (with backtracking)

 RDP execution begins with the procedure for the start symbol,
which halts and announces success if its procedure body scans
the entire input string.

 RDP with backtracking determines which production to use by


trying each production in turn.

 RDP with backtracking cannot work with left recursive grammars


because it would cause the program to enter an infinite loop.
Consider the grammar rule A -> Aa | b
Procedure A() would roughly have the following pseudocode: void A() { A(); … }
Top-Down Parsing
Recursive-Descent Parsing

 Each non-terminal corresponds to a procedure.

 Example: A → aBb (This is only the production rule for A)

proc A {
Match the current token with a, and move to the next token;
Call ‘B’;
Match the current token with b, and move to the next token;
}
Top-Down Parsing
Recursive-Descent Parsing

 A → aBb| bAB and B → c input: acb

procA { Non backtracking RDP procedure for left factored grammar with one symbol lookahead.

case (current token/current input symbol) {

‘a’: Match the current token with a, and move to the next token;

Call ‘B’;

Match the current token with b, and move to the next token;

‘b’: Match the current token with b, and move to the next token;

Call ‘A’;

Call ‘B’;

}
}
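The same procedure is easy to render in real code. A Python sketch of this non-backtracking RDP for A → aBb | bAB, B → c, with one symbol of lookahead (class and method names are illustrative):

class ParseError(Exception):
    pass

class Parser:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def current(self):                     # one-symbol lookahead
        return self.text[self.pos] if self.pos < len(self.text) else "$"

    def match(self, t):                    # match a terminal and advance
        if self.current() != t:
            raise ParseError(f"expected {t!r}, got {self.current()!r}")
        self.pos += 1

    def A(self):
        if self.current() == "a":          # A -> a B b
            self.match("a"); self.B(); self.match("b")
        elif self.current() == "b":        # A -> b A B
            self.match("b"); self.A(); self.B()
        else:
            raise ParseError(f"unexpected {self.current()!r} in A")

    def B(self):                           # B -> c
        self.match("c")

p = Parser("acb")
p.A()                                      # derivation: A => aBb => acb
print("accepted" if p.pos == len(p.text) else "trailing input")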
Top-Down Parsing
Recursive-Descent Parsing

 When to apply ε-productions.


A → aA| bB| ε

 If all other productions fail, we should apply an ε-production.


For example, if the current token is not a or b, we may apply the
ε-production.

➢ The correct choice: apply an ε-production for a nonterminal A when the current token is in the FOLLOW set of A (the set of terminals that can follow A in sentential forms).
Top-Down Parsing
A → aBe| cBd| C
B → bB| ε
Non backtracking RDP procedure for left factored grammar with one symbol lookahead.
C→f
proc A {
case (current token/current input symbol) {
a: Match the current token with a, and move to the next token;
Call B;
Match the current token with e, and move to the next token;
c: Match the current token with c, and move to the next token;
Call B;
Match the current token with d, and move to the next token;
f: Call C:
}
}

proc C {
Match the current token with f, and move to the next token;
}

proc B
{
case (current token/current input symbol){
b: Match the current token with b, and move to the next token;
Call B;
e,d: do nothing (apply B → ε, since e and d are in FOLLOW(B))
}
}
Top-Down Parsing
Example of Backtracking

 Based on the information the parser currently has about the input,

▪ A decision is made to go with one particular production.

▪ If this choice leads to a dead end, the parser would have to backtrack
to that decision point, moving backwards through the input, and
start again making a different choice and so on until it either found the
production that was the appropriate one or ran out of choices.

 For example, consider this simple grammar:


S → bab | bA (Grammar is not left factored, backtracking is required)
A → d | cA
Note: Writing RDP procedures for RDP with Backtracking(for non left factored
grammar) is very complex.
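Still, a compact functional sketch is possible in Python: each function returns the new input position on success or None on failure, and trying the alternatives in turn is the backtracking. (A fully general backtracking parser would also re-try earlier choices when a later match fails; this simple version suffices for the traces that follow.)

def match(inp, pos, s):
    # Match the literal terminals s at position pos, or fail with None.
    return pos + len(s) if inp.startswith(s, pos) else None

def S(inp, pos):
    p = match(inp, pos, "bab")             # try S -> bab
    if p is not None:
        return p
    p = match(inp, pos, "b")               # backtrack; try S -> bA
    return A(inp, p) if p is not None else None

def A(inp, pos):
    p = match(inp, pos, "d")               # try A -> d
    if p is not None:
        return p
    p = match(inp, pos, "c")               # backtrack; try A -> cA
    return A(inp, p) if p is not None else None

inp = "bcd"
print("Success!" if S(inp, 0) == len(inp) else "rejected")   # Success!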
Top-Down Parsing
Example of Backtracking

S → bab | bA
A → d | cA
(Grammar is not left factored, backtracking is required)
(Note: with no symbol lookahead, i.e., LL(0))

 Let's follow parsing the input bcd.

 As you can see, each time we hit a dead end, we back up to the last decision point, unmake that decision and try another alternative.

 If all alternatives have been exhausted, we back up to the preceding decision point and so on. This continues until we either find a working parse or have exhaustively tried all combinations without success.

Expansion so far | Input | Action
S    | bcd | Try S → bab
bab  | bcd | match b
bab  | bcd | dead-end, backtrack
S    | bcd | Try S → bA
bA   | bcd | match b
bA   | bcd | Try A → d
bd   | bcd | dead-end, backtrack
bA   | bcd | Try A → cA
bcA  | bcd | match c
bcA  | bcd | Try A → d
bcd  | bcd | match d, Success!
Top-Down Parsing
Example of Backtracking

S → bab | bA
A → d | cA
(Grammar is not left factored, backtracking is required)
(Note: with one symbol lookahead, i.e., LL(1))

 Let's follow parsing the input bcd.

 As you can see, each time we hit a dead end, we back up to the last decision point, unmake that decision and try another alternative.

 If all alternatives have been exhausted, we back up to the preceding decision point and so on. This continues until we either find a working parse or have exhaustively tried all combinations without success.

Expansion so far | Input | Action
S    | bcd | Try S → bab
bab  | bcd | match b
bab  | bcd | dead-end, backtrack
S    | bcd | Try S → bA
bA   | bcd | match b
bcA  | bcd | Expand A with rule A → cA by doing one symbol lookahead on input
bcA  | bcd | match c
bcA  | bcd | Expand A with rule A → d by doing one symbol lookahead on input
bcd  | bcd | match d, Success!
Top-Down Parsing
 Example of Backtracking
(Grammar is not left factored, backtracking is required)
(Note: with no symbol lookahead, i.e., LL(0))

 Input: cad

 Grammar:

Expansion so far | Input | Action
S   | cad | Try S → cAd
cAd | cad | match c
Top-Down Parsing
 Demonstrate RDP with Backtracking for the given input
and grammar.
▪ Input: w = aaba

▪ Grammar:

(Grammar is not left factored, backtracking is required)


A → abC|aBd|aAD
B → bB | ϵ
C → d | ϵ
D → a | b | ϵ
Top-Down Parsing
 Consider the language defined by grammar S -> aSa | aa , which
ideally accepts L(G) = { a2n, n>=1 }

 Show the working of RDP with backtracking for the following


input strings:

• aa

• aaaa

• aaaaaa

• aaaaaaaa
Top-Down Parsing
 Demonstrate RDP with Backtracking for the given input
and grammar.
 Input: w=read

 Grammar:

S → rXd | rZd (Grammar is not left factored, backtracking is required)

X → oa | ea
Z → ai
Top-Down Parsing
 Example of RDP with Non-Backtracking
 Input: cad

 Grammar (left factored):
S → cAd
A → aB
B → b | ϵ

Expansion so far | Input | Action
S    | cad | Start with S → cAd
cAd  | cad | match c
caBd | cad | Expand A with rule A → aB
caBd | cad | match a
cad  | cad | Expand B with rule B → ϵ by doing one symbol lookahead on input
cad  | cad | match d, Success!

Note: Even though the grammar is left factored, we need one symbol of lookahead on the input (when a nonterminal has two or more right-hand sides, for example B → b | ϵ) to choose the right production and eliminate backtracking completely.
Top-Down Parsing

First and Follow Functions


Top-Down Parsing
FIRST and FOLLOW

 The construction of both top-down and bottom-up parsing is aided


by two functions, FIRST and FOLLOW, associated with a grammar G.

 During topdown parsing,


FIRST and FOLLOW allow us to choose which production to apply,
based on the next input symbol.

 During panic-mode error recovery,


sets of tokens produced by FOLLOW can be used as synchronizing
tokens.
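Both sets can be computed by the standard fixed-point iteration. A Python sketch for the non-left-recursive expression grammar (representation illustrative: bodies are tuples, () is an ϵ-production, "ϵ" marks ϵ inside FIRST sets, "$" is the end marker); the names defined here are reused by the sketches later in the chapter:

GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
START = "E"
NONTERMS = set(GRAMMAR)
EPS = "ϵ"

def first_of_seq(seq, first):
    # FIRST of a sequence of grammar symbols.
    out = set()
    for x in seq:
        fx = first[x] if x in NONTERMS else {x}
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    return out | {EPS}                     # every symbol in seq can derive ϵ

first = {A: set() for A in NONTERMS}
follow = {A: set() for A in NONTERMS}
follow[START].add("$")                     # $ is in FOLLOW(start symbol)

changed = True
while changed:                             # iterate until a fixed point
    changed = False
    for head, bodies in GRAMMAR.items():
        for body in bodies:
            f = first_of_seq(body, first)
            if not f <= first[head]:
                first[head] |= f; changed = True
            for i, x in enumerate(body):   # FOLLOW rules for each nonterminal
                if x not in NONTERMS:
                    continue
                trailer = first_of_seq(body[i + 1:], first)
                add = trailer - {EPS}
                if EPS in trailer:         # x can end the body: add FOLLOW(head)
                    add |= follow[head]
                if not add <= follow[x]:
                    follow[x] |= add; changed = True

print(sorted(first["E'"]), sorted(follow["E'"]))   # ['+', 'ϵ'] ['$', ')']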
Top-Down Parsing
FIRST and FOLLOW

Grammar:
S → ABCDE
A → a | ϵ
B → b | ϵ
C → c | ϵ
D → d | ϵ
E → e | ϵ

FIRST(S) = { a b c d e ϵ }
FIRST(A) = { a ϵ }
FIRST(B) = { b ϵ }
FIRST(C) = { c ϵ }
FIRST(D) = { d ϵ }
FIRST(E) = { e ϵ }
Top-Down Parsing
FIRST and FOLLOW

Example: FIRST(S) = FIRST(ABCDE)

Grammar:
S → ABCDE
A → a | ϵ
B → b | ϵ
C → c | ϵ
D → d | ϵ
E → e | ϵ

FIRST(E) = FIRST(e) and { ϵ } = { e } and { ϵ } = { e ϵ }
Since A, B, C, and D can each derive ϵ, FIRST(S) includes FIRST(A), FIRST(B), FIRST(C), FIRST(D), and FIRST(E).
Top-Down Parsing
FIRST and FOLLOW

Example:
Grammar: S → ABCDE; A → a | ϵ; B → b | ϵ; C → c | ϵ; D → d | ϵ; E → e | ϵ

FIRST(S) = { a b c d e ϵ }
FIRST(A) = { a ϵ }
FIRST(B) = { b ϵ }
FIRST(C) = { c ϵ }
FIRST(D) = { d ϵ }
FIRST(E) = { e ϵ }
Top-Down Parsing
FIRST and FOLLOW

❑ Example:
FIRST
E
E'
T
T'
F
Top-Down Parsing
FIRST and FOLLOW

❑ Example:
FIRST
E ( id
E' + ϵ
T ( id
T' * ϵ
F ( id
Top-Down Parsing
FIRST and FOLLOW
Top-Down Parsing
FIRST and FOLLOW

FIRST FOLLOW
E ( id $ )
E' + ϵ $ )
T ( id +$)
T' * ϵ +$)
F ( id *+$)
Top-Down Parsing
FIRST and FOLLOW

❑ Example:

Find FIRST and FOLLOW sets of each non-terminal in the grammar.

Note: ϵ never appears in a FOLLOW set; ϵ is the empty string, not an input symbol.


Top-Down Parsing
FIRST and FOLLOW

❑ Example:
FIRST FOLLOW
E ( id
E' + ϵ
T ( id
T' * ϵ
F ( id

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Example:
FIRST FOLLOW
E ( id )$
E' + ϵ )$
T ( id +)$
T' * ϵ +)$
F ( id *+)$

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise -1:

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise -1:

FIRST FOLLOW
S
U
V
W

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise -1:

FIRST FOLLOW
S u y z w x
U u y z ϵ
V w x ϵ
W y z

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise -1:

FIRST FOLLOW
S u y z w x $
U u y z ϵ w x y z
V w x ϵ y z
W y z v $

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise - 2:

FIRST FOLLOW
S
A
B
C
D
E

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise - 2:

FIRST FOLLOW
S a b c
A a ϵ
B b ϵ
C c
D d ϵ
E e ϵ

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise - 2:

FIRST FOLLOW
S a b c $
A a ϵ b c
B b ϵ c
C c d e $
D d ϵ e $
E e ϵ $

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise - 3:

FIRST FOLLOW
S
B
C

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise - 3:

FIRST FOLLOW
S a c b d
B a ϵ
C c ϵ

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise - 3:

FIRST FOLLOW
S a c b d $
B a ϵ b
C c ϵ d

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

Note: FIRST helps us to pick a rule when we have a choice between two or more r.h.s. by predicting the first symbol that each r.h.s. can derive.

❑ Exercise - 4:

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

Note: FIRST helps us to pick a rule when we have a choice between two or more r.h.s. by predicting the first symbol that each r.h.s. can derive.

❑ Exercise - 4: FIRST FOLLOW


S d g h ϵ b a
A d g h ϵ
B g ϵ
C h ϵ

First(C) = First(h) and {ϵ} = {h} and {ϵ} = {h ϵ}


First(B) = First(g) and {ϵ} = {g} and {ϵ} = {g ϵ}
First(A) = First(da) and First(BC) = {d} and {g h ϵ} = {d g h ϵ}
First(S) = First(ACB) and First(CbB) and First(Ba)
= {d g h ϵ} and {h b} and {g a}
= {d g h ϵ b a}
Top-Down Parsing
FIRST and FOLLOW

Note: FIRST helps us to pick a rule when we have a choice between two or more r.h.s. by predicting the first symbol that each r.h.s. can derive.

❑ Exercise - 4: FIRST FOLLOW


S d g h ϵ b a $
A d g h ϵ h g $
B g ϵ $ a h g
C h ϵ g $ b h

Follow(S) = { $ } If S is the start symbol place $ in Follow(S)


Follow(A) = Non-epsilon symbols of First(CB) and if First(CB)
contains epsilon then Follow(A) = Follow(S) also.
Follow(A) = { h g } and { $ }
Top-Down Parsing
FIRST and FOLLOW

Note: FIRST helps us to pick a rule when we have a choice between two or more r.h.s. by predicting the first symbol that each r.h.s. can derive.

❑ Exercise - 4: FIRST FOLLOW


S d g h ϵ b a $
A d g h ϵ h g $
B g ϵ $ a h g
C h ϵ g $ b h

Follow(S) = { $ } If S is the start symbol place $ in Follow(S)


Follow(A) = {h g $}
Follow(B) = Follow(S) and First(a) and First(C) and if First(C)
contains epsilon then Follow(B) = Follow(A) also.
Follow(B) = { $ } and { a } and { h g $ } = { $ a h g }
Top-Down Parsing
FIRST and FOLLOW

Note: FIRST helps us to pick a rule when we have a choice between two or more r.h.s. by predicting the first symbol that each r.h.s. can derive.

❑ Exercise - 4: FIRST FOLLOW


S d g h ϵ b a $
A d g h ϵ h g $
B g ϵ $ a h g
C h ϵ g $ b h

Follow(S) = { $ } If S is the start symbol place $ in Follow(S)


Follow(A) = {h g $}
Follow(B) = { $ a h g }
Follow(C) = { g $ } and { b } and { h g $ } = { g $ b h }
Top-Down Parsing
FIRST and FOLLOW

❑ Exercise - 5:

S -> L=R | R FIRST FOLLOW


R -> L S
L -> *R | id R
L

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise - 5:

S -> L=R | R FIRST FOLLOW


R -> L S * id $
L -> *R | id R * id = $
L * id = $

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
Why FIRST and FOLLOW?

❑ FIRST and FOLLOW help us to pick a rule when we have a choice


between two or more r.h.s. by predicting the first symbol that each
r.h.s. can derive.

❑ Even if there is only one r.h.s. we can still use them to tell us
whether or not we have an error - if the current input symbol
cannot be derived from the only r.h.s. available, then we know
immediately that the sentence does not belong to the grammar,
without having to (attempt to) finish the parse.
Top-Down Parsing
Why FIRST in Compiler Design?

If the compiler knows in advance the first character of the string produced when a production rule is applied, then by comparing it with the current character or token it sees in the input string, it can wisely decide which production rule to apply.

S -> cAd A -> bc|a Input: cad

Thus, in the example above, if the parser knows that after reading character 'c' in the input string and applying S->cAd the next character in the input string is 'a', it ignores the production rule A->bc (because 'b', not 'a', is the first character of the string this rule produces) and directly uses the production rule A->a.
Top-Down Parsing

LL(1) Grammar & Predictive


Parsing
Top-Down Parsing
LL(1) Grammars

 Predictive parsers, can be constructed for a class of grammars called LL(1).

 In LL(1):

▪ the first "L“ stands for scanning the input from left to right,

▪ the second "L" for producing a leftmost derivation, and

▪ the "1" for using one input symbol of lookahead at each step to make parsing
action decisions.

 The class of LL(1) grammars is rich enough to cover most programming


constructs, although care is needed in writing a suitable grammar for the
source language.
For example, no left-recursive or ambiguous grammar can be LL(1).
Top-Down Parsing
LL(1) Grammars

 A grammar G is LL(1) if and only if whenever A → α | β are two distinct productions of G, the following conditions hold:

1. FIRST(α) and FIRST(β) are disjoint.

2. If ϵ is in FIRST(β), then FIRST(α) and FOLLOW(A) should be disjoint, and if ϵ is in FIRST(α), then FIRST(β) and FOLLOW(A) should be disjoint.

 Justify whether the grammars are LL(1) or not


➢ G1: G2: G3:

S → Aa S → Aa S → A|xb

A →bAb|ϵ A →Abb|ϵ A →aAb|B

B →x
Top-Down Parsing
LL(1) Grammars

 Justify whether the grammars are LL(1) or not

➢ G1: G2: G3:

S → Aa S → Aa S → A|xb

A →bAb|ϵ A →Abb|ϵ A →aAb|B

B →x

G1: is not LL(1) First(bAb)={b} Follow(A)={a b}

G2: is not LL(1) First(Abb)={b} Follow(A)={a b}

G3: is not LL(1) First(A)=First(aAb) and First(B) First(xb)={x}


First(A)={a} and {x} = {a x}
Top-Down Parsing
LL(1) Grammars

 If no FIRST/FIRST conflicts and no FIRST/FOLLOW conflicts, then grammar is


LL(1).

 An example of a FIRST/FIRST conflict:

o S → Xb | Yc

o X → a

o Y → a

❑ By seeing only the first input symbol a, you cannot know whether to apply the
production S → Xb or S → Yc, because a is in the FIRST set of both X and Y.

First(Xb) = {a} FIRST(Xb) and FIRST(Yc) are not


First(Yc) = {a} disjoint, the grammar is not LL(1).
Top-Down Parsing
LL(1) Grammars

❑ An example of a FIRST/FOLLOW conflict:


o S → AB
o A → fe | ϵ
o B → fg

❑ By seeing only the first input symbol f, you cannot decide whether to
apply the production A → fe or A → ϵ, because f is in both the FIRST set
of A and the FOLLOW set of A (A can be parsed as epsilon and B as f).

❑ Notice that if you have no epsilon-productions you cannot have a


FIRST/FOLLOW conflict.
Top-Down Parsing
LL(1) Grammars

 Explain why the following grammar is LL(1)

X → YaYb|ZbZa

Y →ϵ

Z →ϵ

 Explain why the following grammar is not LL(1)

S → ABA

A →aA|ϵ

B →b|ϵ
Top-Down Parsing
LL(1) Grammars

 Explain why the following grammar is LL(1)

X → YaYb|ZbZa
First(YaYb) = {a} FIRST(YaYb) and
Y →ϵ First(ZbZa) = {b} FIRST(ZbZa) are disjoint,

Z →ϵ the grammar is LL(1).

 Explain why the following grammar is not LL(1)

S → ABA First(aA) = {a}


Follow(A)= Follow(S) and First(BA)
A →aA|ϵ = { $ } and { b a } = { $ b a}
First(aA) and Follow(A) are not disjoint, the grammar
B →b|ϵ is not LL(1).
Top-Down Parsing
LL(1) Grammars

 Predictive parsers can be constructed for LL(1) grammars since the proper
production to apply for a nonterminal can be selected by looking only at the
current input symbol.

 Flow-of-control constructs, with their distinguishing keywords, generally satisfy


the LL(1) constraints. For instance, if we have the productions

then the keywords if, while, and the symbol { tell us which alternative is the only
one that could possibly succeed if we are to find a statement.
Top-Down Parsing
LL(1) Grammars

 The class of grammars for which we can construct predictive parsers


looking k-symbols ahead in the input is sometimes called the LL(k) class.

 The LL(1) class uses FIRST and FOLLOW computations.

▪ From the FIRST and FOLLOW sets for a grammar, we shall construct
Predictive parsing tables.

▪ Predictive parsing tables make the explicit choice of production during


top-down parsing.

 FIRST and FOLLOW sets are also useful during bottom-up parsing.
Top-Down Parsing
LL(1) Grammars

 The class of grammars for which we can construct predictive parsers


looking k-symbols ahead in the input is sometimes called the LL(k) class.

 Example:

S → x | xy | xyz

Is the above grammar LL(1), LL(2), LL(3), or LL(4)?


Top-Down Parsing
LL(1) Grammars

 The class of grammars for which we can construct predictive parsers


looking k-symbols ahead in the input is sometimes called the LL(k) class.

 Example:

S → x | xy | xyz

Is the above grammar LL(1), LL(2), LL(3), or LL(4)?
With one or two symbols of lookahead the alternatives xy and xyz cannot be distinguished, but three symbols suffice to tell x, xy, and xyz apart; hence the grammar is LL(3).

Top-Down Parsing

Note: This algorithm collects the information from FIRST and FOLLOW sets into a predictive
parsing table M[A,a], a two-dimensional array, where A is a nonterminal, and a is a terminal
or the symbol $, the input end marker.
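That construction is a direct loop over the productions. A sketch, reusing GRAMMAR, first, follow, EPS, and first_of_seq from the FIRST/FOLLOW sketch earlier:

def build_ll1_table(grammar, first, follow):
    # M[A, a] = the production to apply when A is on the stack and a is next.
    table, is_ll1 = {}, True
    for head, bodies in grammar.items():
        for body in bodies:
            f = first_of_seq(body, first)
            targets = f - {EPS}
            if EPS in f:                   # ϵ in FIRST(body): use FOLLOW(head)
                targets |= follow[head]
            for a in targets:
                if (head, a) in table:     # multiply defined entry:
                    is_ll1 = False         # the grammar is not LL(1)
                table[(head, a)] = (head, body)
    return table, is_ll1

table, ok = build_ll1_table(GRAMMAR, first, follow)
print(ok)                                  # True for the expression grammar
print(table[("E'", "+")])                  # ("E'", ('+', 'T', "E'"))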
Top-Down Parsing
FIRST FOLLOW
 Example 1: E ( id )$
E' + ϵ )$
T ( id +)$
T' * ϵ +)$
F ( id *+)$

Figure 4.17: Parsing table M


Top-Down Parsing
FIRST FOLLOW
 Example 1: E ( id )$
E' + ϵ )$
T ( id +)$
T' * ϵ +)$
F ( id *+)$

(All blank entries in the parsing table are error entries.)
Figure 4.17: Parsing table M
Top-Down Parsing
 Input : id + id * id$
Top-Down Parsing
 Example 2: Construct parsing table for the following grammar
Top-Down Parsing
 Example 2: Construct parsing table for the following grammar (the dangling-else grammar):

S → iEtSS' | a
S' → eS | ϵ
E → b

FIRST(S) = { i a }    FOLLOW(S) = { e $ }
FIRST(S') = { e ϵ }   FOLLOW(S') = { e $ }
FIRST(E) = { b }      FOLLOW(E) = { t }

Figure 4.18: Parsing table M


Top-Down Parsing
 Example 2:

The parsing table in Fig. 4.18. The entry for M[S',e] contains both
S' → eS and S' → ϵ. The grammar is ambiguous.

Figure 4.18: Parsing table M


Top-Down Parsing
 Example 3: Construct parsing table for the following grammar
FIRST FOLLOW
S → ABCDE S a b cdeϵ $
A → a | ϵ A a ϵ bcde$
B → b | ϵ
B b ϵ cde$
C → c | ϵ
D → d | ϵ C c ϵ de$
E → e | ϵ D d ϵ e$
E e ϵ $

a b c d e $
S
A
B
C
D
E
Top-Down Parsing
 Example 3: Construct parsing table for the following grammar
FIRST FOLLOW
S → ABCDE S a b cdeϵ $
A → a | ϵ A a ϵ bcde$
B → b | ϵ
B b ϵ cde$
C → c | ϵ
D → d | ϵ C c ϵ de$
E → e | ϵ D d ϵ e$
E e ϵ $

a b c d e $
S S→ABCDE S→ABCDE S→ABCDE S→ABCDE S→ABCDE S→ABCDE
A A → a A → ϵ A → ϵ A → ϵ A → ϵ A → ϵ
B B → b B → ϵ B → ϵ B → ϵ B → ϵ
C C → c C → ϵ C → ϵ C → ϵ
D D → d D → ϵ D → ϵ
E E → e E → ϵ
Top-Down Parsing
Exercise

 For the following productions:

S → +SS | * SS | a
 Write predictive parsing table

 Write predictive parser

 Show how to parse: +*aaa

FIRST FOLLOW + * a $
S S
Top-Down Parsing
Exercise

 For the following productions:

S → +SS | * SS | a
 Write predictive parsing table

 Write predictive parser

 Show how to parse: +*aaa

FIRST(S) = { + * a }   FOLLOW(S) = { + * a $ }
Parsing table: M[S, +] = S → +SS;  M[S, *] = S → *SS;  M[S, a] = S → a
Top-Down Parsing
Exercise

 The following grammar is not LL(1)

S → Aa

A → bA|B

B → Cc

C → bC|ϵ

It is possible to drop exactly one production from this grammar to obtain


a new grammar generating the same language. Identify that production
and prove that the resulting grammar is LL(1).
Top-Down Parsing
Nonrecursive Predictive Parsing

 A nonrecursive predictive parser can be built by maintaining a stack


explicitly, rather than implicitly via recursive calls.
The predictive parser mimics a leftmost derivation.

 If w is the input that has been matched so far, then the stack holds a sequence of grammar symbols α such that S ⇒*lm wα.
 The table-driven predictive parser in Fig. 4.19 has


1. an input buffer,
2. a stack containing a sequence of
grammar symbols,
3. a parsing table and
4. an output stream.
Top-Down Parsing

[Fig. 4.20: the table-driven predictive parsing algorithm. An error is detected when the terminal on top of the stack does not match the input symbol, or when the parsing-table entry M[A, a] is empty.]
Top-Down Parsing
 Example: On input id + id * id, the nonrecursive predictive parser of Algorithm
4.34 makes the sequence of moves in Fig. 4.21. These moves correspond to a
leftmost derivation:

LL(1) table/Predictive parsing table


Top-Down Parsing
Problem 1:

 For the following productions:

S → +SS | * SS | a
 Write predictive parsing table

 Write predictive parser

 Show how to parse: +*aaa


Top-Down Parsing
Problem 1: S → +SS | * SS | a + * a $
FIRST FOLLOW S
S
Matched Stack Input Action
S$ +*aaa$
Top-Down Parsing
Problem 1: S → +SS | *SS | a

FIRST(S) = { + * a }   FOLLOW(S) = { + * a $ }
Parsing table: M[S, +] = S → +SS;  M[S, *] = S → *SS;  M[S, a] = S → a
Matched Stack Input Action
S$ +*aaa$
+SS$ +*aaa$ Output: S→ +SS
+ SS$ *aaa$ Match +
+ *SSS$ *aaa$ Output: S→ *SS
+* SSS$ aaa$ Match *
+* aSS$ aaa$ Output: S→ a
+*a SS$ aa$ Match a
+*a aS$ aa$ Output: S→ a
+*aa S$ a$ Match a
+*aa a$ a$ Output: S→ a
+*aaa $ $ Match a and Accept
Top-Down Parsing
Problem 2:

 For the following grammar answer the following :

a) Eliminate Left recursion

b) Left factor the grammar

c) Construct first and follow table.

d) Construct LL(1) table.

e) Mention whether the grammar is in LL(1) or not.

f) If the grammar is in LL(1), parse the string: (a,a)

S → a | ^ | (L)
L → L,S | S
Recursive grammar:
S → a | ^ | (L)
L → L,S | S

Non-recursive grammar:
S → a | ^ | (L)
L → SA
A → ,SA | ϵ

FIRST and FOLLOW:
S: FIRST = { a ^ ( }   FOLLOW = { $ , ) }
L: FIRST = { a ^ ( }   FOLLOW = { ) }
A: FIRST = { , ϵ }     FOLLOW = { ) }

LL(1) table/Predictive parsing table:
M[S, a] = S → a;   M[S, ^] = S → ^;   M[S, (] = S → (L)
M[L, a] = L → SA;  M[L, ^] = L → SA;  M[L, (] = L → SA
M[A, ,] = A → ,SA; M[A, )] = A → ϵ

Matched Stack Input Action


S$ (a,a)$
(The grammar, FIRST/FOLLOW sets, and LL(1) table are as above.)

Matched Stack Input Action


S$ (a,a)$
(L)$ (a,a)$ Output: S→ (L)
( L)$ a,a)$ Match (
( SA)$ a,a)$ Output: L→ SA
( aA)$ a,a)$ Output: S→ a
(a A)$ ,a)$ Match a
(a ,SA)$ ,a)$ Output: A → ,SA
(a, SA)$ a)$ Match ,
(a, aA)$ a)$ Output: S→ a
(a,a A)$ )$ Match a
(a,a )$ )$ Output: A→ ϵ
(a,a) $ $ Match ) and Accept
Top-Down Parsing
Problem 3:

 For the following grammar answer the following :

a) Eliminate Left recursion

b) Left factor the grammar

c) Construct first and follow table.

d) Construct LL(1) table.

e) Mention whether the grammar is in LL(1) or not.

f) If the grammar is in LL(1), parse the string: ba

S → AaAb | BbBa
A → λ
B → λ
Top-Down Parsing
Problem 4:

 For the following grammar answer the following :

a) Eliminate Left recursion

b) Left factor the grammar

c) Construct first and follow table.

d) Construct LL(1) table.

e) Mention whether the grammar is in LL(1) or not.

f) If the grammar is in LL(1), parse the strings: λ and b

S → AB
A → a | λ
B → b | λ

Note: λ is epsilon(i.e., null character)


Top-Down Parsing
Problem 5:

 For the following grammar answer the following :

a) Eliminate Left recursion

b) Left factor the grammar

c) Construct first and follow table.

d) Construct LL(1) table.

e) Mention whether the grammar is in LL(1) or not.

f) If the grammar is in LL(1), parse the strings: abce, cde and empty string

S → ABCDE
A → a | λ
B → b | λ
C → c | λ
D → d | λ
E → e | λ
Grammar:
S → ABCDE
A → a | ϵ
B → b | ϵ
C → c | ϵ
D → d | ϵ
E → e | ϵ

FIRST and FOLLOW:
S: FIRST = { a b c d e ϵ }   FOLLOW = { $ }
A: FIRST = { a ϵ }           FOLLOW = { b c d e $ }
B: FIRST = { b ϵ }           FOLLOW = { c d e $ }
C: FIRST = { c ϵ }           FOLLOW = { d e $ }
D: FIRST = { d ϵ }           FOLLOW = { e $ }
E: FIRST = { e ϵ }           FOLLOW = { $ }

Parsing table:
S: S → ABCDE on a, b, c, d, e, $
A: A → a on a;  A → ϵ on b, c, d, e, $
B: B → b on b;  B → ϵ on c, d, e, $
C: C → c on c;  C → ϵ on d, e, $
D: D → d on d;  D → ϵ on e, $
E: E → e on e;  E → ϵ on $

Matched Stack Input Action


S$ abce$
ABCDE$ abce$ Output: S→ ABCDE
aBCDE$ abce$ Output: A→a
a BCDE$ bce$ Match a
a bCDE$ bce$ Output: B→b
ab CDE$ ce$ Match b
ab cDE$ ce$ Output: C→c
abc DE$ e$ Match c
abc E$ e$ Output: D→ϵ
abc e$ e$ Output: E→e
abce $ $ Match e and Accept
(The grammar, FIRST/FOLLOW sets, and parsing table are as above.)

Input: Empty String/Null String

Matched Stack Input Action


S$ $
ABCDE$ $ Output: S→ ABCDE
BCDE$ $ Output: A→ ϵ
CDE$ $ Output: B→ ϵ
DE$ $ Output: C→ ϵ
E$ $ Output: D→ ϵ
$ $ Output: E→ ϵ and Accept
Top-Down Parsing

Error Recovery in Predictive Parsing

 An error is detected during predictive parsing when

1. the terminal on top of the stack does not match the next input symbol or

2. nonterminal A is on top of the stack, a is the next input symbol, and


M[A,a] is error (i.e., the parsing-table entry is empty)

Fig 4.20: Predictive Parsing Algorithm (shown earlier).
Top-Down Parsing
Error Recovery in Predictive Parsing

 We would like our parser to be able to recover from an error and


continue parsing.

1. Panic mode recovery

▪ Modify the stack and/or the input string to try and reach state
from which we can continue.

2. Phrase-level recovery

▪ We associate each empty slot with an error handling procedure.


Error Recovery in Predictive Parsing
Panic mode recovery

 Idea:
➢ Decide on a set of synchronizing tokens.

➢ When an error is found and there's a nonterminal at the top of the stack,

• discard input tokens until a synchronizing token is found.

• Synchronizing tokens are chosen so that the parser can recover


quickly after one is found, e.g. a semicolon when parsing statements.

➢ When an error is found and there is a terminal at the top of the stack,

• we could try popping it to see whether we can continue. Assume that


the input string is actually missing that terminal.
Error Recovery in Predictive Parsing

Panic mode recovery

 Possible synchronizing tokens for a nonterminal A


➢ the tokens in FOLLOW(A)

▪ When one is found, pop A of the stack and try to continue

➢ the tokens in FIRST(A)

▪ When one is found, match it and try to continue

➢ tokens such as semicolons that terminate statements


Error Recovery in Predictive Parsing
Panic mode recovery

 HOW TO SELECT SYNCHRONIZING SET?


➢ Place all symbols in FOLLOW(A) into the synchronizing set for nonterminal A.
If we skip input symbols until an element of FOLLOW(A) is seen in input and pop
A from the stack, then it is likely that parsing can continue.

➢ Place all symbols in FIRST(A) to the synchronizing set for nonterminal A.


If a symbol in FIRST(A) appears in the input, then it may be possible to resume
parsing according to A.

➢ We might add keywords that begins statements to the synchronizing sets for the
nonterminals generating expressions.

➢ We can add to the synchronizing set of a lower-level construct the symbols that
begin higher-level constructs.
Error Recovery in Predictive Parsing

Panic mode recovery

 HOW TO SELECT SYNCHRONIZING SET? Cont..

➢ If a terminal on top of stack cannot be matched, a simple idea is to


pop the terminal, issue a message saying that the terminal was inserted
and continue parsing.

➢ If a nonterminal can generate the empty string, then the production


deriving ϵ can be used as a default.
This may postpone some error detection, but cannot cause an error to
be missed. This approach reduces the number of nonterminals that have
to be considered during error recovery.
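As a concrete illustration, here is a hedged C sketch of the panic-mode step for an empty entry M[X, a]: skip input symbols until one lies in FOLLOW(X), then pop X and resume. The follow() table is hardcoded for the S → ABCDE grammar of the earlier example, and the erroneous input in main() is hypothetical; wiring panic() into the parse() loop shown earlier (in place of its bare error returns) is left as a variation.

#include <stdio.h>
#include <string.h>

/* FOLLOW sets for S -> ABCDE, A -> a|ϵ, ..., E -> e|ϵ (hardcoded sketch) */
static const char *follow(char X)
{
    switch (X) {
    case 'S': return "$";
    case 'A': return "bcde$";
    case 'B': return "cde$";
    case 'C': return "de$";
    case 'D': return "e$";
    case 'E': return "$";
    default:  return "$";
    }
}

/* Panic-mode step for an error entry M[X, a]: skip input symbols until one
   appears in FOLLOW(X); the caller then pops X and continues parsing. */
static const char *panic(char X, const char *ip)
{
    while (*ip != '$' && strchr(follow(X), *ip) == NULL) {
        fprintf(stderr, "error: skipping %c\n", *ip);
        ip++;
    }
    fprintf(stderr, "error: popping %c and resuming\n", X);
    return ip;                           /* the new input cursor */
}

int main(void)
{
    const char *rest = panic('B', "xycde$");  /* skips x, y; stops at c */
    printf("resume at: %s\n", rest);
    return 0;
}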
Error Recovery in Predictive Parsing
Panic mode recovery

    Nonterminal   FIRST    FOLLOW
    E             (, id    ), $
    E'            +, ϵ     ), $
    T             (, id    +, ), $
    T'            *, ϵ     +, ), $
    F             (, id    *, +, ), $

 Example: The table in Fig. 4.22 is to be used as follows.

▪ If the parser looks up entry M[A, a] and finds


▪ the entry is blank, then the input symbol a is skipped.

▪ the entry is "synch," then the nonterminal on top of the stack is popped in an
attempt to resume parsing.

▪ If a token on top of the stack does not match the input symbol, then we pop
the token from the stack, as mentioned above.

The entry “synch” indicates a synchronizing token, obtained from the FOLLOW
set of the nonterminal in question.
Error Recovery in Predictive Parsing
Panic mode recovery

 On the erroneous input + id * +id, the parser and error-recovery mechanism of
Fig. 4.22 behave as in Fig. 4.23: the leading + is skipped (its table entry is
blank), and the trailing + later causes F to be popped via a "synch" entry.
Error Recovery in Predictive Parsing

Phrase-level error recovery


 Each unfilled cell in the table can be filled with a special-purpose
error routine.

 Error routines typically remove tokens from the input, and/or pop an
item from the stack.

 It is ill-advised to modify the input stream or the stack without


removing items, because it is then hard to guarantee that error
recovery will always terminate.
Parsing

Bottom-Up Parsing

[Figure: the family of bottom-up parsers; nonbacktracking, table-driven
shift-reduce parsers, ordered from least powerful to most powerful. A DFA (the
LR(0) automaton) is used to make parsing decisions in LR parsers.]

A bottom-up parse corresponds to the construction of a parse tree for an input
string beginning at the leaves (the bottom) and working up towards the root
(the top).
Bottom-Up Parsing
 A general style of bottom-up parsing is known as shift-reduce parsing.

 The LR grammars are the largest class of grammars for which shift-reduce
parsers can be built.
Bottom-Up Parsing

 Because it is too much work to build an LR parser by hand, automatic
parser-generator tools make it easy to construct efficient LR parsers from
suitable grammars.

 Parser-generator tools:

▪ ANTLR is a widely used parser generator for Java and other languages.

▪ Yacc/Bison – Parser generator

Bison(part of GNU Project) reads a specification of a context-free


language, warns about any parsing ambiguities, and generates a
parser (either in C, C++, or Java) which reads sequences of tokens and
decides whether the sequence conforms to the syntax specified by the
grammar.
Bottom-Up Parsing
 Given G, the expression grammar (E → E + T | T, T → T * F | F, F → (E) | id),
and the input string id * id:

[Figure 4.25: A bottom-up parse for id * id. The figure illustrates a sequence
of reductions, snapshot by snapshot.]
Bottom-Up Parsing

Reductions

 Bottom-up parsing is the process of "reducing" a string w to the start
symbol of the grammar.

At each reduction step, a specific substring matching the body of a


production is replaced by the nonterminal at the head of that
production.

 The key decisions during bottom-up parsing are about when to reduce
and about what production to apply, as the parse proceeds.
Bottom-Up Parsing
Reductions
 Example: The reductions will be discussed in terms of the sequence of strings

id * id, F * id, T * id, T * F, T, E


➢ The strings in this sequence are formed from the roots of all the subtrees in the
snapshots.

➢ The sequence starts with the input string id*id.

➢ The first reduction produces F * id by reducing the leftmost id to F, using the


production F → id.

➢ The second reduction produces T * id by reducing F to T.

➢ Now, we have a choice between reducing the string T, which is the body of E →
T, and the string consisting of the second id, which is the body of F → id.
Rather than reduce T to E, the second id is reduced to F, resulting in the
string T * F. This string then reduces to T. The parse completes with the
reduction of T to the start symbol E.
Bottom-Up Parsing

Reductions

 By definition, a reduction is the reverse of a step in a derivation (recall


that in a derivation, a nonterminal in a sentential form is replaced by the body of one of its productions).

 The goal of bottom-up parsing is therefore to construct a derivation


in reverse.

Rightmost derivation:  E ⇒ T ⇒ T * F ⇒ T * id ⇒ F * id ⇒ id * id

[Figure 4.25: A bottom-up parse for id * id. Read left to right, the reduction
sequence id * id, F * id, T * id, T * F, T, E is this rightmost derivation in
reverse.]
Bottom-Up Parsing
Handle Pruning

 Bottom-up parsing during a left-to-right scan of the input constructs a


rightmost derivation in reverse.

 A "handle" is a substring that matches the body of a production, and


whose reduction represents one step along the reverse of a rightmost
derivation.

 Handle Pruning: replace handle by corresponding LHS of a production.


Bottom-Up Parsing

Shift-Reduce Parsing
Bottom-Up Parsing: Shift-Reduce Parsing

Implementing Shift-Reduce Parsers

➢ In Shift-reduce parsing
▪ A Stack holds grammar symbols.

▪ An input buffer holds the rest of the string to be parsed.

▪ The handle always appears at the top of the stack just before it is
identified as the handle.

▪ $ marks the bottom of the stack and also the right end of the input.

➢ Shift-reduce parsing - Demo


➢ https://silcnitc.github.io/yacc.html
Bottom-Up Parsing: Shift-Reduce Parsing

 Initially, the stack is empty, and the string w is on the input, as follows:

    STACK        INPUT
    $            w$

 The parse is successful if the stack contains only the start symbol when the
input stream ends:

    STACK        INPUT
    $S           $
Bottom-Up Parsing: Shift-Reduce Parsing

Four possible actions a shift-reduce parser can make:

1. Shift. Shift the next input symbol onto the top of the stack.

2. Reduce. The right end of the string to be reduced must be at the


top of the stack. Locate the left end of the string within the stack
and decide with what nonterminal to replace the string.

3. Accept. Announce successful completion of parsing.

4. Error. Discover a syntax error and call an error recovery routine.

The use of a stack in shift-reduce parsing is justified by an important fact:


the handle will always eventually appear on top of the stack, never inside.
Bottom-Up Parsing: Shift-Reduce Parsing
 Shift-reduce parser actions in parsing the input string id1 * id2:

    STACK        INPUT          ACTION
    $            id1 * id2 $    shift
    $ id1        * id2 $        reduce by F → id
    $ F          * id2 $        reduce by T → F
    $ T          * id2 $        shift
    $ T *        id2 $          shift
    $ T * id2    $              reduce by F → id
    $ T * F      $              reduce by T → T * F
    $ T          $              reduce by E → T
    $ E          $              accept
Bottom-Up Parsing: Shift-Reduce Parsing
 For the following grammar, parse the string a+++a++ using the general style of
bottom-up parsing (i.e., shift-reduce parsing).
    S → A
    A → A+A | B++
    B → a
Bottom-Up Parsing: Shift-Reduce Parsing

    STACK      INPUT       ACTION
    $          a+++a++$    shift
    $a         +++a++$     reduce by B → a
    $B         +++a++$     shift
    $B+        ++a++$      shift
    $B++       +a++$       reduce by A → B++
    $A         +a++$       shift
    $A+        a++$        shift
    $A+a       ++$         reduce by B → a
    $A+B       ++$         shift
    $A+B+      +$          shift
    $A+B++     $           reduce by A → B++
    $A+A       $           reduce by A → A+A
    $A         $           reduce by S → A
    $S         $           accept
Bottom-Up Parsing: Shift-Reduce Parsing

Conflicts During Shift-Reduce Parsing

 There are (ambiguous) CFGs for which shift-reduce parsing cannot be used.

▪ Every shift-reduce parser for such a grammar can reach a configuration


in which the parser, knowing the entire stack contents and the next input
symbol,

▪ Cannot decide whether to shift or to reduce (a shift/reduce conflict), or

▪ Cannot decide which of several reductions to make (a reduce/reduce


conflict).

▪ Technically, these CFGs are not in the LR(k) class of grammars; we refer
to them as non-LR grammars.
Bottom-Up Parsing: Shift-Reduce Parsing

Conflicts During Shift-Reduce Parsing

 Note:

 If a CFG is ambiguous and we get conflicts during parsing, then the
grammar is not in the LL(k) or LR(k) classes of grammars.

 If a CFG is unambiguous and we still get conflicts during parsing, then the
parser at hand is not capable of handling that CFG.
Bottom-Up Parsing

LR Parsing
LR Parsing

 The most prevalent type of bottom-up parser today is based


on a concept called LR(k) parsing;
▪ the "L" is for left-to-right scanning of the input,

▪ the "R" for constructing a rightmost derivation in reverse, and

▪ the k for the number of input symbols of lookahead that are used
in making parsing decisions.

 The cases k = 0 or k = 1 are of practical interest, and we only


consider LR parsers with k <= 1 here. When (k) is omitted, k is
assumed to be 1.
LR Parsing

 This section introduces the basic concepts of LR parsing


and the easiest method for constructing shift-reduce
parsers, called "simple LR" (or SLR).

 LR parsers are table-driven.


LR Parsing
Why LR Parsers?
 LR parsing is attractive for a variety of reasons:
1. LR parsers can be constructed to recognize virtually all programming
language constructs for which context-free grammars can be written.

2. The LR-parsing method is the most general nonbacktracking shift-reduce


parsing method known, yet it can be implemented as efficiently as
other, more primitive shift-reduce methods.

3. An LR parser can detect a syntactic error as soon as it is possible to do


so on a left-to-right scan of the input.
LR Parsing
Why LR Parsers?

 LR parsing is attractive for a variety of reasons:


4. The class of grammars that can be parsed using LR methods is a proper
superset of the class of grammars that can be parsed with predictive or
LL methods.

Thus, LR grammars can describe more languages than LL grammars.


LR Parsing
Items and the LR(0) Automaton
 How does a shift-reduce parser know when to shift and when to reduce?

For example, with stack contents $T and next input symbol * in Fig. 4.28,
how does the parser know that T on the top of the stack is not a handle, so
the appropriate action is to shift and not to reduce T to E?
LR Parsing
Items and the LR(0) Automaton
 How does a shift-reduce LR-parser know when to shift and when to reduce?
▪ An LR parser makes shift-reduce decisions by using automaton
(maintaining states) to keep track of where we are in a parse.

▪ States represent sets of “items”.

A DFA (the LR(0) automaton) is used to make the parsing decisions in LR
parsers.
LR Parsing
Items and the LR(0) Automaton

 What is an LR(0) item?

▪ An LR(0) item is a production with a dot at


some position of the body. Thus,
production A →XYZ yields the four items

▪ A → •XYZ

▪ A → X•YZ

▪ A → XY•Z

▪ A → XYZ•

▪ The production A → ϵ generates only one


item, A → •
LR Parsing
Items and the LR(0) Automaton

 An item indicates how much of a production body we have seen at a given


point in the parsing process.

 For example
➢ Item A → •XYZ indicates that we hope to see next an input string derivable from
XYZ.

➢ Item A → X•YZ indicates that we have just seen on the input a string derivable
from X and that we hope next to see a string derivable from YZ.

➢ Item A → XYZ• indicates that we have seen a string derivable from XYZ on the
input and that it may be time to reduce XYZ to A.

 One collection of sets of LR(0) items, called the canonical LR(0) collection,
provides the basis for constructing a DFA that is used to make parsing
decisions. Such an automaton is called an LR(0) automaton.
LR Parsing

Items and the LR(0) Automaton

 Each state of the LR(0)


automaton represents a set of
items in the canonical LR(0)
collection.
LR Parsing
Items and the LR(0) Automaton

 To construct the Canonical LR(0) collection (or LR(0) Automaton) for


a grammar, we define

▪ An Augmented grammar and

▪ two functions, CLOSURE and GOTO.


LR Parsing
Augmented grammar

▪ If G is a grammar with start symbol S, then G', the augmented grammar


for G, is G with a new start symbol S' and production S' →S.

G' = G + { new start symbol S' and production S' →S}

▪ The purpose of this new starting production is to indicate to the parser


when it should stop parsing and announce acceptance of the input.
That is, acceptance occurs only when the parser is about to reduce by
S' → S

▪ Example: G is the expression grammar; G' = G + { E' → E }.
LR Parsing
Closure of Item Sets

➢ If I is a set of items for a grammar G, then CLOSURE(I) is the set of items


constructed from I by the two rules:

1. Initially, add every item in I to CLOSURE(I).

2. If A → α•Bβ is in CLOSURE(I) and B → γ is a production, then add the


item B →•γ to CLOSURE(I), if it is not already there.
Apply this rule until no more new items can be added to CLOSURE(I).

 Examples: find CLOSURE(I2), CLOSURE(I4), and CLOSURE(I6) in the LR(0)
automaton of the expression grammar (see the automaton figure).
LR Parsing
Closure of Item Sets cont…

 Intuitively, A → α•Bβ in CLOSURE(I) indicates that


▪ At some point in the parsing process, we think we might next see a
substring derivable from Bβ as input.

▪ The substring derivable from Bβ will have a prefix derivable from B by


applying one of the B-productions.

We therefore add items for all the B-productions; that is, if B →γ is a

production, we also include B →•γ in CLOSURE(I).

LR Parsing
Closure of Item Sets

 Closure algorithm:

    SetOfItems CLOSURE(I) {
        J = I;
        repeat
            for ( each item A → α·Bβ in J )
                for ( each production B → γ of G )
                    if ( B → ·γ is not in J )
                        add B → ·γ to J;
        until no more items are added to J on one round;
        return J;
    }
LR Parsing
Closure of Item Sets

 Example 4.40: Construct the closure of item sets for the following augmented
grammar:
    E' → E
    E → E + T | T
    T → T * F | F
    F → (E) | id
LR Parsing
The Function GOTO
▪ The second useful function is GOTO(I,X) where I is a set of items and X is a
grammar symbol.

▪ The GOTO function is used to define the transitions in the LR(0) automaton for
a grammar.

▪ GOTO(I,X) is defined to be the closure of the set of all items [A → αX•β] such
that [A → α•Xβ] is in I.

Example: GOTO(I1,+)

▪ The states of the automaton correspond to sets of items, and GOTO(I,X)


specifies the transition from the state for I under input X.
LR Parsing
The Function GOTO

[Figure: the GOTO function, e.g. GOTO(I1, +) in the LR(0) automaton of the
expression grammar]
LR Parsing
 Algorithm to construct C, the canonical collection of sets of LR(0)
items for an augmented grammar G'
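The algorithm figure is omitted here. As a concrete companion, the following is a compact C sketch of CLOSURE and GOTO over LR(0) items for the augmented expression grammar. The single-character encodings (X standing in for E', i standing in for the token id) and the fixed-size item sets are simplifications made for this sketch, not part of the algorithm itself.

#include <stdio.h>
#include <string.h>

typedef struct { char head; const char *body; } Prod;
static const Prod G[] = {              /* E' -> E is production 0 */
    {'X', "E"},                        /* X stands in for E' */
    {'E', "E+T"}, {'E', "T"},
    {'T', "T*F"}, {'T', "F"},
    {'F', "(E)"}, {'F', "i"},          /* i stands in for id */
};
enum { NP = sizeof G / sizeof G[0], MAXI = 64 };

typedef struct { int prod, dot; } Item;            /* LR(0) item */
typedef struct { Item it[MAXI]; int n; } Set;

static int has(const Set *s, Item x)
{
    for (int i = 0; i < s->n; i++)
        if (s->it[i].prod == x.prod && s->it[i].dot == x.dot) return 1;
    return 0;
}
static void add(Set *s, Item x) { if (!has(s, x)) s->it[s->n++] = x; }

static void closure(Set *s)
{
    for (int i = 0; i < s->n; i++) {               /* worklist: s->n grows */
        Item it = s->it[i];
        char B = G[it.prod].body[it.dot];          /* symbol after the dot */
        if (B >= 'A' && B <= 'Z')                  /* nonterminal: add B -> .γ */
            for (int p = 0; p < NP; p++)
                if (G[p].head == B) add(s, (Item){p, 0});
    }
}

static Set goto_(const Set *I, char X)             /* GOTO(I, X) */
{
    Set J = { .n = 0 };
    for (int i = 0; i < I->n; i++) {
        Item it = I->it[i];
        if (G[it.prod].body[it.dot] == X)          /* move the dot past X */
            add(&J, (Item){it.prod, it.dot + 1});
    }
    closure(&J);
    return J;
}

static void print(const char *name, const Set *s)
{
    printf("%s:\n", name);
    for (int i = 0; i < s->n; i++) {
        const Prod *p = &G[s->it[i].prod];
        printf("  %c -> %.*s.%s\n", p->head, s->it[i].dot, p->body,
               p->body + s->it[i].dot);
    }
}

int main(void)
{
    Set I0 = { .n = 0 };
    add(&I0, (Item){0, 0});                        /* kernel: E' -> .E */
    closure(&I0);
    print("I0 = CLOSURE({E' -> .E})", &I0);
    Set I1 = goto_(&I0, 'E');
    print("GOTO(I0, E)", &I1);
    return 0;
}

Running it prints the seven items of I0 and the two kernel-derived items of I1, matching the automaton figure.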
LR Parsing

 Exercise 1: Construct the LR(0) automaton for the following grammar.


LR Parsing
 Exercise 1: Construct the LR(0) automaton for the following grammar.

E' → E  (augmented grammar)

▪ Nonkernel items have the dot at the far left; they need not be stored
explicitly.
LR Parsing
 Exercise 2: Construct the LR(0) automaton for the following grammar.
LR Parsing
 Exercise 3: Construct the LR(0) automaton for the following grammar.
LR Parsing
The LR-Parsing

 LR parser consists of
1. an input,

2. an output,

3. a stack,

4. a driver program, and

5. a parsing table that has two parts (ACTION and GOTO).

 The parsing program reads characters from an input buffer one at a time.
Where a shift-reduce parser would shift a symbol, an LR parser shifts a state.
Each state summarizes the information contained in the stack below it.

 The LR Parser stack holds a sequence of states, s0s1 . . . sm where sm is on top.


LR Parsing

Simple-LR Parsing/SLR Parsing/


SLR(1) Parsing
SLR Parser
Constructing SLR-Parsing Tables

 The SLR method for constructing parsing tables is a good starting


point for studying LR parsing.

 We shall refer
▪ to the parsing table constructed by SLR method as an SLR table,
and

▪ to an LR parser using an SLR-parsing table as an SLR parser.

 The SLR method begins with LR(0) items and LR(0) automata.
That is, given a grammar, G, we augment G to produce G', with a new start symbol
S'. From G', we construct C, the canonical collection of sets of items for G' together
with the GOTO function.
SLR Parser

[Fig.: LR(0) automaton for the augmented expression grammar E' → E]
SLR Parser
Constructing SLR-Parsing Tables

 Algorithm 4.46 (sketch):

1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(0) items for
the augmented grammar G'.

2. State i is constructed from Ii, with these parsing actions:
   (a) If [A → α·aβ] is in Ii and GOTO(Ii, a) = Ij, set ACTION[i, a] = "shift j"
       (a is a terminal).
   (b) If [A → α·] is in Ii (A ≠ S'), set ACTION[i, a] = "reduce A → α" for all
       a in FOLLOW(A).
   (c) If [S' → S·] is in Ii, set ACTION[i, $] = "accept".

3. If GOTO(Ii, A) = Ij for a nonterminal A, set GOTO[i, A] = j.

4. All entries not defined by rules (2) and (3) are made "error".

5. The initial state is the one constructed from the set containing [S' → ·S].
SLR Parser
Constructing SLR-Parsing Tables

 The parsing table consisting of the ACTION and GOTO functions


determined by Algorithm 4.46 is called the SLR(1) table for G.

 An LR parser using the SLR(1) table for G is called the SLR(1) parser for G,
and a grammar having an SLR(1) parsing table is said to be SLR(1).

 We usually omit the "(1)" after the "SLR," since we shall not deal here
with parsers having more than one symbol of lookahead.

 In SLR parsers, the lookahead sets are determined directly from the
grammar, without considering the individual states and transitions.
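Because the SLR reduce entries come straight from FOLLOW sets, it is worth seeing how those sets are computed. Below is a minimal fixed-point sketch in C for the expression grammar; the bitset encoding and the "head:body" production strings are conventions of this sketch only, and it exploits the fact that this particular grammar has no ϵ-productions.

#include <stdio.h>
#include <string.h>

static const char *TERMS = "i+*()$";           /* i abbreviates id */
static const char *NTS   = "ETF";
static const char *prods[] = { "E:E+T", "E:T", "T:T*F", "T:F", "F:(E)", "F:i" };
enum { NP = 6 };

static int isnt(char c)  { return strchr(NTS, c) != NULL; }
static int tbit(char t)  { return 1 << (int)(strchr(TERMS, t) - TERMS); }
static int ntidx(char A) { return (int)(strchr(NTS, A) - NTS); }

static void print(const char *name, const int set[3])
{
    for (int n = 0; n < 3; n++) {
        printf("%s(%c) = {", name, NTS[n]);
        for (int t = 0; t < 6; t++)
            if (set[n] & (1 << t)) printf(" %c", TERMS[t]);
        printf(" }\n");
    }
}

int main(void)
{
    int first[3] = {0}, follow[3] = {0}, changed;

    /* FIRST: with no nullable nonterminals, FIRST(Xβ) = FIRST(X). */
    for (changed = 1; changed; ) {
        changed = 0;
        for (int p = 0; p < NP; p++) {
            int A = ntidx(prods[p][0]);
            char c = prods[p][2];                    /* first body symbol */
            int add = isnt(c) ? first[ntidx(c)] : tbit(c);
            if ((first[A] | add) != first[A]) { first[A] |= add; changed = 1; }
        }
    }

    follow[ntidx('E')] |= tbit('$');         /* $ follows the start symbol */
    for (changed = 1; changed; ) {
        changed = 0;
        for (int p = 0; p < NP; p++) {
            int A = ntidx(prods[p][0]);
            const char *b = prods[p] + 2;            /* the body */
            for (int i = 0; b[i]; i++) {
                if (!isnt(b[i])) continue;
                int B = ntidx(b[i]);
                /* what can follow B: FIRST of the next symbol,
                   or FOLLOW(A) when B ends the body */
                int add = b[i+1] ? (isnt(b[i+1]) ? first[ntidx(b[i+1])]
                                                 : tbit(b[i+1]))
                                 : follow[A];
                if ((follow[B] | add) != follow[B]) { follow[B] |= add; changed = 1; }
            }
        }
    }

    print("FIRST", first);
    print("FOLLOW", follow);
    return 0;
}

The program prints exactly the FOLLOW sets used in Example 4.47 below.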
SLR Parser
Constructing SLR-Parsing Table

 Example 4.47: Let us construct the SLR table for the augmented expression
grammar.

    Productions:  (1) E → E + T   (2) E → T    (3) T → T * F
                  (4) T → F       (5) F → (E)  (6) F → id

    FOLLOW(E) = { +, ), $ }
    FOLLOW(T) = { +, *, ), $ }
    FOLLOW(F) = { +, *, ), $ }

 Notation: "s5" means shift and move to state 5 (the digit is a state number);
"r3" means reduce using production number 3.

Fig. 4.37: SLR parsing table for the expression grammar

    STATE   ACTION                               GOTO
            id     +     *     (     )     $     E   T   F
    0       s5                 s4                1   2   3
    1              s6                      acc
    2              r2    s7          r2    r2
    3              r4    r4          r4    r4
    4       s5                 s4                8   2   3
    5              r6    r6          r6    r6
    6       s5                 s4                    9   3
    7       s5                 s4                        10
    8              s6                s11
    9              r1    s7          r1    r1
    10             r3    r3          r3    r3
    11             r5    r5          r5    r5
SLR Parser
 Exercise 4.6.2 : Construct the SLR sets of items for the (augmented)
grammar.

S -> SA
S -> A
A -> a

▪ Construct the SLR parsing table for this grammar.

▪ Is the grammar SLR?


LR-Parsing Algorithm (Method-2)
LR-parsing algorithm. (Method-2)
INPUT: An input string w and an LR-parsing table with functions ACTION and GOTO for a grammar G.
OUTPUT: If w is in L(G), the reduction steps of a bottom-up parse for w; otherwise, an error indication.
METHOD: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer.
let a be the first symbol of w$;

while(1) {
let s be the state on top of the stack;
if (ACTION[s, a] = shift t) {
push a and then t onto the stack;
let a be the next input symbol;
} else if (ACTION[s, a] = reduce A → β) {
pop 2*|β| symbols off the stack;
let state t now be on top of the stack;
push A and then GOTO[t, A] onto the stack;
} else if (ACTION[s, a] = accept) break; /* parsing is done */
else call error-recovery routine;
}
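Below is a minimal runnable rendering of this driver in C, hardcoded with the SLR table of Fig. 4.37 above. To stay compact it pushes only states (the grammar symbol is implied by the state), so a reduce pops |β| entries instead of the 2·|β| of the Method-2 trace that follows; id is abbreviated to the single character i. These encodings are choices of the sketch, not part of the algorithm.

#include <stdio.h>
#include <string.h>

/* Productions, numbered as in Fig. 4.37:
   (1) E->E+T (2) E->T (3) T->T*F (4) T->F (5) F->(E) (6) F->id */
static const struct { char head; int len; } prod[] = {
    {0, 0}, {'E', 3}, {'E', 1}, {'T', 3}, {'T', 1}, {'F', 3}, {'F', 1},
};

/* Terminal columns i + * ( ) $ -> 0..5; the input must use only these. */
static int tcol(char a) { return (int)(strchr("i+*()$", a) - "i+*()$"); }

/* ACTION: positive = shift to that state, negative = reduce by that
   production, 99 = accept, 0 = error. */
static const int ACTION[12][6] = {
    /*       id   +   *   (   )   $ */
    /* 0*/ {  5,  0,  0,  4,  0,  0 },
    /* 1*/ {  0,  6,  0,  0,  0, 99 },
    /* 2*/ {  0, -2,  7,  0, -2, -2 },
    /* 3*/ {  0, -4, -4,  0, -4, -4 },
    /* 4*/ {  5,  0,  0,  4,  0,  0 },
    /* 5*/ {  0, -6, -6,  0, -6, -6 },
    /* 6*/ {  5,  0,  0,  4,  0,  0 },
    /* 7*/ {  5,  0,  0,  4,  0,  0 },
    /* 8*/ {  0,  6,  0,  0, 11,  0 },
    /* 9*/ {  0, -1,  7,  0, -1, -1 },
    /*10*/ {  0, -3, -3,  0, -3, -3 },
    /*11*/ {  0, -5, -5,  0, -5, -5 },
};

/* GOTO: columns E, T, F */
static int ncol(char A) { return A == 'E' ? 0 : A == 'T' ? 1 : 2; }
static const int GOTO_[12][3] = {
    {1, 2, 3}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {8, 2, 3},  {0, 0, 0},
    {0, 9, 3}, {0, 0, 10}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0},
};

int main(void)
{
    const char *ip = "i+i*i$";     /* the input id + id * id, with id as i */
    int stack[64] = {0}, top = 0;  /* stack of states, s0 at the bottom    */
    for (;;) {
        int s = stack[top], act = ACTION[s][tcol(*ip)];
        if (act == 99) { puts("accept"); return 0; }
        if (act > 0) {                       /* shift */
            stack[++top] = act;
            printf("shift %c, go to state %d\n", *ip++, act);
        } else if (act < 0) {                /* reduce by production -act */
            int p = -act;
            top -= prod[p].len;              /* pop |beta| states */
            stack[top + 1] = GOTO_[stack[top]][ncol(prod[p].head)];
            top++;
            printf("reduce by production %d, go to state %d\n", p, stack[top]);
        } else { puts("syntax error"); return 1; }
    }
}

On i+i*i$ the printed shift/reduce sequence matches the Method-2 trace below, minus the grammar symbols on the stack.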
SLR Parsing (Method-2)     Input: id+id$

    Stack              Input    Action
    $ 0                id+id$   ACTION[0,id] = s5: push id and then state 5
    $ 0 id 5           +id$     ACTION[5,+] = r6 (F → id): pop 2·|id| symbols,
                                push F and then GOTO[0,F] = 3
    $ 0 F 3            +id$     ACTION[3,+] = r4 (T → F): pop 2·|F| symbols,
                                push T and then GOTO[0,T] = 2
    $ 0 T 2            +id$     ACTION[2,+] = r2 (E → T): pop 2·|T| symbols,
                                push E and then GOTO[0,E] = 1
    $ 0 E 1            +id$     ACTION[1,+] = s6: push + and then state 6
    $ 0 E 1 + 6        id$      ACTION[6,id] = s5: push id and then state 5
    $ 0 E 1 + 6 id 5   $        ACTION[5,$] = r6 (F → id): pop 2·|id| symbols,
                                push F and then GOTO[6,F] = 3
    $ 0 E 1 + 6 F 3    $        ACTION[3,$] = r4 (T → F): pop 2·|F| symbols,
                                push T and then GOTO[6,T] = 9
    $ 0 E 1 + 6 T 9    $        ACTION[9,$] = r1 (E → E+T): pop 2·|E+T| symbols,
                                push E and then GOTO[0,E] = 1
    $ 0 E 1            $        ACTION[1,$] = Accept
SLR Parser
 Exercise 4.6.2 : Construct the SLR sets of items for the (augmented)
grammar.

 Show the parsing table for this grammar. Is the grammar SLR?
 Show the actions of your parsing table from Exercise 4.6.2 on the input
aa*a+.
SLR Parser
 Exercise 4.6.2: Construct the SLR sets of items for the (augmented) grammar
S → SS+ | SS* | a.

    Productions:  (1) S → SS+   (2) S → SS*   (3) S → a
    FIRST(S) = { a }      FOLLOW(S) = { a, +, *, $ }

[Fig.: LR(0) automaton, states I0..I5; I1 accepts on $]

Fig.: SLR parsing table

    STATE   ACTION                   GOTO
            a     +     *     $      S
    0       s2                       1
    1       s2                acc    3
    2       r3    r3    r3    r3
    3       s2    s4    s5           3
    4       r1    r1    r1    r1
    5       r2    r2    r2    r2

Show the parsing table for this grammar. Is the grammar SLR?
Yes, the grammar is SLR, because there are no conflicts in the SLR table.
SLR Parsing (Method-2)
 Exercise 4.6.2: S → SS+ | SS* | a
 Show the actions of your parsing table on the input aa+.

    Productions:  (1) S → SS+   (2) S → SS*   (3) S → a

    Stack            Input   Action
    $ 0              aa+$    ACTION[0,a] = s2: push a and then state 2
    $ 0 a 2          a+$     ACTION[2,a] = r3 (S → a): pop 2·|a| symbols,
                             push S and then GOTO[0,S] = 1
    $ 0 S 1          a+$     ACTION[1,a] = s2: push a and then state 2
    $ 0 S 1 a 2      +$      ACTION[2,+] = r3 (S → a): pop 2·|a| symbols,
                             push S and then GOTO[1,S] = 3
    $ 0 S 1 S 3      +$      ACTION[3,+] = s4: push + and then state 4
    $ 0 S 1 S 3 + 4  $       ACTION[4,$] = r1 (S → SS+): pop 2·|SS+| symbols,
                             push S and then GOTO[0,S] = 1
    $ 0 S 1          $       ACTION[1,$] = Accept
SLR Parser
 Exercise: Construct the SLR sets of items for the (augmented) grammar (4.49):
    S → L=R | R
    L → *R | id
    R → L

This example defines a small grammar for assignment statements. Think of L and
R as standing for l-value and r-value, respectively, and of * as an operator
indicating "contents of".

▪ Show the parsing table for this grammar. Is the grammar SLR?

▪ Show the actions of your parser on the input id=id.

 Note: Every SLR(1) grammar is unambiguous, but there are many unambiguous
grammars that are not SLR(1).
SLR Parser
 Exercise (cont.): grammar (4.49), S → L=R | R, L → *R | id, R → L.

[Fig.: LR(0) automaton]    [Fig.: SLR parsing table]
SLR Parser
 Exercise: Construct the SLR sets of items for the (augmented) grammar:
    1) S → AaAb
    2) S → BbBa
    3) A → ε
    4) B → ε
▪ Is the grammar SLR?

    Nonterminal   FIRST   FOLLOW
    S             a, b    $
    A             ε       a, b
    B             ε       a, b

Fig.: SLR parsing table (state 0 shown)

    STATE   ACTION                       GOTO
            a         b         $        S   A   B
    0       r3/r4     r3/r4              1   2   3
    ...

There are r-r conflicts in the SLR table (in state 0, both A → ε and B → ε are
called for on a and on b), hence the grammar is not SLR.
SLR Parser
SLR Parsing
 Exercise 4.6.4 : For each of the (augmented) grammars

➢ Construct the SLR sets of items and their GOTO function.


➢ Indicate any action conflicts in your sets of items.
➢ Construct the SLR-parsing table, if one exists.
SLR Parsing
LR Parsing

LR(0) Parsing
LR(0) Parsing
 An LR parser using an LR(0)-parsing table is an LR(0) parser.

 The LR(0) method begins with LR(0) items and LR(0) automata. That is, given
a grammar, G, we augment G to produce G', with a new start symbol S'.
From G', we construct C, the canonical collection of sets of items for G'
together with the GOTO function.

 LR(0) parsing does not care about the next input symbol; it does not use
lookahead.

 During LR(0) parsing table construction, whenever there is a final item in


LR(0) automaton, we put the reduce move in the entire row corresponding
to the state that contains the final item.
LR(0) Parsing
 LR(0) is the simplest technique in the LR family. Although that makes it
the easiest to learn, these parsers are too weak to be of practical use
for anything but a very limited set of grammars.

 The fundamental limitation of LR(0) is the zero, meaning no lookahead


tokens are used. It is a stifling constraint to have to make decisions
using only what has already been read, without even glancing at what
comes next in the input.

 If we could peek at the next token and use that as part of the decision
making, we will find that it allows for a much larger class of grammars to
be parsed.
LR(0) Parsing
Is the following grammar in LR(0)?
S→A
S→ a
A→a
LR(0) Parsing
Is the following grammar in LR(0)?
E→T+E
E→T
T → id
LR(0) Parsing
Is the following grammar in LR(0)?
    1) S → AaAb
    2) S → BbBa
    3) A → ε
    4) B → ε

    Nonterminal   FIRST   FOLLOW
    S             a, b    $
    A             ε       a, b
    B             ε       a, b

Fig.: LR(0) parsing table (state 0 shown)

    STATE   ACTION                           GOTO
            a         b         $            S   A   B
    0       r3/r4     r3/r4     r3/r4        1   2   3
    ...

There are r-r conflicts in the LR(0) table (the reduce moves fill the entire
row of state 0), hence the grammar is not LR(0).
LR(0) Parsing
Advantage of SLR over LR(0)

 The simple improvement that SLR(1) makes on the basic LR(0) parser
is to reduce only if the next input token is a member of the follow set
of the non-terminal being reduced.

 When filling in the table, we don't assume a reduce on all inputs as we did
in LR(0); we selectively choose the reduction only when the next input symbol
is a member of the FOLLOW set.
LR(0) Parsing
Comparison between LR(0) and SLR

❑ Similarities between LR(0) and SLR parsers:


❑ Both use LR(0) automata.

❑ Both fill the GOTO part and the shift moves of the parsing table the same
way.

❑ The only difference between the two is where to place the reduce moves.

❑ In SLR, the addition of just one token of lookahead and use of the follow set
greatly expands the class of grammars that can be parsed without conflict.

❑ If the grammar is in LR(0), it is definitely in SLR and if it is in SLR, it may or may


not be in LR(0).
LR Parsing

Viable Prefixes
LR Parsing
Viable Prefixes

 Why can LR(0) automata be used to make shift-reduce decisions?

➢ The LR(0) automaton for a grammar characterizes the strings of grammar


symbols that can appear on the stack of a shift-reduce parser for the
grammar.

➢ The stack contents must be a prefix of a right-sentential form. If the stack
holds α and the rest of the input is x, then a sequence of reductions will take
αx to S. In terms of derivations, S ⇒*rm αx.

➢ Not all prefixes of right-sentential forms can appear on the stack, however,
since the parser must not shift past the handle. For example, suppose
E ⇒*rm F * id ⇒rm (E) * id.
LR Parsing
Viable Prefixes

➢ Then, at various times during the parse, the stack will hold (, (E, and (E), but it
must not hold (E)*, since (E) is a handle, which the parser must reduce to F
before shifting *.

➢ The prefixes of right sentential forms that can appear on the stack of a shift-
reduce parser are called viable prefixes.

➢ Viable prefixes are defined as follows: a viable prefix is a prefix of a right-


sentential form that does not continue past the right end of the rightmost handle
of that sentential form.
By this definition, it is always possible to add terminal symbols to the end of a
viable prefix to obtain a right-sentential form.
LR Parsing
Viable Prefixes

➢ The prefixes of right sentential form that can appear on the stack of a shift-
reduce parser are called viable prefixes.

We observe that at any point of time, the


stack contents must be a prefix of a right
sentential form.
However, not all prefixes of a right
sentential form can appear on the stack.
Here, ‘id *’ is a prefix of a right sentential
form. But it can never appear on the
stack! This is because we will always
reduce by F -> id before shifting ‘*’
LR Parsing
Importance of Viable Prefixes:

❑ The entire SLR parsing algorithm is based on the idea that the LR(0)
automaton can recognize viable prefixes and reduce them appropriately.

❑ E ⇒ E+T ⇒ E+F ⇒ E+id ⇒ T+id ⇒ T*F+id ⇒ T*id+id ⇒ F*id+id ⇒ id*id+id


LR Parsing
Viable Prefixes: Valid Items

[Figures omitted: items valid for viable prefixes of the expression grammar]
LR Parsing
Viable Prefixes

 We can easily compute the set of valid items for each viable prefix that
can appear on the stack of an LR parser.

 In fact, it is a central theorem of LR-parsing theory that the set of valid items
for a viable prefix γ is exactly the set of items reached from the initial state
along the path labeled γ in the LR(0) automaton for the grammar.

 In essence, the set of valid items embodies all the useful information that
can be gleaned from the stack.
LR Parsing
Viable Prefixes
 Exercise 4.6.1 : Describe all the viable prefixes for the following
grammars:
More Powerful LR Parsers
 Next, we will extend the previous LR parsing techniques to use one
symbol of lookahead on the input.

 The “Canonical-LR(CLR)" or CLR(1) method or "LR" method or LR(1)


method

▪ The CLR/LR method makes full use of the lookahead


symbol(s).

▪ This method uses a large set of items, called the LR(1) items.
LR Parsing

Canonical LR/CLR Parsing


CLR / CLR(1) / LR(1) / LR
CLR Parser
Canonical LR(1) Items

 S → L=R | R
L → *R | id
R → L          (under SLR, this grammar produced a shift/reduce conflict)

 It is possible to carry more information in the state that will allow us to
rule out some of these invalid reductions by A → α, e.g. by using items of the
form [A → α•, a].
By splitting states when necessary, we can arrange to have each state of an LR
parser indicate exactly which input symbols can follow a handle α for which
there is a possible reduction to A.
CLR Parser
Canonical LR(1) Items

 The extra information is incorporated into the state by redefining items to


include a terminal symbol as a second component.

 The general form of an item becomes [A→α•β, a], where A→αβ is a production
and a is a terminal or the right endmarker $. We call such an object an LR(1) item.

LR(1) item = LR(0) item + one symbol of lookahead

 The 1 in LR(1) refers to the length of the second component, called


the lookahead of the item.

 The lookahead has no effect in an item of the form [A→α•β, a] where β ≠ ϵ,
but an item of the form [A→α•, a] calls for a reduction by A→α only if the
next input symbol is a.
CLR Parser
Constructing LR(1) Sets of Items

 Procedures CLOSURE and GOTO for building the collection of sets of


valid LR(1) items
CLR Parser
Constructing LR(1) Sets of Items

 Procedure to construct Sets-of-LR(1)-items for grammar G'


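The procedure figures are omitted here. As a companion, the following is a small C sketch of the LR(1) CLOSURE rule: for [A → α·Bβ, a], add [B → ·γ, b] for every production B → γ and every b in FIRST(βa). It is hardcoded for the grammar of Example 4.54 below; Z stands in for S', and the first() helper is specialized to this ϵ-free grammar. These are conventions of the sketch only.

#include <stdio.h>
#include <string.h>

typedef struct { char head; const char *body; } Prod;
static const Prod G[] = {              /* Z stands in for S' */
    {'Z', "S"}, {'S', "CC"}, {'C', "cC"}, {'C', "d"},
};
enum { NP = 4, MAXI = 64 };

typedef struct { int prod, dot; char la; } Item;   /* LR(1) item */
typedef struct { Item it[MAXI]; int n; } Set;

static int isnt(char c) { return c == 'Z' || c == 'S' || c == 'C'; }

static const char *first(char X)       /* FIRST of one symbol; no ϵ here */
{
    static char one[2];
    if (X == 'S' || X == 'C') return "cd";
    one[0] = X; one[1] = '\0';         /* a terminal is its own FIRST set */
    return one;
}

static void add(Set *s, Item x)
{
    for (int i = 0; i < s->n; i++)
        if (s->it[i].prod == x.prod && s->it[i].dot == x.dot &&
            s->it[i].la == x.la)
            return;
    s->it[s->n++] = x;
}

static void closure(Set *s)
{
    for (int i = 0; i < s->n; i++) {               /* worklist; s->n grows */
        Item it = s->it[i];
        const char *body = G[it.prod].body;
        char B = body[it.dot];
        if (!isnt(B)) continue;
        char beta = body[it.dot + 1];
        char labuf[2] = { it.la, '\0' };
        const char *las = beta ? first(beta) : labuf;   /* FIRST(βa) */
        for (int p = 0; p < NP; p++)
            if (G[p].head == B)
                for (int k = 0; las[k]; k++)
                    add(s, (Item){ p, 0, las[k] });
    }
}

int main(void)
{
    Set I0 = { .n = 0 };
    add(&I0, (Item){ 0, 0, '$' });                 /* kernel [S' → ·S, $] */
    closure(&I0);
    for (int i = 0; i < I0.n; i++) {
        const Prod *p = &G[I0.it[i].prod];
        printf("[%c -> %.*s.%s, %c]\n", p->head, I0.it[i].dot, p->body,
               p->body + I0.it[i].dot, I0.it[i].la);
    }
    return 0;
}

Running it prints the six items of I0 in Example 4.54 (the example writes them with merged lookaheads, e.g. C → ·d, c/d).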
CLR Parser
Constructing LR(1) Sets of Items

 Example 4.54: Consider the following augmented grammar; construct the sets of
LR(1) items.
    S' → S
    S → CC
    C → cC | d


CLR Parser
Constructing LR(1) Sets of Items

    Nonterminal   FIRST    FOLLOW
    S             c, d     $
    C             c, d     c, d, $

The item [C → d•, c/d] says that it is valid to reduce d to C only if the next
input symbol is c or d.

The lookaheads will always be a subset of FOLLOW(C).
CLR Parser
Canonical LR(1) Parsing Tables
CLR Parser
Canonical LR(1) Parsing Tables

 Example 4.57: Construct the canonical parsing table for grammar (4.55):
S' → S, S → CC, C → cC | d.
CLR Parser
Canonical LR(1) Parsing Tables

❑ Canonical parsing table for grammar (4.55)

Figure 4.42: Canonical parsing table for grammar(4.55)


CLR Parser
 Exercise 4.7.1: Construct the canonical LR sets of items for S → SS+ | SS* | a.

    Productions:  (1) S → SS+   (2) S → SS*   (3) S → a
    FIRST(S) = { a }      FOLLOW(S) = { a, +, *, $ }

[Fig.: LR(1) automaton, states I0..I9; I1 accepts on $]

Fig.: CLR parsing table

    STATE   ACTION                   GOTO
            a     +     *     $      S
    0       s2                       1
    1       s4                acc    3
    2       r3                r3
    3       s4    s5    s6           7
    4       r3    r3    r3
    5       r1                r1
    6       r2                r2
    7       s4    s8    s9           7
    8       r1    r1    r1
    9       r2    r2    r2
CLR Parser

 Exercise : Construct the canonical LR sets of items (or LR(1) sets of


items) for the grammar S → AA
A → aA|b
CLR Parser

 Exercise : Construct the canonical LR sets of items (or LR(1) sets of


items) for the grammar S → aSb|ab
CLR Parser
 Exercise: Construct the canonical LR sets of items (or LR(1) sets of items)
for the grammar:
    (1) S → L=R   (2) S → R   (3) L → *R   (4) L → id   (5) R → L

    Nonterminal   FIRST    FOLLOW
    S             *, id    $
    L             *, id    =, $
    R             *, id    =, $

[Fig.: LR(1) automaton, states I0..I13]

Fig.: CLR/LR parsing table

    STATE   ACTION                       GOTO
            =     *     id    $          S   L   R
    0             s4    s5               1   2   3
    1                         acc
    2       s6                r5
    3                         r2
    4             s4    s5                   8   7
    5       r4                r4
    6             s11   s12                 10   9
    7       r3                r3
    8       r5                r5
    9                         r1
    10                        r5
    11            s11   s12                 10  13
    12                        r4
    13                        r3
CLR Parser
 Exercise (cont.): Show the actions of your parsing table on the input id=id.

    Stack               Input    Action
    $ 0                 id=id$   ACTION[0,id] = s5
    $ 0 id 5            =id$     ACTION[5,=] = r4 (L → id), GOTO[0,L] = 2
    $ 0 L 2             =id$     ACTION[2,=] = s6
    $ 0 L 2 = 6         id$      ACTION[6,id] = s12
    $ 0 L 2 = 6 id 12   $        ACTION[12,$] = r4 (L → id), GOTO[6,L] = 10
    $ 0 L 2 = 6 L 10    $        ACTION[10,$] = r5 (R → L), GOTO[6,R] = 9
    $ 0 L 2 = 6 R 9     $        ACTION[9,$] = r1 (S → L=R), GOTO[0,S] = 1
    $ 0 S 1             $        ACTION[1,$] = Accept
CLR Parser

 Canonical LR (CLR/CLR(1)/LR(1)/LR) grammars


▪ Every SLR(1) grammar is a Canonical LR(1) grammar, but the
canonical LR(1) parser may have more states than the SLR(1)
parser.

▪ An LR(1) grammar is not necessarily SLR(1); the grammar given below is an
example. Because an LR(1) parser splits states based on differing lookaheads,
it may avoid conflicts that would otherwise result from using the full FOLLOW
set.

S -> L=R | R
L -> *R | id LR(1), not SLR(1)
R -> L
LR Parsing

LALR Parsing
LALR Parser

 The "lookahead-LR" or "LALR" or "LALR(1)" method

❑ The LALR method has many fewer states than typical parsers based
on the LR(1) items.

❑ By carefully introducing lookaheads into the LR(0) items, we can

▪ handle many more grammars with the LALR method than with the
SLR method, and

▪ build parsing tables that are no bigger than the SLR tables.

❑ LALR is the method of choice in most situations.


LALR Parser
Constructing LALR Parsing Tables

 LALR method is often used in practice, because the tables obtained


by it are considerably smaller than the canonical LR tables.

 The most common syntactic constructs of programming languages


can be expressed conveniently by an LALR grammar.

 The same is almost true for SLR grammars, but there are a few
constructs that cannot be conveniently handled by SLR techniques
(see Example 4.48, for example).
LALR Parser
Constructing LALR Parsing Tables

 For a comparison of parser size, the SLR and LALR tables for a
grammar always have the same number of states.

 For a language like C:


▪ The SLR and LALR tables for a grammar typically have several hundred
states.

▪ The CLR table would typically have several thousand states for the same-
size language.

▪ Thus, it is much easier and more economical to construct SLR and LALR
tables than the canonical LR tables.
LALR Parser
Constructing LALR Parsing Tables

 The table produced by Algorithm 4.59 is called the LALR parsing


table for G.

 If there are no parsing action conflicts, then the given grammar is


said to be an LALR(1) grammar.

 The collection of sets of items constructed in step (3) is called the


LALR(1) collection.
LALR Parser
Constructing LALR Parsing Tables

 Example 4.60: Again consider grammar (4.55): (1) S → CC, (2) C → cC,
(3) C → d. In its LR(1) automaton there are three pairs of sets of items that
can be merged: I3/I6, I4/I7, and I8/I9. For instance, I3 and I6 are replaced
by their union I36:
    C → c·C, c/d/$
    C → ·cC, c/d/$
    C → ·d,  c/d/$

Note: find states having the same core, replace them by their union, and make
the transitions correspond.

[Fig.: LALR(1) automaton, states I0, I1, I2, I36, I47, I5, I89]

Notation: "s5" means shift and move to state 5 (the digit is a state number);
"r3" means reduce using production number 3.

Fig. 4.43: LALR parsing table for grammar (4.55)

    STATE   ACTION                  GOTO
            c      d      $         S   C
    0       s36    s47              1   2
    1                     acc
    2       s36    s47                  5
    36      s36    s47                  89
    47      r3     r3     r3
    5                     r1
    89      r2     r2     r2
LALR Parser
Constructing LALR Parsing Tables

 The LALR action and goto functions for the condensed sets of items
are shown in Fig. 4.43.
LALR Parser
 Exercise 4.7.1: Construct the LALR sets of items for S → SS+ | SS* | a.

    Productions:  (1) S → SS+   (2) S → SS*   (3) S → a
    FIRST(S) = { a }      FOLLOW(S) = { a, +, *, $ }

[Fig.: LR(1) automaton, states I0..I9, as in the CLR exercise above]

Merging the states with the same core (I2/I4 → I24, I3/I7 → I37, I5/I8 → I58,
I6/I9 → I69) gives the LALR(1) automaton.

Fig.: LALR parsing table

    STATE   ACTION                   GOTO
            a     +     *     $      S
    0       s24                      1
    1       s24               acc    37
    24      r3    r3    r3    r3
    37      s24   s58   s69          37
    58      r1    r1    r1    r1
    69      r2    r2    r2    r2

For comparison, the CLR table for the same grammar (shown earlier) has ten
states; the LALR table condenses them to six.
LALR Parser
 Specify whether the following grammar is in LALR or not

S -> A a | b A c | d c
A -> d
More Powerful LR Parsers
Exercise 4.7.2:

 Construct the
a) canonical LR, and

b) LALR

sets of items for the grammars


More Powerful LR Parsers
1) S -> A a 4) S -> b B a
LALR Parser 2) S -> b A c
3) S -> B c
5) A -> d
6) B -> d
Exercise 4.7.5: Show that the above grammar is LR(1) but not LALR(1).

[Fig.: LR(1) automaton, states I0..I12]

Fig.: CLR/LR parsing table

    STATE   ACTION                              GOTO
            a     b     c     d     $           S   A   B
    0             s3          s5                1   2   4
    1                               acc
    2       s6
    3                         s9                    7   8
    4                   s10
    5       r5          r6
    6                               r1
    7                   s11
    8       s12
    9       r6          r5
    10                              r3
    11                              r2
    12                              r4

In the LALR construction, I5 and I9 have the same core and are merged into
I59, whose ACTION entries on a and on c each become r5/r6:

    STATE   ACTION                              GOTO
            a       b     c       d     $       S   A   B
    0               s3            s59           1   2   4
    ...                                 acc
    59      r5/r6         r5/r6

These are r-r conflicts, so the grammar is not LALR(1).
LR Parsers

Table: Summary of the table-driven bottom-up parsers

    Parser                DFA used                      Reduce-move placement in ACTION
    LR(0)                 LR(0) automaton               entire row of the state containing
                          (states of LR(0) items)       the final item
    SLR (Simple LR)       LR(0) automaton               only under FOLLOW(LHS)
    LALR (LookAhead LR)   LR(1) automaton, merged       only under the lookaheads
                          (LR(1) item = LR(0) item
                          + lookahead)
    CLR (Canonical LR)    LR(1) automaton               only under the lookaheads

    Power:  LR(0) (least powerful) < SLR < LALR < CLR (most powerful)
    Number of states:  n(LR(0)) == n(SLR) == n(LALR) <= n(CLR)
LR Parsers
 Specify whether the following grammars are in SLR, LALR, LR or not
(Note: X = NO, Y = YES):

    1) S → A [A]                 Answer: SLR Y, LALR Y, LR Y
       A → (A) | ε

    2) S → Ac | bA | bc          Answer: SLR X, LALR Y, LR Y
       A → ε

    3) S → Aa | bAc | Bc | bBa   Answer: SLR X, LALR X, LR Y
       A → d
       B → d

    4) S → AS | a                Answer: SLR X, LALR X, LR X
       A → SA | b
LR Parsers
 Specify whether the following grammar is in LALR, LR or not

5) S -> A a | b A c | d c
A -> d

Answer: LALR Y, LL X

Note: CLR will not make any reduction if there is an error in


the string. LALR and SLR may make a few reductions before
declaring error.
LR Parsers

Exercise 01:
 Consider the following grammar:

S→AA

A→aA|b

Note that this grammar is in LR(0).

Make its CLR, LALR parsing tables and parse the string “abb”.
LR Parsers

Exercise 02:
 Consider the following grammar:

S→dA|aB

A→bA|c

B→bB|c

Note that this grammar is in LR(0).

Make its CLR, LALR parsing tables and parse the string “abc”.
LR Parsers

Exercise 03:
 Consider the following grammar:

E→E+T|T

T→T*F|F

F → id

Note that this grammar is in SLR.

Find out if the given grammar is in LR(0), LALR and CLR.


LR Parsers

Exercise 04:
 Consider the following grammar:

S→(X|E]|F)

X→E)|F]

E→A

F→A

A→λ (Null production)

Is the given grammar in LL, LR(0), SLR, LALR and LR?


LR Parsers

Exercise 05:

 Provide a production with the shortest RHS that introduces s/r


conflict in SLR parser for the given grammar:

S→Ab

A→cbAd

A→cAd

A→λ
LR Parsers

Exercise 06:

 Is the given grammar in LL, LR(0), SLR, LALR and LR?

S→Abbx|Bbby

A→x

B→x

Specify the value of i and j such that the grammar in LL(i) and LR(j).
LR Parsers
Why is LR(1) so powerful?

 Intuitively, for two reasons:

1) Lookahead makes handle-finding easier.


▪ The LR(0) automaton says whether there could be a handle later on
based on no right context.

▪ The LR(1) automaton can predict whether it needs to reduce based on


more information.

2) More states encode more information.


▪ LR(1) lookaheads are very good because there's a greater number of
states to be in.

▪ Goal: Incorporate lookahead without increasing the number of states.


LR Parsers

[Figure: the hierarchy of grammar classes, LR(0) ⊆ SLR(1) ⊆ LALR(1) ⊆ LR(1),
within the unambiguous grammars]

 Note that this diagram refers to grammars, not languages, e.g. there may be an
equivalent LR(1) grammar that accepts the same language as another non-LR(1)
grammar. No ambiguous grammar is LL(1) or LR(1), so we must either rewrite the
grammar to remove the ambiguity or resolve conflicts in the parser table or
implementation.

 The hierarchy of LR variants is clear: every LR(0) grammar is SLR(1) and every SLR(1) is
LALR(1) which in turn is LR(1). But there are grammars that don’t meet the
requirements for the weaker forms that can be parsed by the more powerful
variations.
LR Parsers
LL (1) v/s LALR (1)

 Error repair:
▪ Both LL(1) and LALR(1) parsers possess the valid/viable prefix property.
What is on the stack will always be a valid prefix of a sentential form.
Errors in both types of parsers can be detected at the earliest possible
point without pushing the next input symbol onto the stack.
▪ LL(1) parse stacks contain symbols that are predicted but not yet
matched. This information can be valuable in determining proper
repairs.
▪ LALR(1) parse stacks contain information about what has already been
seen, but do not have the same information about the right context that
is expected.
▪ This means deciding possible continuations is somewhat easier in an
LL(1) parser.
LR Parsers
LL (1) v/s LALR (1)

 Efficiency:
▪ Both require a stack of some sort to manage the input. That stack can
grow to a maximum depth of n, where n is the number of symbols in the
input.

▪ If you are using the runtime stack (i.e. function calls) rather than pushing
and popping on a data stack, you will probably pay some significant
overhead for that convenience (i.e. a recursive descent parser takes
that hit).

▪ If both parsers are using the same sort of stack, LL(1) and LALR(1) each
examine every non-terminal and terminal when building the parse tree,
and so parsing speeds tend to be comparable between the two.
LR Parsers - Exercises
 Is the given grammar in LL(1), SLR, LALR(1) , LR or not??

P → M *| ε
M → Q StarM| ε
StarM → (* M *)| ( Q * )
Q → o | ε

Some sentences generated by this grammar: {ε, *, (* *) *, ( * ) *,


( o * ) *, o ( * ) *, o (* *) *, o ( o * ) *, (* (* *) *) *, (* (
* ) *) *, (* o (* *) *) *, o (* ( * ) *) *, (* o ( * ) *) *, o (*
(* *) *) *, (* ( o * ) *) *, o (* ( o * ) *) *, (* o ( o * ) *) *,
o (* o ( * ) *) *, o (* o (* *) *) *, o (* o ( o * ) *) *}
LR Parsers - Exercises
 Is the given grammar in LL(1), LALR(1)??

S → [ X| E )| F [
X → E )| F ]
E → A
F → A
A → ε
LR Parsers - Exercises
 Is the given grammar in LL(1), SLR(1), LALR(1) , CLR(1) or
not??
L → V ( args )
| L equals Var ( )
V → Var + V
| id
Var → id

Some sentences generated by this grammar: {id ( args ), id + id (


args ), id + id + id ( args ), id( args ) equals id ( ), id + id +
id + id ( args ), id + id ( args ) equals id ( ), id + id + id +
id + id( args ), id + id + id ( args ) equals id ( ), id + id + id
+ id + id + id ( args )}
LR Parsers

The space of grammars

❑ LALR(1) is a subset of LR(1) and a superset of SLR(1).

❑ A grammar that is not LR(1) is definitely not LALR(1), since whatever


conflict occurred in the original LR(1) parser will still be present in the
LALR(1).

❑ A grammar that is LR(1) may or may not be LALR(1) depending on


whether merging introduces conflicts.
The space of grammars

❑ A grammar that is SLR(1) is definitely LALR(1). A grammar that is not


SLR(1) may or may not be LALR(1) depending on whether the more
precise lookaheads resolve the SLR(1) conflicts.

❑ LALR(1) has proven to be the most used variant of the LR family.


The space of grammars

❑ The weakness of the SLR(1) and LR(0) parsers mean they are only
capable of handling a small set of grammars.

❑ The expansive memory needs of LR(1) caused it to languish for


several years as a theoretically interesting but intractable approach.

❑ It was the advent of LALR(1) that offered a good balance between


the power of the specific lookaheads and table size.
The space of grammars

❑ The popular tools yacc and bison generate LALR(1) parsers and most
programming language constructs can be described with an LALR(1)
grammar.
Parser Generators

LAB Instructions
Parser Generators
The Parser Generator - Yacc

 GNU Bison, commonly known as Bison, is a parser generator that is


part of the GNU Project. Bison reads a specification in the BNF
notation (a context-free language), warns about any parsing
ambiguities, and generates a parser that reads sequences
of tokens and decides whether the sequence conforms to the
syntax specified by the grammar.

 The generated parsers are portable: they do not require any


specific compilers. Bison by default generates LALR(1) parsers but it
can also generate canonical LR, IELR(1) and GLR parsers.
Parser Generators
❑ Interaction between Lex and Yacc

$ lex lexer.l // generates lex.yy.c - contains definition of yylex()


$ yacc parser.y // generates (1) y.tab.c - contains definition of yyparse() (2) y.tab.h
- contains token definitions

 The parser drives the lexical analysis: it must know the function that
performs it. Hence, we must declare the yylex() function in the definitions
part of the yacc file.

 Similarly, since we expect the lex file to generate tokens, it must know
their definitions; hence, we must include the y.tab.h file in the definitions
part of the lex file.
Parser Generators
The Parser Generator - Yacc

❑ A Yacc source program has three parts:


▪ The Declarations Part
▪ The Translation Rules Part
▪ The Supporting C-Routines Part

%{
/* C includes */
%}
/* Other Declarations */

%%
/* Rules */

%%
/* user subroutines */
Parser Generators
The Parser Generator - Yacc
➢ Names(terminals) representing tokens must be declared; this is done by
writing
• %token name1 name2 . . .

➢ Every name not defined in the declarations section is assumed to represent


a nonterminal symbol.

➢ Every nonterminal symbol must appear on the left side of at least one rule.

➢ Any name not declared as a token in the declarations section is assumed


to be a nonterminal.

➢ By convention, tokens have uppercase names, although bison doesn’t


require it.

➢ If a symbol neither is a token nor appears on the left side of a rule, it’s like
an unreferenced variable in a C program. It doesn’t hurt anything, but it
probably means the programmer made a mistake.
Parser Generators
The Parser Generator - Yacc

 Start symbol : – may be declared, via:


▪ %start name

– if not declared explicitly, defaults to the nonterminal on the LHS of the first
grammar rule listed.
Parser Generators
The Parser Generator - Yacc

#define YYSTYPE double

 The type of yylval is determined by YYSTYPE; the default type is int, and it
can be overridden as above with #define YYSTYPE double.
Token values 0-255 are reserved for character values.

 Generated token values typically start around 258 because lex


reserves several values for end-of-file and error processing.
Parser Generators
The Parser Generator - Yacc

 The precedences and associativities are attached to tokens in the


declarations section.

 This is done by a series of lines beginning with the yacc keywords %left,
%right, or %nonassoc, followed by a list of tokens. All of the tokens on the
same line are assumed to have the same precedence level and
associativity; the lines are listed in order of increasing precedence or
binding strength. Thus:

• %left '+' '-'

• %left '*' '/'

 describes the precedence and associativity of the four arithmetic


operators. + and - are left associative and have lower precedence
than * and /, which are also left associative. The keyword %right is used to
describe right associative operators.
Parser Generators
The Parser Generator - Yacc

 Example 4.69 : To illustrate how to prepare a Yacc source program, let us


construct a simple desk calculator that reads an arithmetic expression,
evaluates it, and then prints its numeric value. We shall build the desk
calculator starting with the following grammar for arithmetic
expressions:

 The token digit is a single digit between 0 and 9. A Yacc desk calculator
program derived from this grammar is shown in Fig. 4.58.
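Figure 4.58 is not reproduced in these notes; the following is a sketch along its lines: a complete Yacc specification for the desk calculator, with a hand-written yylex() so that no separate Lex file is needed. Link with -ly (as in the Execution commands later) to pull in the library's default main() and yyerror().

%{
#include <ctype.h>
#include <stdio.h>
%}
%token DIGIT
%%
line  : expr '\n'         { printf("%d\n", $1); }
      ;
expr  : expr '+' term     { $$ = $1 + $3; }
      | term
      ;
term  : term '*' factor   { $$ = $1 * $3; }
      | factor
      ;
factor: '(' expr ')'      { $$ = $2; }
      | DIGIT
      ;
%%
int yylex(void)            /* returns DIGIT for 0-9, else the character itself */
{
    int c = getchar();
    if (isdigit(c)) {
        yylval = c - '0';  /* the token's attribute value: the digit */
        return DIGIT;
    }
    return c;
}

Because the grammar layers expr, term, and factor, precedence comes out right: typing 2*3+4 prints 10.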
Parser Generators
The Parser Generator - Yacc

main() repeatedly calls yyparse() until


the lexer’s input file runs out.
Parser Generators
The Parser Generator - Yacc

❑ The Declarations Part

 There are two sections in the declarations part of a Yacc program;


both are optional.

 In the first section, we put ordinary C declarations, delimited by %{ and


%}. Here we place declarations of any temporaries used by the
translation rules or procedures of the second and third sections.

 In Fig. 4.58, this section contains only the include-statement

#include <ctype.h>

that causes the C preprocessor to include the standard header file


<ctype.h> that contains the predicate isdigit.
Parser Generators
The Parser Generator - Yacc

 The Declarations Part

 Also in the declarations part are declarations of grammar tokens. In Fig.


4.58, the statement

%token DIGIT

declares DIGIT to be a token. Tokens declared in this section can then be


used in the second and third parts of the Yacc specification.

If Lex is used to create the lexical analyzer that passes token to the Yacc
parser, then these token declarations are also made available to the
analyzer generated by Lex.
Parser Generators
The Parser Generator - Yacc

❑ The Translation Rules Part

 In the part of the Yacc specification after the first %% pair, we put the
translation rules.

 Each rule consists of a grammar production and the associated semantic


action. A set of productions that we have been writing:
Parser Generators
The Parser Generator - Yacc

❑ The Translation Rules Part

 In a Yacc production, unquoted strings of letters and digits not declared to


be tokens are taken to be nonterminals.

 A quoted single character, e.g. 'c‘, is taken to be the terminal symbol c, as


well as the integer code for the token represented by that character (i.e.,
Lex would return the character code for ‘c’ to the parser, as an integer).

 Alternative bodies can be separated by a vertical bar, and a semicolon


follows each head with its alternatives and their semantic actions. The first
head is taken to be the start symbol.
Parser Generators
The Parser Generator - Yacc

❑ The Translation Rules Part

[Example translation rules with semantic actions — figures omitted]
Parser Generators
The Parser Generator - Yacc

❑ The Translation Rules Part

The rules section simply consists of a list of grammar rules. Since ASCII
keyboards don’t have a → key, we use a colon between the left- and
right-hand sides of a rule, and we put a semicolon at the end of each rule.
Parser Generators

Execution

❑ Linux/MacOS
$ lex lexer.l // generates lex.yy.c
$ yacc parser.y // generates y.tab.c, y.tab.h
$ gcc y.tab.c lex.yy.c -ll -ly // linking lex and yacc
$ ./a.out < input // run the executable

❑ Windows
$ bison -dy prog.y
$ flex hello.l
$ gcc y.tab.c lex.yy.c
$ a.exe < input
References

 Compilers: Principles, Techniques, and Tools, Alfred V. Aho, Monica S. Lam,
Ravi Sethi, Jeffrey D. Ullman, 2nd Edition.
LR Parsing: Simple LR
Use of the LR(0) Automaton

 The central idea behind "Simple LR," or SLR, parsing is the construction of
LR(0) automaton from the grammar.

 How can LR(0) automata help with shift-reduce decisions?

 Shift-reduce decisions can be made as follows.

 Suppose that the string γ of grammar symbols takes the LR(0) automaton
from the start state 0 to some state j.

▪ Then, shift on next input symbol a if state j has a transition on a.

▪ Otherwise, we choose to reduce; the items in state j will tell us which


production to use.
LR Parsing: Simple LR
Use of the LR(0) Automaton

Example: Figure 4.34 illustrates the actions of a shift-reduce parser on input
id * id, using the LR(0) automaton in Fig. 4.31. We use a stack to hold states.

Note: item sets with no outgoing arrows call for a reduce.
LR Parsing: Simple LR
The LR-Parsing

[Figures omitted: the LR-parsing model and its SLR table. In the table, "s4"
means shift and move to state 4, and "r1" means reduce using production
number 1.]
LR Parsing: Simple LR
Constructing SLR-Parsing Tables

 Example 4.48 : Every SLR(1) grammar is unambiguous, but there are many
unambiguous grammars that are not SLR(1). Consider the grammar with
productions

Think of L and R as standing for l-value and r-value, respectively, and * as


an operator indicating "contents of."
Syntax Error Handling
➢ A reason for emphasizing error recovery during parsing is that

▪ Many errors appear syntactic, whatever their cause, and are exposed

when parsing cannot continue.

▪ A few semantic errors, such as type mismatches, can also be detected

efficiently.

An accurate detection of semantic and logical errors at compile time is

in general a difficult task.