
UE20CS353: Compiler Design

Chapter 4: Syntax Analysis

1. The role of the Parser


2. Error-Recovery Strategies
3. Introduction to different parsers.
4. Top-Down parsing
5. Bottom-Up parsing

Mr. Prakash C O
Asst. Professor,
Dept. of CSE, PESU
coprakasha@pes.edu
Syntax Analysis

➢ By design, every programming language has precise rules that prescribe the syntactic structure of well-formed programs.

➢ In C, for example, a program is made up of functions, a function out of declarations and statements, a statement out of expressions, and so on.

➢ The syntax of programming language constructs can be specified by context-free grammars or BNF notation.

➢ Grammars offer significant benefits for both language designers and compiler writers.
Syntax Analysis
➢ Grammars offer significant benefits for both language designers
and compiler writers.

1. A grammar gives a precise, yet easy-to-understand, syntactic specification of a programming language.

2. From certain classes of grammars, we can construct automatically an efficient parser that determines the syntactic structure of a source program.

As a side benefit, the parser-construction process can reveal syntactic ambiguities and trouble spots that might have slipped through the initial design phase of a language.
Syntax Analysis

➢ Grammars offer significant benefits for both language designers


and compiler writers. Cont…

3. The structure imparted to a language by a properly designed grammar is useful for translating source programs into correct object code and for detecting errors.

4. A grammar allows a language to be evolved or developed iteratively, by adding new constructs to perform new tasks. These new constructs can be integrated more easily into an implementation that follows the grammatical structure of the language.
The Role of the Parser

➢ The parser obtains a string of tokens from the lexical analyzer, as shown in
Fig. 4.1, and verifies that the string of token names can be generated by the
grammar for the source language.
The Role of the Parser

➢ The Parser should

1. Report any syntax errors in an intelligible fashion.

2. Recover from commonly occurring errors to continue


processing the remainder of the program.

➢ The parser constructs a parse tree for well-formed programs and


passes it to the rest of the compiler for further processing.
The Role of the Parser

➢ There are three general types of parsers for grammars:

1. Universal,

2. Top-down, and

3. Bottom-up.

➢ Universal parsing methods such as the Cocke-Younger-Kasami (CYK) algorithm and Earley's algorithm can parse any grammar.
These general methods are, however, too inefficient to use in production compilers.
The Role of the Parser
➢ The parsing methods commonly used in compilers can be
classified as being either top-down or bottom-up.
➢ Top-down methods build parse trees from the top (root) to the
bottom (leaves),

➢ Bottom-up methods build parse trees from the leaves and work their
way up to the root.

➢ In either case, the input to the parser is scanned from left to right, one
symbol at a time.
[Fig: Parse tree for the example grammar]
The Role of the Parser
LL grammar and LR grammar

➢ The LL and LR grammars describe most of the syntactic constructs in


modern programming languages.

LL grammar:
E → T E'
E' → + T E' | ϵ
T → F T'
T' → * F T' | ϵ
F → (E) | id

LR grammar:
E → E + T | T
T → T * F | F
F → (E) | id
The Role of the Parser
LL grammar and LL parser

➢ An LL grammar is a context-free grammar that can be parsed by an LL parser.

➢ The LL parser reads input text Left to right within each line, and top to bottom
across the lines of the full input file.
The second L in LL means that the parser produces a Leftmost derivation: it
does a top-down parse.

➢ Top-down parsers build parse trees from the top (root) to the bottom (leaves).

➢ An LL parser is also known as a top-down parser.

➢ LL parsers are often called “predictive parsers”.


The Role of the Parser
LR grammar and LR parser

➢ An LR grammar is a context-free grammar that can be parsed by an LR parser.

➢ The LR parser reads input text Left to right within each line, and top to bottom
across the lines of the full input file.
The R means that the parser produces a Rightmost derivation in reverse: it does
a bottom-up parse.

➢ Bottom-up parsers build parse trees from the leaves and work their way up to the root.

➢ An LR parser is also known as a bottom-up parser.

➢ LR parsers are often called “shift-reduce” parsers.


The Role of the Parser
LL parser vs. LR parser:

• LL parsers begin at the start symbol and try to apply productions to arrive at the target string; LR parsers begin at the target string and try to arrive back at the start symbol.

• LL parsing is also known as top-down parsing; LR parsing is also known as bottom-up parsing.

• LL starts with only the root nonterminal on the stack; LR ends with only the root nonterminal on the stack.

• LL uses grammar rules in an order which corresponds to a pre-order traversal of the parse tree; LR does a post-order traversal.

• LL continuously pops a nonterminal off the stack and pushes the corresponding right-hand side; LR tries to recognize a right-hand side on the stack, pops it, and pushes the corresponding nonterminal.

• LL reads terminals when it pops them off the stack; LR reads terminals while it pushes them on the stack.

• LL parsers are often called “predictive parsers”; LR parsers are often called “shift-reduce parsers”.
The Role of the Parser

➢ Parsers implemented by hand often use LL grammars;


for example, the predictive-parsing approach works for LL
grammars.

➢ Parsers for the larger class of LR grammars are usually


constructed using automated tools.
Representative Grammars
➢ Constructs that begin with keywords like if, for, while, or int are relatively easy to parse, because the keyword guides the choice of the grammar production that must be applied to match the input.

➢ In this chapter, we concentrate on expressions, which present more of a challenge, because of the associativity and precedence of operators.
Representative Grammars
➢ Associativity and precedence are captured in the following grammar.

➢ E represents expressions consisting of terms separated by + signs,


➢ T represents terms consisting of factors separated by * signs, and

➢ F represents factors that can be either parenthesized expressions


or identifiers:

E -> E + T | T

T -> T * F | F (4.1)

F -> (E) | id
The above expression grammar belongs to the class of LR grammars
that are suitable for bottom-up parsing.
This grammar can be adapted to handle additional operators and
additional levels of precedence.
Representative Grammars
➢ The following non-left-recursive variant of the expression grammar (4.1) will
be used for top-down parsing:

Grammar (4.2), the non-left-recursive variant:

E → T E'
E' → + T E' | ϵ
T → F T'
T' → * F T' | ϵ
F → (E) | id

Grammar (4.1), repeated for comparison:

E → E + T | T
T → T * F | F
F → (E) | id

➢ The following grammar(4.3) treats + and * alike, so it is useful for illustrating


techniques for handling ambiguities during parsing:

E → E + E | E * E | ( E ) | id (4.3)

Here, E represents expressions of all types. Grammar (4.3) permits more than
one parse tree for expressions like a + b*c.
Syntax Analysis

Syntax Error Handling


Syntax Error Handling
➢ A good compiler should assist a programmer in identifying and
locating errors.

Few languages have been designed with error handling in mind, even
though errors are so commonplace.

➢ Most programming language specifications do not describe how a


compiler should respond to errors; error handling is left to the compiler
designer.

➢ Planning the error handling right from the start can both simplify the
structure of a compiler and improve its handling of errors.
Syntax Error Handling
Common programming errors can occur at many different levels.

➢ Lexical errors: Lexical error is a sequence of characters that does not


match the pattern of any token. Lexical errors include

• Missing quotes around text intended as a string. (Unmatched string)

• Appearance of illegal characters

➢ Syntactic errors include

• misplaced semicolons,

• extra or missing braces, that is, "{" or "}",

• the appearance of a case statement without an enclosing switch in C or Java.
Syntax Error Handling
Common programming errors can occur at many different levels. Cont…

➢ Semantic errors include

➢ type mismatches between operators and operands.

➢ Example: the return of a value in a Java method with result type void.

➢ Logical errors include

➢ the use of the assignment operator = instead of the comparison operator ==

in a C program. The program containing = may be well formed; however, it

may not reflect the programmer's intent.


Syntax Error Handling
➢ The precision of parsing methods allows syntactic errors to be

detected very efficiently.

➢ The LL and LR parsing methods detect an error as soon as possible; that is, when the stream of tokens from the lexical analyzer cannot be parsed further according to the grammar for the language.

➢ The viable-prefix property of parsers allows early detection of

syntax errors.
Syntax Error Handling
Viable-prefix Property

➢ The viable-prefix property of parsers allows early detection of


syntax errors
➢ Goal: detection of an error as soon as possible without further consuming

unnecessary input

➢ How: detect an error as soon as the prefix of the input does not match a

prefix of any string in the language


Syntax Error Handling
➢ The goals of error handler in a parser are simple to state but

challenging to realize:

1. Report the presence of errors clearly and accurately.

2. Recover from each error quickly enough to detect subsequent

errors.

3. Add minimal overhead to the processing of correct programs.


Syntax Error Handling
➢ How should an error handler report the presence of an error?

➢ At the very least, it must report the place in the source program where an

error is detected, because there is a good chance that the actual error

occurred within the previous few tokens.

➢ A common strategy is to print the offending line with a pointer to the position

at which an error is detected.


Syntax Analysis

Error-Recovery Strategies
Error-Recovery Strategies
➢ Once an error is detected, how should the parser recover/react?

1. Stop immediately and signal an error.

2. Record the error but try to continue.

In the first case, the user must recompile from scratch after possibly a

trivial fix.

In the second case, the user might be overwhelmed by a whole series of

error messages, all caused by essentially the same problem.

➢ We will talk about how to do error recovery in a principled way.


Error-Recovery Strategies

➢ Error recovery:

➢ The process of adjusting input stream so that the parser can


continue after unexpected input

➢ Possible adjustments:

➢ delete tokens

➢ insert tokens

➢ substitute tokens

➢ Error recovery is possible in both top-down and bottom-up parsers


Error-Recovery Strategies

Error-Recovery Strategies

1. Panic-Mode Recovery

2. Phrase-Level Recovery

3. Error Productions

4. Global Correction
Error-Recovery Strategies
1. Panic-Mode Recovery

➢ In this method, on discovering an error, the parser discards input


symbols one at a time until one of a designated set of synchronizing
tokens is found.

➢ The synchronizing tokens are usually delimiters, such as ; or }, whose


role in the source program is clear and unambiguous.

➢ Disadvantage is that a considerable amount of input is skipped


without checking it for additional errors.

➢ Advantage is that it is easy to implement and is guaranteed not to enter an infinite loop.
Error-Recovery Strategies
1. Panic-Mode Recovery
➢ In case of an error like:
1) a = b + c // no semi-colon
d = *e + f ;
2) int =10;

The compiler will discard all subsequent tokens till a semi-colon is


encountered.

This is a crude method but often turns out to be the best method.

In situations where multiple errors in the same statements are rare, this
method may be quite adequate.
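The skipping step itself is only a few lines of code. The following Python sketch (token representation and names are illustrative, not from any particular compiler) discards tokens until a synchronizing token is reached:

# A minimal sketch of panic-mode recovery: discard input tokens until one of
# the designated synchronizing tokens (here ; and }) is found.
SYNC_TOKENS = {";", "}"}

def panic_mode_recover(tokens, pos):
    # Skip the offending tokens one at a time.
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1
    # Resume just past the synchronizing token, i.e., at the next statement.
    return pos + 1 if pos < len(tokens) else pos

# Example: error detected at 'd' in  a = b + c d = e ;
tokens = ["a", "=", "b", "+", "c", "d", "=", "e", ";"]
print(panic_mode_recover(tokens, 5))   # 9: parsing resumes after the ';'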
Error-Recovery Strategies
2. Phrase-Level Recovery (Statement Mode recovery)

➢ On discovering an error, a parser may perform local correction on the


remaining input; that is, it may replace a prefix of the remaining input by
some string that allows the parser to continue.

➢ A typical local correction is to


• replace a comma by a semicolon, E.g.: scanf(“%d”, &x), replace , with ;
• delete an extraneous semicolon or brace, E.g.: {… {…} …}}
• insert a missing semicolon, E.g.: scanf(“%d”, &x)

➢ The choice of the local correction is left to the compiler designer.


We must be careful to choose replacements that do not lead to
infinite loops.
Error-Recovery Strategies
3. Error Productions

➢ If user/designer has knowledge of common errors that can be


encountered, then these errors can be incorporated by augmenting the
grammar with error productions that generate erroneous constructs.

➢ This augmented grammar (CFG+ error productions ) detects the


anticipated errors when an error production is used during parsing and
parsing can be continued.

➢ The parser can then generate appropriate error diagnostics about the
erroneous construct that has been recognized in the input.
Error-Recovery Strategies
3. Error Productions Cont…

➢ If we have an idea of common errors that might occur, we can include the
errors in the grammar at hand.

For example if we have a production rule like:

E → +E|-E Then, a=+b; a=-b; a=*b; a=/b;

Here, the last two are error situations. Now, we change the grammar as:

E → +E | -E | *A | /A A → E

Hence, once it encounters *A, it sends an error message asking the user if
he is sure he wants to use a unary “*”.

If this is used then, during parsing appropriate error messages can be generated and
parsing can be continued.
Error-Recovery Strategies
4. Global Correction

➢ In this approach, the compiler should make as few changes as possible in


processing an incorrect input string.
There are algorithms for choosing a minimal sequence of changes to
obtain a globally least-cost correction.

➢ Given an incorrect input string x and grammar G, these algorithms will find
a parse tree for a closest error-free string y, such that the number of
insertions, deletions, and changes of tokens required to transform x into y is
as small as possible.

➢ These methods are in general too costly to implement in terms of time and
space, so these techniques are currently only of theoretical interest.
Syntax Analysis

Context-Free Grammars
Context-Free Grammars
➢ Grammars are used to specify the syntax of a language.

➢ A grammar naturally describes the hierarchical structure of most


programming language constructs.

➢ For example, an if-else statement in Java can have the form


if ( expression ) statement else statement

Using the variable expr to denote an expression and the variable


stmt to denote a statement, this structuring rule can be
expressed as

stmt → if ( expr ) stmt else stmt


Context-Free Grammars
Components of Context-Free Grammar
➢ Set of terminal symbols

➢ Set of nonterminals

➢ Set of productions

➢ The head is nonterminal

➢ The body is a sequence of terminals and/or nonterminals

➢ Designation of one nonterminal as starting symbol
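As a concrete illustration, the four components can be written down directly in Python (a minimal sketch; the representation is illustrative, using expression grammar (4.1)):

# The four components of a CFG, for grammar (4.1).
grammar = {
    "terminals":    {"+", "*", "(", ")", "id"},
    "nonterminals": {"E", "T", "F"},
    "start":        "E",                      # the designated start symbol
    # productions: head -> list of bodies; each body is a list of symbols
    "productions": {
        "E": [["E", "+", "T"], ["T"]],
        "T": [["T", "*", "F"], ["F"]],
        "F": [["(", "E", ")"], ["id"]],
    },
}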


Context-Free Grammars
➢ Production rules.
Context-Free Grammars

➢ Example 1:
Context-Free Grammars

➢ Example 2:

What does this grammar generate?


Context-Free Grammars
Some Definitions
➢ String of terminals: sequence of zero or more terminals

➢ Derivation:
➢ given the grammar (i.e. productions)
➢ begin with the start symbol
➢ repeatedly replacing nonterminal by the body
➢ We obtain the language defined by the grammar (i.e., the set of terminal strings derivable from the start symbol)
➢ Example:

How to derive: 9-5+7 from the above rules?


Context-Free Grammars
Some Definitions

➢ Parsing:
 Given a string of terminals

 Figure out how to derive it from the start symbol of the grammar

 If it cannot be derived from the start symbol of the grammar, then


reporting syntax errors within the string.

➢ Parsing is the process of determining how a string of terminals can


be generated by a grammar.
Context-Free Grammars
Parse Tree

 A parse tree pictorially shows how the start symbol of a grammar


derives a string in the language.

 A parse tree according to the grammar is a tree with the following


properties:

1. The root is labeled by the start symbol.

2. Each leaf is labeled by a terminal or by ϵ.

3. Each interior node is labeled by a nonterminal.


Context-Free Grammars

Parse Tree

➢ Example: Figure 2.5: Parse tree for 9-5+2


Context-Free Grammars
➢ Ambiguity
➢ A grammar that produces more than one parse tree for some
sentence is said to be ambiguous.

➢ E → E + E | E * E | ( E ) | id (4.3)

➢ The arithmetic expression grammar (4.3) permits two distinct


leftmost derivations for the sentence id + id * id:
Context-Free Grammars
Associativity of Operators

➢ How will you evaluate this?

9-5-2
▪ Will ‘5’ go with the ‘-’ on the left or the one on the right?

▪ If it goes with the one on the left: (9-5)-2 we say that the operator ‘-’ is
left-associative

▪ If it goes with the one on the right: 9-(5-2) we say that the operator ‘-’ is
right-associative
Context-Free Grammars
Associativity of Operators

➢ How to express associativity in production rules?
A left-associative operator is generated by a left-recursive production, and a right-associative operator by a right-recursive one: for example, list → list - digit | digit makes '-' left-associative, while right → letter = right | letter makes '=' right-associative.


Context-Free Grammars
Precedence of Operators
 Associativity applies to occurrences of the same operator

 What if operators are different?

 How will you evaluate: 9-5*2

 We say ‘*’ has higher precedence than ‘-’ if it takes its operands before
‘-’

 How to present this in productions?
By using a distinct nonterminal for each precedence level: operators of higher precedence (such as *) are generated by a nonterminal lower in the grammar, and operators of lower precedence (such as -) by one higher up.


Context-Free Grammars
Leftmost and Rightmost Derivations

➢ In leftmost derivations, the leftmost nonterminal in each sentential form is always chosen. If α ⇒ β is a step in which the leftmost nonterminal in α is replaced, we write α ⇒lm β.

➢ In rightmost derivations, the rightmost nonterminal is always chosen; we write α ⇒rm β in this case.
Context-Free Grammars
Context-Free Grammar Vs Regular Expressions

➢ Grammars are more powerful notations than regular expressions

 Every construct that can be described by a regular expression can be


described by a grammar, but not vice-versa

➢ NFA to CFG
Context-Free Grammars
(a|b)*abb
Context-Free Grammars
Question Worth Asking

➢ If grammars are more powerful than regular expressions, why not use them in lexical analysis too?

• Lexical rules are quite simple and do not need notation as


powerful as grammars

• Regular expressions are more concise and easier to understand


for tokens

• More efficient lexical analyzers can be generated from regular


expressions than from grammars
Context-Free Grammars
➢ How Can We Enhance Our Grammar?
1. Eliminating ambiguity
▪ Re-write grammar to eliminate ambiguity.

2. Eliminating left-recursion
▪ Top-down parsers cannot handle left-recursive grammars: during parsing, it is possible for a top-down parser to loop forever, so a transformation is needed to eliminate left recursion.

▪ A grammar is said to be "left-recursive" if the leftmost symbol of a production body is the same as the nonterminal at the head of the production (e.g., E → E + T).

3. Left factoring
▪ Elimination of common prefixes.
Context-Free Grammars
Eliminating Ambiguity

 Sometimes we can re-write grammar to eliminate ambiguity


Ambiguous Grammar: Input string:

Note: When there are multiple ifs and a single else, it is not clear which if the else should go with; this problem is called the dangling-else problem.

Two different parse trees for the input string:


Context-Free Grammars
Eliminating Ambiguity
Input string: (4.15)

Ambiguous Grammar:

(4.14)

Unambiguous Grammar:

This grammar associates


each else with the closest
previous unmatched then.

Fig. 4.10
Context-Free Grammars
Eliminating Ambiguity

(4.15)

Fig. 4.10
(4.14)

The idea is that a statement appearing between a then and an else must be "matched" ;
that is, the interior statement must not end with an unmatched or open then.
A matched statement is either an if-then-else statement containing no open statements
or it is any other kind of unconditional statement.
Thus, we may use the grammar in Fig. 4.10. This grammar generates the same strings as
the dangling-else grammar (4.14), but it allows only one parsing for string (4.15); namely,
the one that associates each else with the closest previous unmatched then.
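For reference, the unambiguous grammar of Fig. 4.10 has the following standard form (reconstructed here; other stands for any unconditional statement):

stmt → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
open_stmt → if expr then stmt | if expr then matched_stmt else open_stmt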
Context-Free Grammars
Eliminating Ambiguity
A grammar containing the productions.
A → AA | α
is ambiguous because the sentential form AAA has more than one parse tree.
Context-Free Grammars
Eliminating Ambiguity
A grammar containing the productions.
A → AA | α
is ambiguous because the sentential form AAA has more than one parse tree.

This ambiguity disappears if we use the productions

▪ A → AB | B
B → α

or

▪ A → BA | B
B → α
Context-Free Grammars
Eliminating Left-Recursion

➢ A grammar is left recursive if it has a nonterminal A such that there is a derivation A ⇒+ Aα for some string α.

➢ Top-down parsing methods cannot handle left-recursive grammars, so a transformation is needed to eliminate left recursion.

➢ Example:
Context-Free Grammars
Eliminating Left-Recursion
Context-Free Grammars
Eliminating Left-Recursion
Context-Free Grammars
Eliminating Left-Recursion

➢ Example 1:

➢ Example 2: (Indirect left recursion elimination)

Original grammar:
A → Bxy | x
B → CD
C → A | c
D → d

Step 1 (substitute A into C): C → Bxy | x | c
Step 2 (substitute B into C): C → CDxy | x | c
Context-Free Grammars
Eliminating Left-Recursion

➢ Example 1: E → T E'
E' → + T E'| ϵ
T → F T’
T' → * F T’| ϵ
F → (E)| id

➢ Example 2: (Indirect left recursion elimination)

Original grammar:
A → Bxy | x
B → CD
C → A | c
D → d

Step 1 (substitute A into C): C → Bxy | x | c
Step 2 (substitute B into C): C → CDxy | x | c
Step 3 (eliminate the immediate left recursion in C):
C → xC' | cC'
C' → DxyC' | ϵ
(A → Bxy | x, B → CD, and D → d are unchanged.)
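The rewriting used in these examples is mechanical. The Python sketch below (representation illustrative: bodies are tuples of symbols, () stands for ϵ) eliminates the immediate left recursion in a single nonterminal, i.e., the transformation of A → Aα | β into A → βA', A' → αA' | ϵ:

def eliminate_immediate_left_recursion(head, bodies):
    # Split A -> A a1 | ... | A am | b1 | ... | bn into the a_i and b_j parts.
    recursive = [b[1:] for b in bodies if b and b[0] == head]
    others    = [b for b in bodies if not b or b[0] != head]
    if not recursive:
        return {head: bodies}              # no immediate left recursion
    new = head + "'"
    return {
        head: [b + (new,) for b in others],            # A  -> b1 A' | ... | bn A'
        new:  [a + (new,) for a in recursive] + [()],  # A' -> a1 A' | ... | ϵ
    }

# Example: E -> E + T | T  becomes  E -> T E' with E' -> + T E' | ϵ
print(eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))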
Context-Free Grammars
Left Factoring (Elimination of common prefixes)

 If a grammar contains two productions of form S→ aα and S → aβ


it is not suitable for top-down parsing without backtracking.

 Troubles of this form can sometimes be removed from the grammar


by a technique called the left factoring.
Context-Free Grammars
Left Factoring (Elimination of common prefixes)

 Left factoring is a grammar transformation that is useful for producing a grammar


suitable for predictive, or top-down, parsing.

 When the choice between two alternative A-productions is not clear, we may be
able to rewrite the productions to defer the decision until enough of the input has
been seen that we can make the right choice.

In general, suppose A → αβ1 | αβ2 are two A-productions and the input begins with a nonempty string derived from α. We do not know whether to expand A to αβ1 or to αβ2. However, we may defer the decision by expanding A to αA'. Then, after seeing the input derived from α, we expand A' to β1 or to β2.
Context-Free Grammars
Left Factoring (Elimination of common prefixes)

 Left factoring is a process by which the grammar with common prefixes is


transformed to make it useful for Top down parsers.

 How?

 In left factoring,
▪ We make one production for each common prefix.

▪ The common prefix may be a terminal or a non-terminal or a combination of


both.

▪ Rest of the derivation is added by new productions.

 The grammar obtained after the process of left factoring is called as Left
Factored Grammar.
Context-Free Grammars
Left Factoring (Elimination of common prefixes)

 Important Note:
▪ During left most derivation, when the choice between two alternative
A-productions is not clear, the right choice of A-production selection needs k
symbols lookahead on the input.

▪ Left Factoring (Elimination of common prefixes) avoids k-symbols lookahead on


the input.

▪ Left Factoring converts k-symbols lookahead on the input to 1-symbol lookahead


(that is LL(k) to LL(1)).

▪ Left factored grammar avoids backtracking in Recursive Descent Parsing.
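One step of left factoring can likewise be written as a short routine. A sketch, assuming the same tuple representation as the left-recursion sketch above, with ("ϵ",) marking an ϵ-alternative:

def common_prefix(seqs):
    # Longest common prefix of several symbol sequences.
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)

def left_factor_once(head, bodies):
    groups = {}
    for b in bodies:                       # group alternatives by first symbol
        groups.setdefault(b[0] if b else None, []).append(b)
    rules, fresh = {head: []}, 0
    for _, alts in sorted(groups.items(), key=str):
        if len(alts) == 1:
            rules[head].extend(alts)       # no common prefix to factor
            continue
        alpha = common_prefix(alts)        # the shared prefix α
        fresh += 1
        new = head + "'" * fresh           # a fresh nonterminal A', A'', ...
        rules[head].append(alpha + (new,))
        rules[new] = [b[len(alpha):] or ("ϵ",) for b in alts]
    return rules

# One step on grammar 1 below: A -> aAB | aBc | aAc  gives  A -> aA',
# A' -> AB | Bc | Ac; repeating the step yields the left factored grammar.
print(left_factor_once("A", [("a","A","B"), ("a","B","c"), ("a","A","c")]))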


Context-Free Grammars
Left Factoring

 Example:
Context-Free Grammars
Left Factoring

 Example:
Context-Free Grammars
 Do left factoring in the following grammars-

1. A → aAB / aBc / aAc

2. S → bSSaaS | bSSaSb | bSb | a

3. S → aSSbS | aSaSb | abb | b

4. S → a | ab | abc | abcd

5. S → aAd | aB A → a | ab B → ccd | ddc


Context-Free Grammars
 Do left factoring in the following grammars-

1. A → aAB | aBc | aAc


Context-Free Grammars
 Do left factoring in the following grammars-

1. A → aAB | aBc | aAc

Step-01:

A → aA'
A' → AB | Bc | Ac

Again, this is a grammar with common prefixes.

Step-02:

A → aA'
A' → AD | Bc
D → B | c

This is a left factored grammar.


Context-Free Grammars
 Do left factoring in the following grammars-

2. S → bSSaaS | bSSaSb | bSb | a


Context-Free Grammars
 Do left factoring in the following grammars-

2. S → bSSaaS | bSSaSb | bSb | a

Step-01:

S → bSS' / a
S' → SaaS / SaSb / b

Again, this is a grammar with common prefixes.

Step-02:

S → bSS' / a
S' → SaA / b
A → aS / Sb

This is a left factored grammar.


Context-Free Grammars
 Do left factoring in the following grammars-

3. S → aSSbS / aSaSb / abb / b


Context-Free Grammars
 Do left factoring in the following grammars-

3. S → aSSbS / aSaSb / abb / b

Step-01:

S → aS' / b
S' → SSbS / SaSb / bb

Again, this is a grammar with common prefixes.

Step-02:

S → aS' / b
S' → SA / bb
A → SbS / aSb

This is a left factored grammar.


Context-Free Grammars
 Do left factoring in the following grammars-

4. S → a / ab / abc / abcd
Context-Free Grammars
 Do left factoring in the following grammars-

4. S → a / ab / abc / abcd
Step-01:
S → aS'
S' → b / bc / bcd / ϵ
Again, this is a grammar with common prefixes.

Step-02:
S → aS'
S' → bA / ϵ
A → c / cd / ϵ
Again, this is a grammar with common prefixes.

Step-03:
S → aS'
S' → bA / ϵ
A → cB / ϵ
B → d / ϵ
This is a left factored grammar.
Context-Free Grammars
 Do left factoring in the following grammars-

5. S → aAd / aB A → a / ab B → ccd / ddc


Context-Free Grammars
 Do left factoring in the following grammars-

5. S → aAd / aB A → a / ab B → ccd / ddc

The left factored grammar is

S → aS'
S' → Ad / B
A → aA'
A' → b / ϵ
B → ccd / ddc
Parsing

Top-Down Parsing
Top-Down Parsing
 Top-down parsing can be viewed as
▪ The problem of constructing a parse tree for the input string, starting from the
root and creating the nodes of the parse tree in preorder (depth-first).

 Top-down parsing can also be viewed as


▪ Finding a leftmost derivation for an input string.

 At each step of a top-down parse:

▪ The key problem is that of determining the production to be applied for a


nonterminal, say A.

▪ Once an A-production is chosen, the rest of the parsing process consists of


"matching” the terminal symbols in the production body with the input
string.
Top-Down Parsing
 Show the sequence of parse trees for the input id+id*id

Grammar:
E → T E'
E' → + T E' | ϵ
T → F T'
T' → * F T' | ϵ
F → (E) | id
Top-Down Parsing
 Show the sequence of parse trees for the input id+id*id

E⇒ TE'
⇒ FT'E'
⇒ idT'E'
⇒ idE'
⇒ id+TE'
⇒ id+FT'E'
⇒ id+idT'E'
⇒ id+id*FT'E'
⇒ id+id*idT'E'
⇒ id+id*idE'
⇒ id+id*id
Top-Down Parsing
[Diagram: classification of top-down parsers]
• Grammars should be free from left recursion and ambiguities.
• Top-down parsers with backtracking work with grammars that are not left factored.
• Nonbacktracking (predictive) parsers, including table-driven parsers, require an enhanced grammar free from ambiguity, left recursion, and common prefixes; grammars are left factored to avoid backtracking.
Top-Down Parsing
Recursive Descent Parsing

 Recursive descent is a top-down parsing technique that constructs the


parse tree from the top and the input is read from left to right.

 A recursive descent parser consists of several small functions/procedures,


one for each nonterminal in the grammar.

 This parsing technique recursively parses the input to make a parse tree,
which may or may not require back-tracking. But the grammar associated
with it (if not left factored) cannot avoid back-tracking.

 RDP can be used to parse different types of code such as XML or other
inputs.
Top-Down Parsing
Recursive-Descent Parsing

 Recursive-descent parsing(RDP) is one of the simplest parsing


techniques that is used in practice.
▪ The basic idea is to associate each non-terminal with a procedure.
(A RDP consists of several small functions, one for each nonterminal in the grammar.)

▪ The goal of each such procedure is to

1. read a sequence of input characters that can be generated by the


corresponding non-terminal, and

2. return a pointer to the root of the parse tree for the non-terminal.

▪ The structure of the procedure is dictated by the productions for the


corresponding non-terminal.
Top-Down Parsing
Recursive-Descent Parsing

 The RDP procedure attempts to "match" the right hand side of


some production for a non-terminal.
 To match a terminal symbol, the procedure compares the terminal
symbol to the input; if they agree, then the procedure is successful,
and it consumes the terminal symbol in the input (that is, moves the
input cursor over one symbol).

 To match a non-terminal symbol, the procedure simply calls the


corresponding procedure for that non-terminal symbol (which may be
a recursive call, hence the name of the technique).
Top-Down Parsing
Recursive-Descent Parsing (with backtracking)

 RDP execution begins with the procedure for the start symbol,
which halts and announces success if its procedure body scans
the entire input string.

 RDP with backtracking determines which production to use by


trying each production in turn.

 RDP with backtracking cannot work with left recursive grammars


because it would cause the program to enter an infinite loop.
Consider the grammar rule A -> Aa | b
Procedure A() would roughly have the following pseudocode: void A() { A(); … }
Top-Down Parsing
Recursive-Descent Parsing

 Each non-terminal corresponds to a procedure.

 Example: A → aBb (This is only the production rule for A)

proc A {
Match the current token with a, and move to the next token;
Call ‘B’;
Match the current token with b, and move to the next token;
}
Top-Down Parsing
Recursive-Descent Parsing

 A → aBb| bAB and B → c input: acb

procA { Non backtracking RDP procedure for left factored grammar with one symbol lookahead.

case (current token/current input symbol) {

‘a’: Match the current token with a, and move to the next token;

Call ‘B’;

Match the current token with b, and move to the next token;

‘b’: Match the current token with b, and move to the next token;

Call ‘A’;

Call ‘B’;

}
}
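The same procedure is easy to render in real code. A Python sketch of this non-backtracking RDP for A → aBb | bAB, B → c, with one symbol of lookahead (class and method names are illustrative):

class ParseError(Exception):
    pass

class Parser:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def current(self):                     # one-symbol lookahead
        return self.text[self.pos] if self.pos < len(self.text) else "$"

    def match(self, t):                    # match a terminal and advance
        if self.current() != t:
            raise ParseError(f"expected {t!r}, got {self.current()!r}")
        self.pos += 1

    def A(self):
        if self.current() == "a":          # A -> a B b
            self.match("a"); self.B(); self.match("b")
        elif self.current() == "b":        # A -> b A B
            self.match("b"); self.A(); self.B()
        else:
            raise ParseError(f"unexpected {self.current()!r} in A")

    def B(self):                           # B -> c
        self.match("c")

p = Parser("acb")
p.A()                                      # derivation: A => aBb => acb
print("accepted" if p.pos == len(p.text) else "trailing input")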
Top-Down Parsing
Recursive-Descent Parsing

 When to apply ε-productions.


A → aA| bB| ε

 If all other productions fail, we should apply an ε-production.


For example, if the current token is not a or b, we may apply the
ε-production.

➢ The correct choice: apply an ε-production for a nonterminal A when the current token is in the FOLLOW set of A (the set of terminals that can follow A in sentential forms).
Top-Down Parsing
A → aBe| cBd| C
B → bB| ε
Non backtracking RDP procedure for left factored grammar with one symbol lookahead.
C→f
proc A {
case (current token/current input symbol) {
a: Match the current token with a, and move to the next token;
Call B;
Match the current token with e, and move to the next token;
c: Match the current token with c, and move to the next token;
Call B;
Match the current token with d, and move to the next token;
f: Call C:
}
}

proc C {
Match the current token with f, and move to the next token;
}

proc B
{
case (current token/current input symbol){
b: Match the current token with b, and move to the next token;
Call B;
e,d: do nothing (apply B → ε, since e and d are in FOLLOW(B))
}
}
Top-Down Parsing
Example of Backtracking

 Based on the information the parser currently has about the input,

▪ A decision is made to go with one particular production.

▪ If this choice leads to a dead end, the parser would have to backtrack
to that decision point, moving backwards through the input, and
start again making a different choice and so on until it either found the
production that was the appropriate one or ran out of choices.

 For example, consider this simple grammar:


S → bab | bA (Grammar is not left factored, backtracking is required)
A → d | cA
Note: Writing RDP procedures for RDP with Backtracking(for non left factored
grammar) is very complex.
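Still, a compact functional sketch is possible in Python: each function returns the new input position on success or None on failure, and trying the alternatives in turn is the backtracking. (A fully general backtracking parser would also re-try earlier choices when a later match fails; this simple version suffices for the traces that follow.)

def match(inp, pos, s):
    # Match the literal terminals s at position pos, or fail with None.
    return pos + len(s) if inp.startswith(s, pos) else None

def S(inp, pos):
    p = match(inp, pos, "bab")             # try S -> bab
    if p is not None:
        return p
    p = match(inp, pos, "b")               # backtrack; try S -> bA
    return A(inp, p) if p is not None else None

def A(inp, pos):
    p = match(inp, pos, "d")               # try A -> d
    if p is not None:
        return p
    p = match(inp, pos, "c")               # backtrack; try A -> cA
    return A(inp, p) if p is not None else None

inp = "bcd"
print("Success!" if S(inp, 0) == len(inp) else "rejected")   # Success!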
Top-Down Parsing
Example of Backtracking

S → bab | bA
A → d | cA
(Grammar is not left factored, backtracking is required)
(Note: with no symbol lookahead, i.e., LL(0))

 Let's follow parsing the input bcd.

 As you can see, each time we hit a dead end, we back up to the last decision point, unmake that decision and try another alternative.

 If all alternatives have been exhausted, we back up to the preceding decision point and so on. This continues until we either find a working parse or have exhaustively tried all combinations without success.

Expansion so far | Input | Action
S    | bcd | Try S → bab
bab  | bcd | match b
bab  | bcd | dead-end, backtrack
S    | bcd | Try S → bA
bA   | bcd | match b
bA   | bcd | Try A → d
bd   | bcd | dead-end, backtrack
bA   | bcd | Try A → cA
bcA  | bcd | match c
bcA  | bcd | Try A → d
bcd  | bcd | match d, Success!
Top-Down Parsing
Example of Backtracking

S → bab | bA
A → d | cA
(Grammar is not left factored, backtracking is required)
(Note: with one symbol lookahead, i.e., LL(1))

 Let's follow parsing the input bcd.

 As you can see, each time we hit a dead end, we back up to the last decision point, unmake that decision and try another alternative.

 If all alternatives have been exhausted, we back up to the preceding decision point and so on. This continues until we either find a working parse or have exhaustively tried all combinations without success.

Expansion so far | Input | Action
S    | bcd | Try S → bab
bab  | bcd | match b
bab  | bcd | dead-end, backtrack
S    | bcd | Try S → bA
bA   | bcd | match b
bcA  | bcd | Expand A with rule A → cA by doing one symbol lookahead on input
bcA  | bcd | match c
bcA  | bcd | Expand A with rule A → d by doing one symbol lookahead on input
bcd  | bcd | match d, Success!
Top-Down Parsing
 Example of Backtracking
(Grammar is not left factored, backtracking is required)
(Note: with no symbol lookahead, i.e., LL(0))

 Input: cad

 Grammar:

Expansion so far | Input | Action
S   | cad | Try S → cAd
cAd | cad | match c
Top-Down Parsing
 Demonstrate RDP with Backtracking for the given input
and grammar.
▪ Input: w = aaba

▪ Grammar:

(Grammar is not left factored, backtracking is required)


A → abC|aBd|aAD
B → bB | ϵ
C → d | ϵ
D → a | b | ϵ
Top-Down Parsing
 Consider the language defined by grammar S -> aSa | aa , which
ideally accepts L(G) = { a2n, n>=1 }

 Show the working of RDP with backtracking for the following


input strings:

• aa

• aaaa

• aaaaaa

• aaaaaaaa
Top-Down Parsing
 Demonstrate RDP with Backtracking for the given input
and grammar.
 Input: w=read

 Grammar:

S → rXd | rZd (Grammar is not left factored, backtracking is required)

X → oa | ea
Z → ai
Top-Down Parsing
 Example of RDP with Non-Backtracking
 Input: cad

 Grammar (left factored):
S → cAd
A → aB
B → b | ϵ

Expansion so far | Input | Action
S    | cad | Start with S → cAd
cAd  | cad | match c
caBd | cad | Expand A with rule A → aB
caBd | cad | match a
cad  | cad | Expand B with rule B → ϵ by doing one symbol lookahead on input
cad  | cad | match d, Success!

Note: Even though the grammar is left factored, we need one symbol of lookahead on the input (when a nonterminal has two or more right-hand sides, for example B → b | ϵ) to choose the right production and eliminate backtracking completely.
Top-Down Parsing

First and Follow Functions


Top-Down Parsing
FIRST and FOLLOW

 The construction of both top-down and bottom-up parsing is aided


by two functions, FIRST and FOLLOW, associated with a grammar G.

 During topdown parsing,


FIRST and FOLLOW allow us to choose which production to apply,
based on the next input symbol.

 During panic-mode error recovery,


sets of tokens produced by FOLLOW can be used as synchronizing
tokens.
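Both sets can be computed by the standard fixed-point iteration. A Python sketch for the non-left-recursive expression grammar (representation illustrative: bodies are tuples, () is an ϵ-production, "ϵ" marks ϵ inside FIRST sets, "$" is the end marker); the names defined here are reused by the sketches later in the chapter:

GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
START = "E"
NONTERMS = set(GRAMMAR)
EPS = "ϵ"

def first_of_seq(seq, first):
    # FIRST of a sequence of grammar symbols.
    out = set()
    for x in seq:
        fx = first[x] if x in NONTERMS else {x}
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    return out | {EPS}                     # every symbol in seq can derive ϵ

first = {A: set() for A in NONTERMS}
follow = {A: set() for A in NONTERMS}
follow[START].add("$")                     # $ is in FOLLOW(start symbol)

changed = True
while changed:                             # iterate until a fixed point
    changed = False
    for head, bodies in GRAMMAR.items():
        for body in bodies:
            f = first_of_seq(body, first)
            if not f <= first[head]:
                first[head] |= f; changed = True
            for i, x in enumerate(body):   # FOLLOW rules for each nonterminal
                if x not in NONTERMS:
                    continue
                trailer = first_of_seq(body[i + 1:], first)
                add = trailer - {EPS}
                if EPS in trailer:         # x can end the body: add FOLLOW(head)
                    add |= follow[head]
                if not add <= follow[x]:
                    follow[x] |= add; changed = True

print(sorted(first["E'"]), sorted(follow["E'"]))   # ['+', 'ϵ'] ['$', ')']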
Top-Down Parsing
FIRST and FOLLOW

Grammar:
S → ABCDE
A → a | ϵ
B → b | ϵ
C → c | ϵ
D → d | ϵ
E → e | ϵ

FIRST(S) = { a b c d e ϵ }
FIRST(A) = { a ϵ }
FIRST(B) = { b ϵ }
FIRST(C) = { c ϵ }
FIRST(D) = { d ϵ }
FIRST(E) = { e ϵ }
Top-Down Parsing
FIRST and FOLLOW

Example: FIRST(S) = FIRST(ABCDE)

Grammar:
S → ABCDE
A → a | ϵ
B → b | ϵ
C → c | ϵ
D → d | ϵ
E → e | ϵ

FIRST(E) = FIRST(e) and { ϵ } = { e } and { ϵ } = { e ϵ }
Since A, B, C, and D can each derive ϵ, FIRST(S) includes FIRST(A), FIRST(B), FIRST(C), FIRST(D), and FIRST(E).
Top-Down Parsing
FIRST and FOLLOW

Example:
Grammar: S → ABCDE; A → a | ϵ; B → b | ϵ; C → c | ϵ; D → d | ϵ; E → e | ϵ

FIRST(S) = { a b c d e ϵ }
FIRST(A) = { a ϵ }
FIRST(B) = { b ϵ }
FIRST(C) = { c ϵ }
FIRST(D) = { d ϵ }
FIRST(E) = { e ϵ }
Top-Down Parsing
FIRST and FOLLOW

❑ Example:
FIRST
E
E'
T
T'
F
Top-Down Parsing
FIRST and FOLLOW

❑ Example:
FIRST
E ( id
E' + ϵ
T ( id
T' * ϵ
F ( id
Top-Down Parsing
FIRST and FOLLOW
Top-Down Parsing
FIRST and FOLLOW

FIRST FOLLOW
E ( id $ )
E' + ϵ $ )
T ( id +$)
T' * ϵ +$)
F ( id *+$)
Top-Down Parsing
FIRST and FOLLOW

❑ Example:

Find FIRST and FOLLOW sets of each non-terminal in the grammar.

Note: ϵ never appears in a FOLLOW set; ϵ is the empty string, not an input symbol.


Top-Down Parsing
FIRST and FOLLOW

❑ Example:
FIRST FOLLOW
E ( id
E' + ϵ
T ( id
T' * ϵ
F ( id

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Example:
FIRST FOLLOW
E ( id )$
E' + ϵ )$
T ( id +)$
T' * ϵ +)$
F ( id *+)$

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise -1:

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise -1:

FIRST FOLLOW
S
U
V
W

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise -1:

FIRST FOLLOW
S u y z w x
U u y z ϵ
V w x ϵ
W y z

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise -1:

FIRST FOLLOW
S u y z w x $
U u y z ϵ w x y z
V w x ϵ y z
W y z v $

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise - 2:

FIRST FOLLOW
S
A
B
C
D
E

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise - 2:

FIRST FOLLOW
S a b c
A a ϵ
B b ϵ
C c
D d ϵ
E e ϵ

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise - 2:

FIRST FOLLOW
S a b c $
A a ϵ b c
B b ϵ c
C c d e $
D d ϵ e $
E e ϵ $

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise - 3:

FIRST FOLLOW
S
B
C

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise - 3:

FIRST FOLLOW
S a c b d
B a ϵ
C c ϵ

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise - 3:

FIRST FOLLOW
S a c b d $
B a ϵ b
C c ϵ d

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

Note: FIRST helps us to pick a rule when we have a choice between two or more r.h.s. by predicting the first symbol that each r.h.s. can derive.

❑ Exercise - 4:

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

Note: FIRST helps us to pick a rule when we have a choice between two or more r.h.s. by predicting the first symbol that each r.h.s. can derive.

❑ Exercise - 4: FIRST FOLLOW


S d g h ϵ b a
A d g h ϵ
B g ϵ
C h ϵ

First(C) = First(h) and {ϵ} = {h} and {ϵ} = {h ϵ}


First(B) = First(g) and {ϵ} = {g} and {ϵ} = {g ϵ}
First(A) = First(da) and First(BC) = {d} and {g h ϵ} = {d g h ϵ}
First(S) = First(ACB) and First(CbB) and First(Ba)
= {d g h ϵ} and {h b} and {g a}
= {d g h ϵ b a}
Top-Down Parsing
FIRST and FOLLOW

Note: FIRST helps us to pick a rule when we have a choice between two or more r.h.s. by predicting the first symbol that each r.h.s. can derive.

❑ Exercise - 4: FIRST FOLLOW


S d g h ϵ b a $
A d g h ϵ h g $
B g ϵ $ a h g
C h ϵ g $ b h

Follow(S) = { $ } If S is the start symbol place $ in Follow(S)


Follow(A) = Non-epsilon symbols of First(CB) and if First(CB)
contains epsilon then Follow(A) = Follow(S) also.
Follow(A) = { h g } and { $ }
Top-Down Parsing
FIRST and FOLLOW

Note: FIRST helps us to pick a rule when we have a choice between two or more r.h.s. by predicting the first symbol that each r.h.s. can derive.

❑ Exercise - 4: FIRST FOLLOW


S d g h ϵ b a $
A d g h ϵ h g $
B g ϵ $ a h g
C h ϵ g $ b h

Follow(S) = { $ } If S is the start symbol place $ in Follow(S)


Follow(A) = {h g $}
Follow(B) = Follow(S) and First(a) and First(C) and if First(C)
contains epsilon then Follow(B) = Follow(A) also.
Follow(B) = { $ } and { a } and { h g $ } = { $ a h g }
Top-Down Parsing
FIRST and FOLLOW

Note: FIRST helps us to pick a rule when we have a choice between two or more r.h.s. by predicting the first symbol that each r.h.s. can derive.

❑ Exercise - 4: FIRST FOLLOW


S d g h ϵ b a $
A d g h ϵ h g $
B g ϵ $ a h g
C h ϵ g $ b h

Follow(S) = { $ } If S is the start symbol place $ in Follow(S)


Follow(A) = {h g $}
Follow(B) = { $ a h g }
Follow(C) = { g $ } and { b } and { h g $ } = { g $ b h }
Top-Down Parsing
FIRST and FOLLOW

❑ Exercise - 5:

S -> L=R | R FIRST FOLLOW


R -> L S
L -> *R | id R
L

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
FIRST and FOLLOW

❑ Exercise - 5:

S -> L=R | R FIRST FOLLOW


R -> L S * id $
L -> *R | id R * id = $
L * id = $

Find FIRST and FOLLOW sets of each non-terminal in the grammar.


Top-Down Parsing
Why FIRST and FOLLOW?

❑ FIRST and FOLLOW help us to pick a rule when we have a choice


between two or more r.h.s. by predicting the first symbol that each
r.h.s. can derive.

❑ Even if there is only one r.h.s. we can still use them to tell us
whether or not we have an error - if the current input symbol
cannot be derived from the only r.h.s. available, then we know
immediately that the sentence does not belong to the grammar,
without having to (attempt to) finish the parse.
Top-Down Parsing
Why FIRST in Compiler Design?

If the compiler knows in advance the first character of the string produced when a production rule is applied, then by comparing it with the current character or token it sees in the input string, it can wisely decide which production rule to apply.

S -> cAd A -> bc|a Input: cad

Thus, in the example above, if the parser knows that after reading character 'c' in the input string and applying S->cAd the next character in the input string is 'a', it ignores the production rule A->bc (because 'b', not 'a', is the first character of the string this rule produces) and directly uses the production rule A->a.
Top-Down Parsing

LL(1) Grammar & Predictive


Parsing
Top-Down Parsing
LL(1) Grammars

 Predictive parsers, can be constructed for a class of grammars called LL(1).

 In LL(1):

▪ the first "L“ stands for scanning the input from left to right,

▪ the second "L" for producing a leftmost derivation, and

▪ the "1" for using one input symbol of lookahead at each step to make parsing
action decisions.

 The class of LL(1) grammars is rich enough to cover most programming


constructs, although care is needed in writing a suitable grammar for the
source language.
For example, no left-recursive or ambiguous grammar can be LL(1).
Top-Down Parsing
LL(1) Grammars

 A grammar G is LL(1) if and only if whenever A → α | β are two distinct productions of G, the following conditions hold:

1. FIRST(α) and FIRST(β) are disjoint.

2. If ϵ is in FIRST(β), then FIRST(α) and FOLLOW(A) should be disjoint, and if ϵ is in FIRST(α), then FIRST(β) and FOLLOW(A) should be disjoint.

 Justify whether the grammars are LL(1) or not


➢ G1: G2: G3:

S → Aa S → Aa S → A|xb

A →bAb|ϵ A →Abb|ϵ A →aAb|B

B →x
Top-Down Parsing
LL(1) Grammars

 Justify whether the grammars are LL(1) or not

➢ G1: G2: G3:

S → Aa S → Aa S → A|xb

A →bAb|ϵ A →Abb|ϵ A →aAb|B

B →x

G1: is not LL(1) First(bAb)={b} Follow(A)={a b}

G2: is not LL(1) First(Abb)={b} Follow(A)={a b}

G3: is not LL(1) First(A)=First(aAb) and First(B) First(xb)={x}


First(A)={a} and {x} = {a x}
Top-Down Parsing
LL(1) Grammars

 If no FIRST/FIRST conflicts and no FIRST/FOLLOW conflicts, then grammar is


LL(1).

 An example of a FIRST/FIRST conflict:

o S → Xb | Yc

o X → a

o Y → a

❑ By seeing only the first input symbol a, you cannot know whether to apply the
production S → Xb or S → Yc, because a is in the FIRST set of both X and Y.

First(Xb) = {a} FIRST(Xb) and FIRST(Yc) are not


First(Yc) = {a} disjoint, the grammar is not LL(1).
Top-Down Parsing
LL(1) Grammars

❑ An example of a FIRST/FOLLOW conflict:


o S → AB
o A → fe | ϵ
o B → fg

❑ By seeing only the first input symbol f, you cannot decide whether to
apply the production A → fe or A → ϵ, because f is in both the FIRST set
of A and the FOLLOW set of A (A can be parsed as epsilon and B as f).

❑ Notice that if you have no epsilon-productions you cannot have a


FIRST/FOLLOW conflict.
Top-Down Parsing
LL(1) Grammars

 Explain why the following grammar is LL(1)

X → YaYb|ZbZa

Y →ϵ

Z →ϵ

 Explain why the following grammar is not LL(1)

S → ABA

A →aA|ϵ

B →b|ϵ
Top-Down Parsing
LL(1) Grammars

 Explain why the following grammar is LL(1)

X → YaYb|ZbZa
First(YaYb) = {a} FIRST(YaYb) and
Y →ϵ First(ZbZa) = {b} FIRST(ZbZa) are disjoint,

Z →ϵ the grammar is LL(1).

 Explain why the following grammar is not LL(1)

S → ABA First(aA) = {a}


Follow(A)= Follow(S) and First(BA)
A →aA|ϵ = { $ } and { b a } = { $ b a}
First(aA) and Follow(A) are not disjoint, the grammar
B →b|ϵ is not LL(1).
Top-Down Parsing
LL(1) Grammars

 Predictive parsers can be constructed for LL(1) grammars since the proper
production to apply for a nonterminal can be selected by looking only at the
current input symbol.

 Flow-of-control constructs, with their distinguishing keywords, generally satisfy


the LL(1) constraints. For instance, if we have the productions

then the keywords if, while, and the symbol { tell us which alternative is the only
one that could possibly succeed if we are to find a statement.
Top-Down Parsing
LL(1) Grammars

 The class of grammars for which we can construct predictive parsers


looking k-symbols ahead in the input is sometimes called the LL(k) class.

 The LL(1) class uses FIRST and FOLLOW computations.

▪ From the FIRST and FOLLOW sets for a grammar, we shall construct
Predictive parsing tables.

▪ Predictive parsing tables make the explicit choice of production during


top-down parsing.

 FIRST and FOLLOW sets are also useful during bottom-up parsing.
Top-Down Parsing
LL(1) Grammars

 The class of grammars for which we can construct predictive parsers


looking k-symbols ahead in the input is sometimes called the LL(k) class.

 Example:

S → x | xy | xyz

Is the above grammar LL(1), LL(2), LL(3), or LL(4)?


Top-Down Parsing
LL(1) Grammars

 The class of grammars for which we can construct predictive parsers


looking k-symbols ahead in the input is sometimes called the LL(k) class.

 Example:

S → x | xy | xyz

Is the above grammar LL(1), LL(2), LL(3), or LL(4)?
With one or two symbols of lookahead the alternatives xy and xyz cannot be distinguished, but three symbols suffice to tell x, xy, and xyz apart; hence the grammar is LL(3).

Top-Down Parsing

Note: This algorithm collects the information from FIRST and FOLLOW sets into a predictive
parsing table M[A,a], a two-dimensional array, where A is a nonterminal, and a is a terminal
or the symbol $, the input end marker.
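That construction is a direct loop over the productions. A sketch, reusing GRAMMAR, first, follow, EPS, and first_of_seq from the FIRST/FOLLOW sketch earlier:

def build_ll1_table(grammar, first, follow):
    # M[A, a] = the production to apply when A is on the stack and a is next.
    table, is_ll1 = {}, True
    for head, bodies in grammar.items():
        for body in bodies:
            f = first_of_seq(body, first)
            targets = f - {EPS}
            if EPS in f:                   # ϵ in FIRST(body): use FOLLOW(head)
                targets |= follow[head]
            for a in targets:
                if (head, a) in table:     # multiply defined entry:
                    is_ll1 = False         # the grammar is not LL(1)
                table[(head, a)] = (head, body)
    return table, is_ll1

table, ok = build_ll1_table(GRAMMAR, first, follow)
print(ok)                                  # True for the expression grammar
print(table[("E'", "+")])                  # ("E'", ('+', 'T', "E'"))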
Top-Down Parsing
FIRST FOLLOW
 Example 1: E ( id )$
E' + ϵ )$
T ( id +)$
T' * ϵ +)$
F ( id *+)$

Figure 4.17: Parsing table M


Top-Down Parsing
FIRST FOLLOW
 Example 1: E ( id )$
E' + ϵ )$
T ( id +)$
T' * ϵ +)$
F ( id *+)$

(All blank entries in the parsing table are error entries.)
Figure 4.17: Parsing table M
Top-Down Parsing
 Input : id + id * id$
Top-Down Parsing
 Example 2: Construct parsing table for the following grammar
Top-Down Parsing
 Example 2: Construct parsing table for the following grammar (the dangling-else grammar):

S → iEtSS' | a
S' → eS | ϵ
E → b

FIRST(S) = { i a }    FOLLOW(S) = { e $ }
FIRST(S') = { e ϵ }   FOLLOW(S') = { e $ }
FIRST(E) = { b }      FOLLOW(E) = { t }

Figure 4.18: Parsing table M


Top-Down Parsing
 Example 2:

The parsing table in Fig. 4.18. The entry for M[S',e] contains both
S' → eS and S' → ϵ. The grammar is ambiguous.

Figure 4.18: Parsing table M


Top-Down Parsing
 Example 3: Construct parsing table for the following grammar
FIRST FOLLOW
S → ABCDE S a b cdeϵ $
A → a | ϵ A a ϵ bcde$
B → b | ϵ
B b ϵ cde$
C → c | ϵ
D → d | ϵ C c ϵ de$
E → e | ϵ D d ϵ e$
E e ϵ $

a b c d e $
S
A
B
C
D
E
Top-Down Parsing
 Example 3: Construct parsing table for the following grammar
FIRST FOLLOW
S → ABCDE S a b cdeϵ $
A → a | ϵ A a ϵ bcde$
B → b | ϵ
B b ϵ cde$
C → c | ϵ
D → d | ϵ C c ϵ de$
E → e | ϵ D d ϵ e$
E e ϵ $

a b c d e $
S S→ABCDE S→ABCDE S→ABCDE S→ABCDE S→ABCDE S→ABCDE
A A → a A → ϵ A → ϵ A → ϵ A → ϵ A → ϵ
B B → b B → ϵ B → ϵ B → ϵ B → ϵ
C C → c C → ϵ C → ϵ C → ϵ
D D → d D → ϵ D → ϵ
E E → e E → ϵ
Top-Down Parsing
Exercise

 For the following productions:

S → +SS | * SS | a
 Write predictive parsing table

 Write predictive parser

 Show how to parse: +*aaa

FIRST FOLLOW + * a $
S S
Top-Down Parsing
Exercise

 For the following productions:

S → +SS | * SS | a
 Write predictive parsing table

 Write predictive parser

 Show how to parse: +*aaa

FIRST(S) = { + * a }   FOLLOW(S) = { + * a $ }
Parsing table: M[S, +] = S → +SS;  M[S, *] = S → *SS;  M[S, a] = S → a
Top-Down Parsing
Exercise

 The following grammar is not LL(1)

S → Aa

A → bA|B

B → Cc

C → bC|ϵ

It is possible to drop exactly one production from this grammar to obtain


a new grammar generating the same language. Identify that production
and prove that the resulting grammar is LL(1).
Top-Down Parsing
Nonrecursive Predictive Parsing

 A nonrecursive predictive parser can be built by maintaining a stack


explicitly, rather than implicitly via recursive calls.
The predictive parser mimics a leftmost derivation.

 If w is the input that has been matched so far, then the stack holds a sequence of grammar symbols α such that S ⇒*lm wα.
 The table-driven predictive parser in Fig. 4.19 has


1. an input buffer,
2. a stack containing a sequence of
grammar symbols,
3. a parsing table and
4. an output stream.
Top-Down Parsing

[Fig. 4.20: the table-driven predictive parsing algorithm. An error is detected when the terminal on top of the stack does not match the input symbol, or when the parsing-table entry M[A, a] is empty.]
Top-Down Parsing
 Example: On input id + id * id, the nonrecursive predictive parser of Algorithm
4.34 makes the sequence of moves in Fig. 4.21. These moves correspond to a
leftmost derivation:

LL(1) table/Predictive parsing table


Top-Down Parsing
Problem 1:

 For the following productions:

S → +SS | * SS | a
 Write predictive parsing table

 Write predictive parser

 Show how to parse: +*aaa


Top-Down Parsing
Problem 1: S → +SS | * SS | a + * a $
FIRST FOLLOW S
S
Matched Stack Input Action
S$ +*aaa$
Top-Down Parsing
Problem 1: S → +SS | *SS | a

FIRST(S) = { + * a }   FOLLOW(S) = { + * a $ }
Parsing table: M[S, +] = S → +SS;  M[S, *] = S → *SS;  M[S, a] = S → a
Matched Stack Input Action
S$ +*aaa$
+SS$ +*aaa$ Output: S→ +SS
+ SS$ *aaa$ Match +
+ *SSS$ *aaa$ Output: S→ *SS
+* SSS$ aaa$ Match *
+* aSS$ aaa$ Output: S→ a
+*a SS$ aa$ Match a
+*a aS$ aa$ Output: S→ a
+*aa S$ a$ Match a
+*aa a$ a$ Output: S→ a
+*aaa $ $ Match a and Accept
Top-Down Parsing
Problem 2:

 For the following grammar answer the following :

a) Eliminate Left recursion

b) Left factor the grammar

c) Construct first and follow table.

d) Construct LL(1) table.

e) Mention whether the grammar is in LL(1) or not.

f) If the grammar is in LL(1), parse the string: (a,a)

S → a | ^ | (L)
L → L,S | S
Recursive grammar:
S → a | ^ | (L)
L → L,S | S

Non-recursive grammar:
S → a | ^ | (L)
L → SA
A → ,SA | ϵ

FIRST and FOLLOW:
S: FIRST = { a ^ ( }   FOLLOW = { $ , ) }
L: FIRST = { a ^ ( }   FOLLOW = { ) }
A: FIRST = { , ϵ }     FOLLOW = { ) }

LL(1) table/Predictive parsing table:
M[S, a] = S → a;   M[S, ^] = S → ^;   M[S, (] = S → (L)
M[L, a] = L → SA;  M[L, ^] = L → SA;  M[L, (] = L → SA
M[A, ,] = A → ,SA; M[A, )] = A → ϵ

Matched Stack Input Action


S$ (a,a)$
(The grammar, FIRST/FOLLOW sets, and LL(1) table are as above.)

Matched Stack Input Action


S$ (a,a)$
(L)$ (a,a)$ Output: S→ (L)
( L)$ a,a)$ Match (
( SA)$ a,a)$ Output: L→ SA
( aA)$ a,a)$ Output: S→ a
(a A)$ ,a)$ Match a
(a ,SA)$ ,a)$ Output: A → ,SA
(a, SA)$ a)$ Match ,
(a, aA)$ a)$ Output: S→ a
(a,a A)$ )$ Match a
(a,a )$ )$ Output: A→ ϵ
(a,a) $ $ Match ) and Accept
Top-Down Parsing
Problem 3:

 For the following grammar answer the following :

a) Eliminate Left recursion

b) Left factor the grammar

c) Construct first and follow table.

d) Construct LL(1) table.

e) Mention whether the grammar is in LL(1) or not.

f) If the grammar is in LL(1), parse the string: ba

S → AaAb | BbBa
A → λ
B → λ
Top-Down Parsing
Problem 4:

 For the following grammar answer the following :

a) Eliminate Left recursion

b) Left factor the grammar

c) Construct first and follow table.

d) Construct LL(1) table.

e) Mention whether the grammar is in LL(1) or not.

f) If the grammar is in LL(1), parse the strings: λ and b

S → AB
A → a | λ
B → b | λ

Note: λ is epsilon(i.e., null character)


Top-Down Parsing
Problem 5:

 For the following grammar answer the following :

a) Eliminate Left recursion

b) Left factor the grammar

c) Construct first and follow table.

d) Construct LL(1) table.

e) Mention whether the grammar is in LL(1) or not.

f) If the grammar is in LL(1), parse the strings: abce, cde and empty string

S → ABCDE
A → a | λ
B → b | λ
C → c | λ
D → d | λ
E → e | λ
Grammar:
S → ABCDE
A → a | ϵ
B → b | ϵ
C → c | ϵ
D → d | ϵ
E → e | ϵ

FIRST and FOLLOW:
S: FIRST = { a b c d e ϵ }   FOLLOW = { $ }
A: FIRST = { a ϵ }           FOLLOW = { b c d e $ }
B: FIRST = { b ϵ }           FOLLOW = { c d e $ }
C: FIRST = { c ϵ }           FOLLOW = { d e $ }
D: FIRST = { d ϵ }           FOLLOW = { e $ }
E: FIRST = { e ϵ }           FOLLOW = { $ }

Parsing table:
S: S → ABCDE on a, b, c, d, e, $
A: A → a on a;  A → ϵ on b, c, d, e, $
B: B → b on b;  B → ϵ on c, d, e, $
C: C → c on c;  C → ϵ on d, e, $
D: D → d on d;  D → ϵ on e, $
E: E → e on e;  E → ϵ on $

Matched Stack Input Action


S$ abce$
ABCDE$ abce$ Output: S→ ABCDE
aBCDE$ abce$ Output: A→a
a BCDE$ bce$ Match a
a bCDE$ bce$ Output: B→b
ab CDE$ ce$ Match b
ab cDE$ ce$ Output: C→c
abc DE$ e$ Match c
abc E$ e$ Output: D→ϵ
abc e$ e$ Output: E→e
abce $ $ Match e and Accept
(The grammar, FIRST/FOLLOW sets, and parsing table are as above.)

Input: Empty String/Null String

Matched Stack Input Action


S$ $
ABCDE$ $ Output: S→ ABCDE
BCDE$ $ Output: A→ ϵ
CDE$ $ Output: B→ ϵ
DE$ $ Output: C→ ϵ
E$ $ Output: D→ ϵ
$ $ Output: E→ ϵ and Accept
Top-Down Parsing

Error Recovery in Predictive Parsing

 An error is detected during predictive parsing when

1. the terminal on top of the stack does not match the next input symbol or

2. nonterminal A is on top of the stack, a is the next input symbol, and


M[A,a] is error (i.e., the parsing-table entry is empty)

Fig 4.20: Predictive Parsing Algorithm (shown earlier).
Top-Down Parsing
Error Recovery in Predictive Parsing

 We would like our parser to be able to recover from an error and


continue parsing.

1. Panic mode recovery

▪ Modify the stack and/or the input string to try and reach state
from which we can continue.

2. Phrase-level recovery

▪ We associate each empty slot with an error handling procedure.


Error Recovery in Predictive Parsing
Panic mode recovery

 Idea:
➢ Decide on a set of synchronizing tokens.

➢ When an error is found and there's a nonterminal at the top of the stack,

• discard input tokens until a synchronizing token is found.

• Synchronizing tokens are chosen so that the parser can recover


quickly after one is found, e.g. a semicolon when parsing statements.

➢ When an error is found and there is a terminal at the top of the stack,

• we could try popping it to see whether we can continue. Assume that


the input string is actually missing that terminal.
Error Recovery in Predictive Parsing

Panic mode recovery

 Possible synchronizing tokens for a nonterminal A


➢ the tokens in FOLLOW(A)

▪ When one is found, pop A of the stack and try to continue

➢ the tokens in FIRST(A)

▪ When one is found, match it and try to continue

➢ tokens such as semicolons that terminate statements


Error Recovery in Predictive Parsing
Panic mode recovery

 HOW TO SELECT SYNCHRONIZING SET?


➢ Place all symbols in FOLLOW(A) into the synchronizing set for nonterminal A.
If we skip input symbols until an element of FOLLOW(A) is seen in input and pop
A from the stack, then it is likely that parsing can continue.

➢ Place all symbols in FIRST(A) to the synchronizing set for nonterminal A.


If a symbol in FIRST(A) appears in the input, then it may be possible to resume
parsing according to A.

➢ We might add keywords that begins statements to the synchronizing sets for the
nonterminals generating expressions.

➢ We can add to the synchronizing set of a lower-level construct the symbols that
begin higher-level constructs.
Error Recovery in Predictive Parsing

Panic mode recovery

 HOW TO SELECT SYNCHRONIZING SET? Cont..

➢ If a terminal on top of stack cannot be matched, a simple idea is to


pop the terminal, issue a message saying that the terminal was inserted
and continue parsing.

➢ If a nonterminal can generate the empty string, then the production


deriving ϵ can be used as a default.
This may postpone some error detection, but cannot cause an error to
be missed. This approach reduces the number of nonterminals that have
to be considered during error recovery.
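As a concrete illustration, here is a hedged C sketch of the panic-mode step for an empty entry M[X, a]: skip input symbols until one lies in FOLLOW(X), then pop X and resume. The follow() table is hardcoded for the S → ABCDE grammar of the earlier example, and the erroneous input in main() is hypothetical; wiring panic() into the parse() loop shown earlier (in place of its bare error returns) is left as a variation.

#include <stdio.h>
#include <string.h>

/* FOLLOW sets for S -> ABCDE, A -> a|ϵ, ..., E -> e|ϵ (hardcoded sketch) */
static const char *follow(char X)
{
    switch (X) {
    case 'S': return "$";
    case 'A': return "bcde$";
    case 'B': return "cde$";
    case 'C': return "de$";
    case 'D': return "e$";
    case 'E': return "$";
    default:  return "$";
    }
}

/* Panic-mode step for an error entry M[X, a]: skip input symbols until one
   appears in FOLLOW(X); the caller then pops X and continues parsing. */
static const char *panic(char X, const char *ip)
{
    while (*ip != '$' && strchr(follow(X), *ip) == NULL) {
        fprintf(stderr, "error: skipping %c\n", *ip);
        ip++;
    }
    fprintf(stderr, "error: popping %c and resuming\n", X);
    return ip;                           /* the new input cursor */
}

int main(void)
{
    const char *rest = panic('B', "xycde$");  /* skips x, y; stops at c */
    printf("resume at: %s\n", rest);
    return 0;
}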
Error Recovery in Predictive Parsing
Panic mode recovery

    Nonterminal   FIRST    FOLLOW
    E             (, id    ), $
    E'            +, ϵ     ), $
    T             (, id    +, ), $
    T'            *, ϵ     +, ), $
    F             (, id    *, +, ), $

 Example: The table in Fig. 4.22 is to be used as follows.

▪ If the parser looks up entry M[A, a] and finds


▪ the entry is blank, then the input symbol a is skipped.

▪ the entry is "synch," then the nonterminal on top of the stack is popped in an
attempt to resume parsing.

▪ If a token on top of the stack does not match the input symbol, then we pop
the token from the stack, as mentioned above.

The entry “synch” indicates a synchronizing token, obtained from the FOLLOW
set of the nonterminal in question.
Error Recovery in Predictive Parsing
Panic mode recovery

 On the erroneous input + id * +id, the parser and error-recovery mechanism of
Fig. 4.22 behave as in Fig. 4.23: the leading + is skipped (its table entry is
blank), and the trailing + later causes F to be popped via a "synch" entry.
Error Recovery in Predictive Parsing

Phrase-level error recovery


 Each unfilled cell in the table can be filled with a special-purpose
error routine.

 Error routines typically remove tokens from the input, and/or pop an
item from the stack.

 It is ill-advised to modify the input stream or the stack without


removing items, because it is then hard to guarantee that error
recovery will always terminate.
Parsing

Bottom-Up Parsing

[Figure: the family of bottom-up parsers; nonbacktracking, table-driven
shift-reduce parsers, ordered from least powerful to most powerful. A DFA (the
LR(0) automaton) is used to make parsing decisions in LR parsers.]

A bottom-up parse corresponds to the construction of a parse tree for an input
string beginning at the leaves (the bottom) and working up towards the root
(the top).
Bottom-Up Parsing
 A general style of bottom-up parsing is known as shift-reduce parsing.

 The LR grammars are the largest class of grammars for which shift-reduce
parsers can be built.
Bottom-Up Parsing

 Because it is too much work to build an LR parser by hand, automatic
parser-generator tools make it easy to construct efficient LR parsers from
suitable grammars.

 Parser-generator tools:

▪ ANTLR is a widely used parser generator for Java and other languages.

▪ Yacc/Bison – Parser generator

Bison(part of GNU Project) reads a specification of a context-free


language, warns about any parsing ambiguities, and generates a
parser (either in C, C++, or Java) which reads sequences of tokens and
decides whether the sequence conforms to the syntax specified by the
grammar.
Bottom-Up Parsing
 Given G, the expression grammar (E → E + T | T, T → T * F | F, F → (E) | id),
and the input string id * id:

[Figure 4.25: A bottom-up parse for id * id. The figure illustrates a sequence
of reductions, snapshot by snapshot.]
Bottom-Up Parsing

Reductions

 Bottom-up parsing is the process of "reducing" a string w to the start
symbol of the grammar.

At each reduction step, a specific substring matching the body of a


production is replaced by the nonterminal at the head of that
production.

 The key decisions during bottom-up parsing are about when to reduce
and about what production to apply, as the parse proceeds.
Bottom-Up Parsing
Reductions
 Example: The reductions will be discussed in terms of the sequence of strings

id * id, F * id, T * id, T * F, T, E


➢ The strings in this sequence are formed from the roots of all the subtrees in the
snapshots.

➢ The sequence starts with the input string id*id.

➢ The first reduction produces F * id by reducing the leftmost id to F, using the


production F → id.

➢ The second reduction produces T * id by reducing F to T.

➢ Now, we have a choice between reducing the string T, which is the body of E →
T, and the string consisting of the second id, which is the body of F → id.
Rather than reduce T to E, the second id is reduced to F, resulting in the
string T * F. This string then reduces to T. The parse completes with the
reduction of T to the start symbol E.
Bottom-Up Parsing

Reductions

 By definition, a reduction is the reverse of a step in a derivation (recall


that in a derivation, a nonterminal in a sentential form is replaced by the body of one of its productions).

 The goal of bottom-up parsing is therefore to construct a derivation


in reverse.

Rightmost derivation:  E ⇒ T ⇒ T * F ⇒ T * id ⇒ F * id ⇒ id * id

[Figure 4.25: A bottom-up parse for id * id. Read left to right, the reduction
sequence id * id, F * id, T * id, T * F, T, E is this rightmost derivation in
reverse.]
Bottom-Up Parsing
Handle Pruning

 Bottom-up parsing during a left-to-right scan of the input constructs a


rightmost derivation in reverse.

 A "handle" is a substring that matches the body of a production, and


whose reduction represents one step along the reverse of a rightmost
derivation.

 Handle Pruning: replace handle by corresponding LHS of a production.


Bottom-Up Parsing

Shift-Reduce Parsing
Bottom-Up Parsing: Shift-Reduce Parsing

Implementing Shift-Reduce Parsers

➢ In Shift-reduce parsing
▪ A Stack holds grammar symbols.

▪ An input buffer holds the rest of the string to be parsed.

▪ The handle always appears at the top of the stack just before it is
identified as the handle.

▪ $ marks the bottom of the stack and also the right end of the input.

➢ Shift-reduce parsing - Demo


➢ https://silcnitc.github.io/yacc.html
Bottom-Up Parsing: Shift-Reduce Parsing

 Initially, the stack is empty, and the string w is on the input, as follows:

    STACK        INPUT
    $            w$

 The parse is successful if the stack contains only the start symbol when the
input stream ends:

    STACK        INPUT
    $S           $
Bottom-Up Parsing: Shift-Reduce Parsing

Four possible actions a shift-reduce parser can make:

1. Shift. Shift the next input symbol onto the top of the stack.

2. Reduce. The right end of the string to be reduced must be at the


top of the stack. Locate the left end of the string within the stack
and decide with what nonterminal to replace the string.

3. Accept. Announce successful completion of parsing.

4. Error. Discover a syntax error and call an error recovery routine.

The use of a stack in shift-reduce parsing is justified by an important fact:


the handle will always eventually appear on top of the stack, never inside.
Bottom-Up Parsing: Shift-Reduce Parsing
 Shift-reduce parser actions in parsing the input string id1 * id2:

    STACK        INPUT          ACTION
    $            id1 * id2 $    shift
    $ id1        * id2 $        reduce by F → id
    $ F          * id2 $        reduce by T → F
    $ T          * id2 $        shift
    $ T *        id2 $          shift
    $ T * id2    $              reduce by F → id
    $ T * F      $              reduce by T → T * F
    $ T          $              reduce by E → T
    $ E          $              accept
Bottom-Up Parsing: Shift-Reduce Parsing
 For the following grammar, parse the string a+++a++ using the general style of
bottom-up parsing (i.e., shift-reduce parsing).
    S → A
    A → A+A | B++
    B → a
Bottom-Up Parsing: Shift-Reduce Parsing

    STACK      INPUT       ACTION
    $          a+++a++$    shift
    $a         +++a++$     reduce by B → a
    $B         +++a++$     shift
    $B+        ++a++$      shift
    $B++       +a++$       reduce by A → B++
    $A         +a++$       shift
    $A+        a++$        shift
    $A+a       ++$         reduce by B → a
    $A+B       ++$         shift
    $A+B+      +$          shift
    $A+B++     $           reduce by A → B++
    $A+A       $           reduce by A → A+A
    $A         $           reduce by S → A
    $S         $           accept
Bottom-Up Parsing: Shift-Reduce Parsing

Conflicts During Shift-Reduce Parsing

 There are (ambiguous) CFGs for which shift-reduce parsing cannot be used.

▪ Every shift-reduce parser for such a grammar can reach a configuration


in which the parser, knowing the entire stack contents and the next input
symbol,

▪ Cannot decide whether to shift or to reduce (a shift/reduce conflict), or

▪ Cannot decide which of several reductions to make (a reduce/reduce


conflict).

▪ Technically, these CFGs are not in the LR(k) class of grammars; we refer
to them as non-LR grammars.
Bottom-Up Parsing: Shift-Reduce Parsing

Conflicts During Shift-Reduce Parsing

 Note:

 If a CFG is ambiguous and we get conflicts during parsing, then the
grammar is not in the LL(k) or LR(k) classes of grammars.

 If a CFG is unambiguous and we still get conflicts during parsing, then the
parser at hand is not capable of handling that CFG.
Bottom-Up Parsing

LR Parsing
LR Parsing

 The most prevalent type of bottom-up parser today is based


on a concept called LR(k) parsing;
▪ the "L" is for left-to-right scanning of the input,

▪ the "R" for constructing a rightmost derivation in reverse, and

▪ the k for the number of input symbols of lookahead that are used
in making parsing decisions.

 The cases k = 0 or k = 1 are of practical interest, and we only


consider LR parsers with k <= 1 here. When (k) is omitted, k is
assumed to be 1.
LR Parsing

 This section introduces the basic concepts of LR parsing


and the easiest method for constructing shift-reduce
parsers, called "simple LR" (or SLR).

 LR parsers are table-driven.


LR Parsing
Why LR Parsers?
 LR parsing is attractive for a variety of reasons:
1. LR parsers can be constructed to recognize virtually all programming
language constructs for which context-free grammars can be written.

2. The LR-parsing method is the most general nonbacktracking shift-reduce


parsing method known, yet it can be implemented as efficiently as
other, more primitive shift-reduce methods.

3. An LR parser can detect a syntactic error as soon as it is possible to do


so on a left-to-right scan of the input.
LR Parsing
Why LR Parsers?

 LR parsing is attractive for a variety of reasons:


4. The class of grammars that can be parsed using LR methods is a proper
superset of the class of grammars that can be parsed with predictive or
LL methods.

Thus, LR grammars can describe more languages than LL grammars.


LR Parsing
Items and the LR(0) Automaton
 How does a shift-reduce parser know when to shift and when to reduce?

For example, with stack contents $T and next input symbol * in Fig. 4.28,
how does the parser know that T on the top of the stack is not a handle, so
the appropriate action is to shift and not to reduce T to E?
LR Parsing
Items and the LR(0) Automaton
 How does a shift-reduce LR-parser know when to shift and when to reduce?
▪ An LR parser makes shift-reduce decisions by using automaton
(maintaining states) to keep track of where we are in a parse.

▪ States represent sets of “items”.

A DFA (the LR(0) automaton) is used to make the parsing decisions in LR
parsers.
LR Parsing
Items and the LR(0) Automaton

 What is an LR(0) item?

▪ An LR(0) item is a production with a dot at


some position of the body. Thus,
production A →XYZ yields the four items

▪ A → •XYZ

▪ A → X•YZ

▪ A → XY•Z

▪ A → XYZ•

▪ The production A → ϵ generates only one


item, A → •
LR Parsing
Items and the LR(0) Automaton

 An item indicates how much of a production body we have seen at a given


point in the parsing process.

 For example
➢ Item A → •XYZ indicates that we hope to see next an input string derivable from
XYZ.

➢ Item A → X•YZ indicates that we have just seen on the input a string derivable
from X and that we hope next to see a string derivable from YZ.

➢ Item A → XYZ• indicates that we have seen a string derivable from XYZ on the
input and that it may be time to reduce XYZ to A.

 One collection of sets of LR(0) items, called the canonical LR(0) collection,
provides the basis for constructing a DFA that is used to make parsing
decisions. Such an automaton is called an LR(0) automaton.
LR Parsing

Items and the LR(0) Automaton

 Each state of the LR(0)


automaton represents a set of
items in the canonical LR(0)
collection.
LR Parsing
Items and the LR(0) Automaton

 To construct the Canonical LR(0) collection (or LR(0) Automaton) for


a grammar, we define

▪ An Augmented grammar and

▪ two functions, CLOSURE and GOTO.


LR Parsing
Augmented grammar

▪ If G is a grammar with start symbol S, then G', the augmented grammar


for G, is G with a new start symbol S' and production S' →S.

G' = G + { new start symbol S' and production S' →S}

▪ The purpose of this new starting production is to indicate to the parser


when it should stop parsing and announce acceptance of the input.
That is, acceptance occurs only when the parser is about to reduce by
S' → S

▪ Example: G is the expression grammar; G' = G + { E' → E }.
LR Parsing
Closure of Item Sets

➢ If I is a set of items for a grammar G, then CLOSURE(I) is the set of items


constructed from I by the two rules:

1. Initially, add every item in I to CLOSURE(I).

2. If A → α•Bβ is in CLOSURE(I) and B → γ is a production, then add the


item B →•γ to CLOSURE(I), if it is not already there.
Apply this rule until no more new items can be added to CLOSURE(I).

 Examples: find CLOSURE(I2), CLOSURE(I4), and CLOSURE(I6) in the LR(0)
automaton of the expression grammar (see the automaton figure).
LR Parsing
Closure of Item Sets cont…

 Intuitively, A → α•Bβ in CLOSURE(I) indicates that


▪ At some point in the parsing process, we think we might next see a
substring derivable from Bβ as input.

▪ The substring derivable from Bβ will have a prefix derivable from B by


applying one of the B-productions.

We therefore add items for all the B-productions; that is, if B →γ is a

production, we also include B →•γ in CLOSURE(I).

LR Parsing
Closure of Item Sets

 Closure algorithm:

    SetOfItems CLOSURE(I) {
        J = I;
        repeat
            for ( each item A → α·Bβ in J )
                for ( each production B → γ of G )
                    if ( B → ·γ is not in J )
                        add B → ·γ to J;
        until no more items are added to J on one round;
        return J;
    }
LR Parsing
Closure of Item Sets

 Example 4.40: Construct the closure of item sets for the following augmented
grammar:
    E' → E
    E → E + T | T
    T → T * F | F
    F → (E) | id
LR Parsing
The Function GOTO
▪ The second useful function is GOTO(I,X) where I is a set of items and X is a
grammar symbol.

▪ The GOTO function is used to define the transitions in the LR(0) automaton for
a grammar.

▪ GOTO(I,X) is defined to be the closure of the set of all items [A → αX•β] such
that [A → α•Xβ] is in I.

Example: GOTO(I1,+)

▪ The states of the automaton correspond to sets of items, and GOTO(I,X)


specifies the transition from the state for I under input X.
LR Parsing
The Function GOTO

[Figure: the GOTO function, e.g. GOTO(I1, +) in the LR(0) automaton of the
expression grammar]
LR Parsing
 Algorithm to construct C, the canonical collection of sets of LR(0)
items for an augmented grammar G'
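The algorithm figure is omitted here. As a concrete companion, the following is a compact C sketch of CLOSURE and GOTO over LR(0) items for the augmented expression grammar. The single-character encodings (X standing in for E', i standing in for the token id) and the fixed-size item sets are simplifications made for this sketch, not part of the algorithm itself.

#include <stdio.h>
#include <string.h>

typedef struct { char head; const char *body; } Prod;
static const Prod G[] = {              /* E' -> E is production 0 */
    {'X', "E"},                        /* X stands in for E' */
    {'E', "E+T"}, {'E', "T"},
    {'T', "T*F"}, {'T', "F"},
    {'F', "(E)"}, {'F', "i"},          /* i stands in for id */
};
enum { NP = sizeof G / sizeof G[0], MAXI = 64 };

typedef struct { int prod, dot; } Item;            /* LR(0) item */
typedef struct { Item it[MAXI]; int n; } Set;

static int has(const Set *s, Item x)
{
    for (int i = 0; i < s->n; i++)
        if (s->it[i].prod == x.prod && s->it[i].dot == x.dot) return 1;
    return 0;
}
static void add(Set *s, Item x) { if (!has(s, x)) s->it[s->n++] = x; }

static void closure(Set *s)
{
    for (int i = 0; i < s->n; i++) {               /* worklist: s->n grows */
        Item it = s->it[i];
        char B = G[it.prod].body[it.dot];          /* symbol after the dot */
        if (B >= 'A' && B <= 'Z')                  /* nonterminal: add B -> .γ */
            for (int p = 0; p < NP; p++)
                if (G[p].head == B) add(s, (Item){p, 0});
    }
}

static Set goto_(const Set *I, char X)             /* GOTO(I, X) */
{
    Set J = { .n = 0 };
    for (int i = 0; i < I->n; i++) {
        Item it = I->it[i];
        if (G[it.prod].body[it.dot] == X)          /* move the dot past X */
            add(&J, (Item){it.prod, it.dot + 1});
    }
    closure(&J);
    return J;
}

static void print(const char *name, const Set *s)
{
    printf("%s:\n", name);
    for (int i = 0; i < s->n; i++) {
        const Prod *p = &G[s->it[i].prod];
        printf("  %c -> %.*s.%s\n", p->head, s->it[i].dot, p->body,
               p->body + s->it[i].dot);
    }
}

int main(void)
{
    Set I0 = { .n = 0 };
    add(&I0, (Item){0, 0});                        /* kernel: E' -> .E */
    closure(&I0);
    print("I0 = CLOSURE({E' -> .E})", &I0);
    Set I1 = goto_(&I0, 'E');
    print("GOTO(I0, E)", &I1);
    return 0;
}

Running it prints the seven items of I0 and the two kernel-derived items of I1, matching the automaton figure.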
LR Parsing

 Exercise 1: Construct the LR(0) automaton for the following grammar.


LR Parsing
 Exercise 1: Construct the LR(0) automaton for the following grammar.

E' → E  (augmented grammar)

▪ Nonkernel items have the dot at the far left; they need not be stored
explicitly.
LR Parsing
 Exercise 2: Construct the LR(0) automaton for the following grammar.
LR Parsing
 Exercise 3: Construct the LR(0) automaton for the following grammar.
LR Parsing
The LR-Parsing

 LR parser consists of
1. an input,

2. an output,

3. a stack,

4. a driver program, and

5. a parsing table that has two parts (ACTION and GOTO).

 The parsing program reads characters from an input buffer one at a time.
Where a shift-reduce parser would shift a symbol, an LR parser shifts a state.
Each state summarizes the information contained in the stack below it.

 The LR Parser stack holds a sequence of states, s0s1 . . . sm where sm is on top.


LR Parsing

Simple-LR Parsing/SLR Parsing/


SLR(1) Parsing
SLR Parser
Constructing SLR-Parsing Tables

 The SLR method for constructing parsing tables is a good starting


point for studying LR parsing.

 We shall refer
▪ to the parsing table constructed by SLR method as an SLR table,
and

▪ to an LR parser using an SLR-parsing table as an SLR parser.

 The SLR method begins with LR(0) items and LR(0) automata.
That is, given a grammar, G, we augment G to produce G', with a new start symbol
S'. From G', we construct C, the canonical collection of sets of items for G' together
with the GOTO function.
SLR Parser

[Fig.: LR(0) automaton for the augmented expression grammar E' → E]
SLR Parser
Constructing SLR-Parsing Tables

 Algorithm 4.46 (sketch):

1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(0) items for
the augmented grammar G'.

2. State i is constructed from Ii, with these parsing actions:
   (a) If [A → α·aβ] is in Ii and GOTO(Ii, a) = Ij, set ACTION[i, a] = "shift j"
       (a is a terminal).
   (b) If [A → α·] is in Ii (A ≠ S'), set ACTION[i, a] = "reduce A → α" for all
       a in FOLLOW(A).
   (c) If [S' → S·] is in Ii, set ACTION[i, $] = "accept".

3. If GOTO(Ii, A) = Ij for a nonterminal A, set GOTO[i, A] = j.

4. All entries not defined by rules (2) and (3) are made "error".

5. The initial state is the one constructed from the set containing [S' → ·S].
SLR Parser
Constructing SLR-Parsing Tables

 The parsing table consisting of the ACTION and GOTO functions


determined by Algorithm 4.46 is called the SLR(1) table for G.

 An LR parser using the SLR(1) table for G is called the SLR(1) parser for G,
and a grammar having an SLR(1) parsing table is said to be SLR(1).

 We usually omit the "(1)" after the "SLR," since we shall not deal here
with parsers having more than one symbol of lookahead.

 In SLR parsers, the lookahead sets are determined directly from the
grammar, without considering the individual states and transitions.
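Because the SLR reduce entries come straight from FOLLOW sets, it is worth seeing how those sets are computed. Below is a minimal fixed-point sketch in C for the expression grammar; the bitset encoding and the "head:body" production strings are conventions of this sketch only, and it exploits the fact that this particular grammar has no ϵ-productions.

#include <stdio.h>
#include <string.h>

static const char *TERMS = "i+*()$";           /* i abbreviates id */
static const char *NTS   = "ETF";
static const char *prods[] = { "E:E+T", "E:T", "T:T*F", "T:F", "F:(E)", "F:i" };
enum { NP = 6 };

static int isnt(char c)  { return strchr(NTS, c) != NULL; }
static int tbit(char t)  { return 1 << (int)(strchr(TERMS, t) - TERMS); }
static int ntidx(char A) { return (int)(strchr(NTS, A) - NTS); }

static void print(const char *name, const int set[3])
{
    for (int n = 0; n < 3; n++) {
        printf("%s(%c) = {", name, NTS[n]);
        for (int t = 0; t < 6; t++)
            if (set[n] & (1 << t)) printf(" %c", TERMS[t]);
        printf(" }\n");
    }
}

int main(void)
{
    int first[3] = {0}, follow[3] = {0}, changed;

    /* FIRST: with no nullable nonterminals, FIRST(Xβ) = FIRST(X). */
    for (changed = 1; changed; ) {
        changed = 0;
        for (int p = 0; p < NP; p++) {
            int A = ntidx(prods[p][0]);
            char c = prods[p][2];                    /* first body symbol */
            int add = isnt(c) ? first[ntidx(c)] : tbit(c);
            if ((first[A] | add) != first[A]) { first[A] |= add; changed = 1; }
        }
    }

    follow[ntidx('E')] |= tbit('$');         /* $ follows the start symbol */
    for (changed = 1; changed; ) {
        changed = 0;
        for (int p = 0; p < NP; p++) {
            int A = ntidx(prods[p][0]);
            const char *b = prods[p] + 2;            /* the body */
            for (int i = 0; b[i]; i++) {
                if (!isnt(b[i])) continue;
                int B = ntidx(b[i]);
                /* what can follow B: FIRST of the next symbol,
                   or FOLLOW(A) when B ends the body */
                int add = b[i+1] ? (isnt(b[i+1]) ? first[ntidx(b[i+1])]
                                                 : tbit(b[i+1]))
                                 : follow[A];
                if ((follow[B] | add) != follow[B]) { follow[B] |= add; changed = 1; }
            }
        }
    }

    print("FIRST", first);
    print("FOLLOW", follow);
    return 0;
}

The program prints exactly the FOLLOW sets used in Example 4.47 below.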
SLR Parser
Constructing SLR-Parsing Table

 Example 4.47: Let us construct the SLR table for the augmented expression
grammar.

    Productions:  (1) E → E + T   (2) E → T    (3) T → T * F
                  (4) T → F       (5) F → (E)  (6) F → id

    FOLLOW(E) = { +, ), $ }
    FOLLOW(T) = { +, *, ), $ }
    FOLLOW(F) = { +, *, ), $ }

 Notation: "s5" means shift and move to state 5 (the digit is a state number);
"r3" means reduce using production number 3.

Fig. 4.37: SLR parsing table for the expression grammar

    STATE   ACTION                               GOTO
            id     +     *     (     )     $     E   T   F
    0       s5                 s4                1   2   3
    1              s6                      acc
    2              r2    s7          r2    r2
    3              r4    r4          r4    r4
    4       s5                 s4                8   2   3
    5              r6    r6          r6    r6
    6       s5                 s4                    9   3
    7       s5                 s4                        10
    8              s6                s11
    9              r1    s7          r1    r1
    10             r3    r3          r3    r3
    11             r5    r5          r5    r5
SLR Parser
 Exercise 4.6.2 : Construct the SLR sets of items for the (augmented)
grammar.

S -> SA
S -> A
A -> a

▪ Construct the SLR parsing table for this grammar.

▪ Is the grammar SLR?


LR-Parsing Algorithm (Method-2)
LR-parsing algorithm. (Method-2)
INPUT: An input string w and an LR-parsing table with functions ACTION and GOTO for a grammar G.
OUTPUT: If w is in L(G), the reduction steps of a bottom-up parse for w; otherwise, an error indication.
METHOD: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer.
let a be the first symbol of w$;

while(1) {
let s be the state on top of the stack;
if (ACTION[s, a] = shift t) {
push a and then t onto the stack;
let a be the next input symbol;
} else if (ACTION[s, a] = reduce A → β) {
pop 2*|β| symbols off the stack;
let state t now be on top of the stack;
push A and then GOTO[t, A] onto the stack;
} else if (ACTION[s, a] = accept) break; /* parsing is done */
else call error-recovery routine;
}
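Below is a minimal runnable rendering of this driver in C, hardcoded with the SLR table of Fig. 4.37 above. To stay compact it pushes only states (the grammar symbol is implied by the state), so a reduce pops |β| entries instead of the 2·|β| of the Method-2 trace that follows; id is abbreviated to the single character i. These encodings are choices of the sketch, not part of the algorithm.

#include <stdio.h>
#include <string.h>

/* Productions, numbered as in Fig. 4.37:
   (1) E->E+T (2) E->T (3) T->T*F (4) T->F (5) F->(E) (6) F->id */
static const struct { char head; int len; } prod[] = {
    {0, 0}, {'E', 3}, {'E', 1}, {'T', 3}, {'T', 1}, {'F', 3}, {'F', 1},
};

/* Terminal columns i + * ( ) $ -> 0..5; the input must use only these. */
static int tcol(char a) { return (int)(strchr("i+*()$", a) - "i+*()$"); }

/* ACTION: positive = shift to that state, negative = reduce by that
   production, 99 = accept, 0 = error. */
static const int ACTION[12][6] = {
    /*       id   +   *   (   )   $ */
    /* 0*/ {  5,  0,  0,  4,  0,  0 },
    /* 1*/ {  0,  6,  0,  0,  0, 99 },
    /* 2*/ {  0, -2,  7,  0, -2, -2 },
    /* 3*/ {  0, -4, -4,  0, -4, -4 },
    /* 4*/ {  5,  0,  0,  4,  0,  0 },
    /* 5*/ {  0, -6, -6,  0, -6, -6 },
    /* 6*/ {  5,  0,  0,  4,  0,  0 },
    /* 7*/ {  5,  0,  0,  4,  0,  0 },
    /* 8*/ {  0,  6,  0,  0, 11,  0 },
    /* 9*/ {  0, -1,  7,  0, -1, -1 },
    /*10*/ {  0, -3, -3,  0, -3, -3 },
    /*11*/ {  0, -5, -5,  0, -5, -5 },
};

/* GOTO: columns E, T, F */
static int ncol(char A) { return A == 'E' ? 0 : A == 'T' ? 1 : 2; }
static const int GOTO_[12][3] = {
    {1, 2, 3}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {8, 2, 3},  {0, 0, 0},
    {0, 9, 3}, {0, 0, 10}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0},
};

int main(void)
{
    const char *ip = "i+i*i$";     /* the input id + id * id, with id as i */
    int stack[64] = {0}, top = 0;  /* stack of states, s0 at the bottom    */
    for (;;) {
        int s = stack[top], act = ACTION[s][tcol(*ip)];
        if (act == 99) { puts("accept"); return 0; }
        if (act > 0) {                       /* shift */
            stack[++top] = act;
            printf("shift %c, go to state %d\n", *ip++, act);
        } else if (act < 0) {                /* reduce by production -act */
            int p = -act;
            top -= prod[p].len;              /* pop |beta| states */
            stack[top + 1] = GOTO_[stack[top]][ncol(prod[p].head)];
            top++;
            printf("reduce by production %d, go to state %d\n", p, stack[top]);
        } else { puts("syntax error"); return 1; }
    }
}

On i+i*i$ the printed shift/reduce sequence matches the Method-2 trace below, minus the grammar symbols on the stack.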
SLR Parsing (Method-2)     Input: id+id$

    Stack              Input    Action
    $ 0                id+id$   ACTION[0,id] = s5: push id and then state 5
    $ 0 id 5           +id$     ACTION[5,+] = r6 (F → id): pop 2·|id| symbols,
                                push F and then GOTO[0,F] = 3
    $ 0 F 3            +id$     ACTION[3,+] = r4 (T → F): pop 2·|F| symbols,
                                push T and then GOTO[0,T] = 2
    $ 0 T 2            +id$     ACTION[2,+] = r2 (E → T): pop 2·|T| symbols,
                                push E and then GOTO[0,E] = 1
    $ 0 E 1            +id$     ACTION[1,+] = s6: push + and then state 6
    $ 0 E 1 + 6        id$      ACTION[6,id] = s5: push id and then state 5
    $ 0 E 1 + 6 id 5   $        ACTION[5,$] = r6 (F → id): pop 2·|id| symbols,
                                push F and then GOTO[6,F] = 3
    $ 0 E 1 + 6 F 3    $        ACTION[3,$] = r4 (T → F): pop 2·|F| symbols,
                                push T and then GOTO[6,T] = 9
    $ 0 E 1 + 6 T 9    $        ACTION[9,$] = r1 (E → E+T): pop 2·|E+T| symbols,
                                push E and then GOTO[0,E] = 1
    $ 0 E 1            $        ACTION[1,$] = Accept
SLR Parser
 Exercise 4.6.2 : Construct the SLR sets of items for the (augmented)
grammar.

 Show the parsing table for this grammar. Is the grammar SLR?
 Show the actions of your parsing table from Exercise 4.6.2 on the input
aa*a+.
SLR Parser
 Exercise 4.6.2: Construct the SLR sets of items for the (augmented) grammar
S → SS+ | SS* | a.

    Productions:  (1) S → SS+   (2) S → SS*   (3) S → a
    FIRST(S) = { a }      FOLLOW(S) = { a, +, *, $ }

[Fig.: LR(0) automaton, states I0..I5; I1 accepts on $]

Fig.: SLR parsing table

    STATE   ACTION                   GOTO
            a     +     *     $      S
    0       s2                       1
    1       s2                acc    3
    2       r3    r3    r3    r3
    3       s2    s4    s5           3
    4       r1    r1    r1    r1
    5       r2    r2    r2    r2

Show the parsing table for this grammar. Is the grammar SLR?
Yes, the grammar is SLR, because there are no conflicts in the SLR table.
SLR Parsing (Method-2)
 Exercise 4.6.2: S → SS+ | SS* | a
 Show the actions of your parsing table on the input aa+.

    Productions:  (1) S → SS+   (2) S → SS*   (3) S → a

    Stack            Input   Action
    $ 0              aa+$    ACTION[0,a] = s2: push a and then state 2
    $ 0 a 2          a+$     ACTION[2,a] = r3 (S → a): pop 2·|a| symbols,
                             push S and then GOTO[0,S] = 1
    $ 0 S 1          a+$     ACTION[1,a] = s2: push a and then state 2
    $ 0 S 1 a 2      +$      ACTION[2,+] = r3 (S → a): pop 2·|a| symbols,
                             push S and then GOTO[1,S] = 3
    $ 0 S 1 S 3      +$      ACTION[3,+] = s4: push + and then state 4
    $ 0 S 1 S 3 + 4  $       ACTION[4,$] = r1 (S → SS+): pop 2·|SS+| symbols,
                             push S and then GOTO[0,S] = 1
    $ 0 S 1          $       ACTION[1,$] = Accept
SLR Parser
 Exercise: Construct the SLR sets of items for the (augmented) grammar (4.49):
    S → L=R | R
    L → *R | id
    R → L

This example defines a small grammar for assignment statements. Think of L and
R as standing for l-value and r-value, respectively, and of * as an operator
indicating "contents of".

▪ Show the parsing table for this grammar. Is the grammar SLR?

▪ Show the actions of your parser on the input id=id.

 Note: Every SLR(1) grammar is unambiguous, but there are many unambiguous
grammars that are not SLR(1).
SLR Parser
 Exercise (cont.): grammar (4.49), S → L=R | R, L → *R | id, R → L.

[Fig.: LR(0) automaton]    [Fig.: SLR parsing table]
SLR Parser
 Exercise: Construct the SLR sets of items for the (augmented) grammar:
    1) S → AaAb
    2) S → BbBa
    3) A → ε
    4) B → ε
▪ Is the grammar SLR?

    Nonterminal   FIRST   FOLLOW
    S             a, b    $
    A             ε       a, b
    B             ε       a, b

Fig.: SLR parsing table (state 0 shown)

    STATE   ACTION                       GOTO
            a         b         $        S   A   B
    0       r3/r4     r3/r4              1   2   3
    ...

There are r-r conflicts in the SLR table (in state 0, both A → ε and B → ε are
called for on a and on b), hence the grammar is not SLR.
SLR Parser
SLR Parsing
 Exercise 4.6.4 : For each of the (augmented) grammars

➢ Construct the SLR sets of items and their GOTO function.


➢ Indicate any action conflicts in your sets of items.
➢ Construct the SLR-parsing table, if one exists.
SLR Parsing
LR Parsing

LR(0) Parsing
LR(0) Parsing
 An LR parser using an LR(0)-parsing table is an LR(0) parser.

 The LR(0) method begins with LR(0) items and LR(0) automata. That is, given
a grammar, G, we augment G to produce G', with a new start symbol S'.
From G', we construct C, the canonical collection of sets of items for G'
together with the GOTO function.

 LR(0) parsing does not care about the next input symbol; it does not use
lookahead.

 During LR(0) parsing table construction, whenever there is a final item in


LR(0) automaton, we put the reduce move in the entire row corresponding
to the state that contains the final item.
LR(0) Parsing
 LR(0) is the simplest technique in the LR family. Although that makes it
the easiest to learn, these parsers are too weak to be of practical use
for anything but a very limited set of grammars.

 The fundamental limitation of LR(0) is the zero, meaning no lookahead


tokens are used. It is a stifling constraint to have to make decisions
using only what has already been read, without even glancing at what
comes next in the input.

 If we could peek at the next token and use that as part of the decision
making, we will find that it allows for a much larger class of grammars to
be parsed.
LR(0) Parsing
Is the following grammar in LR(0)?
S→A
S→ a
A→a
LR(0) Parsing
Is the following grammar in LR(0)?
E→T+E
E→T
T → id
LR(0) Parsing
Is the following grammar in LR(0)?
    1) S → AaAb
    2) S → BbBa
    3) A → ε
    4) B → ε

    Nonterminal   FIRST   FOLLOW
    S             a, b    $
    A             ε       a, b
    B             ε       a, b

Fig.: LR(0) parsing table (state 0 shown)

    STATE   ACTION                           GOTO
            a         b         $            S   A   B
    0       r3/r4     r3/r4     r3/r4        1   2   3
    ...

There are r-r conflicts in the LR(0) table (the reduce moves fill the entire
row of state 0), hence the grammar is not LR(0).
LR(0) Parsing
Advantage of SLR over LR(0)

 The simple improvement that SLR(1) makes on the basic LR(0) parser
is to reduce only if the next input token is a member of the follow set
of the non-terminal being reduced.

 When filling in the table, we don't assume a reduce on all inputs as we did
in LR(0); we selectively choose the reduction only when the next input symbol
is a member of the FOLLOW set.
LR(0) Parsing
Comparison between LR(0) and SLR

❑ Similarities between LR(0) and SLR parsers:


❑ Both use LR(0) automata.

❑ Both fill the GOTO part and the shift moves of the parsing table the same
way.

❑ The only difference between the two is where to place the reduce moves.

❑ In SLR, the addition of just one token of lookahead and use of the follow set
greatly expands the class of grammars that can be parsed without conflict.

❑ If the grammar is in LR(0), it is definitely in SLR and if it is in SLR, it may or may


not be in LR(0).
LR Parsing

Viable Prefixes
LR Parsing
Viable Prefixes

 Why can LR(0) automata be used to make shift-reduce decisions?

➢ The LR(0) automaton for a grammar characterizes the strings of grammar


symbols that can appear on the stack of a shift-reduce parser for the
grammar.

➢ The stack contents must be a prefix of a right-sentential form. If the stack
holds α and the rest of the input is x, then a sequence of reductions will take
αx to S. In terms of derivations, S ⇒*rm αx.

➢ Not all prefixes of right-sentential forms can appear on the stack, however,
since the parser must not shift past the handle. For example, suppose
E ⇒*rm F * id ⇒rm (E) * id.
LR Parsing
Viable Prefixes

➢ Then, at various times during the parse, the stack will hold (, (E, and (E), but it
must not hold (E)*, since (E) is a handle, which the parser must reduce to F
before shifting *.

➢ The prefixes of right sentential forms that can appear on the stack of a shift-
reduce parser are called viable prefixes.

➢ Viable prefixes are defined as follows: a viable prefix is a prefix of a right-


sentential form that does not continue past the right end of the rightmost handle
of that sentential form.
By this definition, it is always possible to add terminal symbols to the end of a
viable prefix to obtain a right-sentential form.
LR Parsing
Viable Prefixes

➢ The prefixes of right sentential form that can appear on the stack of a shift-
reduce parser are called viable prefixes.

We observe that at any point of time, the


stack contents must be a prefix of a right
sentential form.
However, not all prefixes of a right
sentential form can appear on the stack.
Here, ‘id *’ is a prefix of a right sentential
form. But it can never appear on the
stack! This is because we will always
reduce by F -> id before shifting ‘*’
LR Parsing
Importance of Viable Prefixes:

❑ The entire SLR parsing algorithm is based on the idea that the LR(0)
automaton can recognize viable prefixes and reduce them appropriately.

❑ E ⇒ E+T ⇒ E+F ⇒ E+id ⇒ T+id ⇒ T*F+id ⇒ T*id+id ⇒ F*id+id ⇒ id*id+id


LR Parsing
Viable Prefixes: Valid Items

[Figures omitted: items valid for viable prefixes of the expression grammar]
LR Parsing
Viable Prefixes

 We can easily compute the set of valid items for each viable prefix that
can appear on the stack of an LR parser.

 In fact, it is a central theorem of LR-parsing theory that the set of valid items
for a viable prefix γ is exactly the set of items reached from the initial state
along the path labeled γ in the LR(0) automaton for the grammar.

 In essence, the set of valid items embodies all the useful information that
can be gleaned from the stack.
LR Parsing
Viable Prefixes
 Exercise 4.6.1 : Describe all the viable prefixes for the following
grammars:
More Powerful LR Parsers
 Next, we will extend the previous LR parsing techniques to use one
symbol of lookahead on the input.

 The “Canonical-LR(CLR)" or CLR(1) method or "LR" method or LR(1)


method

▪ The CLR/LR method makes full use of the lookahead


symbol(s).

▪ This method uses a large set of items, called the LR(1) items.
LR Parsing

Canonical LR/CLR Parsing


CLR / CLR(1) / LR(1) / LR
CLR Parser
Canonical LR(1) Items

 S → L=R | R
L → *R | id
R → L          (under SLR, this grammar produced a shift/reduce conflict)

 It is possible to carry more information in the state that will allow us to
rule out some of these invalid reductions by A → α, e.g. by using items of the
form [A → α•, a].
By splitting states when necessary, we can arrange to have each state of an LR
parser indicate exactly which input symbols can follow a handle α for which
there is a possible reduction to A.
CLR Parser
Canonical LR(1) Items

 The extra information is incorporated into the state by redefining items to


include a terminal symbol as a second component.

 The general form of an item becomes [A→α•β, a], where A→αβ is a production
and a is a terminal or the right endmarker $. We call such an object an LR(1) item.

LR(1) item = LR(0) item + one symbol of lookahead

 The 1 in LR(1) refers to the length of the second component, called


the lookahead of the item.

 The lookahead has no effect in an item of the form [A→α•β, a] where β ≠ ϵ,
but an item of the form [A→α•, a] calls for a reduction by A→α only if the
next input symbol is a.
CLR Parser
Constructing LR(1) Sets of Items

 Procedures CLOSURE and GOTO for building the collection of sets of


valid LR(1) items
CLR Parser
Constructing LR(1) Sets of Items

 Procedure to construct Sets-of-LR(1)-items for grammar G'


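The procedure figures are omitted here. As a companion, the following is a small C sketch of the LR(1) CLOSURE rule: for [A → α·Bβ, a], add [B → ·γ, b] for every production B → γ and every b in FIRST(βa). It is hardcoded for the grammar of Example 4.54 below; Z stands in for S', and the first() helper is specialized to this ϵ-free grammar. These are conventions of the sketch only.

#include <stdio.h>
#include <string.h>

typedef struct { char head; const char *body; } Prod;
static const Prod G[] = {              /* Z stands in for S' */
    {'Z', "S"}, {'S', "CC"}, {'C', "cC"}, {'C', "d"},
};
enum { NP = 4, MAXI = 64 };

typedef struct { int prod, dot; char la; } Item;   /* LR(1) item */
typedef struct { Item it[MAXI]; int n; } Set;

static int isnt(char c) { return c == 'Z' || c == 'S' || c == 'C'; }

static const char *first(char X)       /* FIRST of one symbol; no ϵ here */
{
    static char one[2];
    if (X == 'S' || X == 'C') return "cd";
    one[0] = X; one[1] = '\0';         /* a terminal is its own FIRST set */
    return one;
}

static void add(Set *s, Item x)
{
    for (int i = 0; i < s->n; i++)
        if (s->it[i].prod == x.prod && s->it[i].dot == x.dot &&
            s->it[i].la == x.la)
            return;
    s->it[s->n++] = x;
}

static void closure(Set *s)
{
    for (int i = 0; i < s->n; i++) {               /* worklist; s->n grows */
        Item it = s->it[i];
        const char *body = G[it.prod].body;
        char B = body[it.dot];
        if (!isnt(B)) continue;
        char beta = body[it.dot + 1];
        char labuf[2] = { it.la, '\0' };
        const char *las = beta ? first(beta) : labuf;   /* FIRST(βa) */
        for (int p = 0; p < NP; p++)
            if (G[p].head == B)
                for (int k = 0; las[k]; k++)
                    add(s, (Item){ p, 0, las[k] });
    }
}

int main(void)
{
    Set I0 = { .n = 0 };
    add(&I0, (Item){ 0, 0, '$' });                 /* kernel [S' → ·S, $] */
    closure(&I0);
    for (int i = 0; i < I0.n; i++) {
        const Prod *p = &G[I0.it[i].prod];
        printf("[%c -> %.*s.%s, %c]\n", p->head, I0.it[i].dot, p->body,
               p->body + I0.it[i].dot, I0.it[i].la);
    }
    return 0;
}

Running it prints the six items of I0 in Example 4.54 (the example writes them with merged lookaheads, e.g. C → ·d, c/d).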
CLR Parser
Constructing LR(1) Sets of Items

 Example 4.54: Consider the following augmented grammar; construct the sets of
LR(1) items.
    S' → S
    S → CC
    C → cC | d


CLR Parser
Constructing LR(1) Sets of Items

    Nonterminal   FIRST    FOLLOW
    S             c, d     $
    C             c, d     c, d, $

The item [C → d•, c/d] says that it is valid to reduce d to C only if the next
input symbol is c or d.

The lookaheads will always be a subset of FOLLOW(C).
CLR Parser
Canonical LR(1) Parsing Tables
CLR Parser
Canonical LR(1) Parsing Tables

 Example 4.57: Construct the canonical parsing table for grammar (4.55):
S' → S, S → CC, C → cC | d.
CLR Parser
Canonical LR(1) Parsing Tables

❑ Canonical parsing table for grammar (4.55)

Figure 4.42: Canonical parsing table for grammar(4.55)


CLR Parser
 Exercise 4.7.1: Construct the canonical LR sets of items for S → SS+ | SS* | a.

    Productions:  (1) S → SS+   (2) S → SS*   (3) S → a
    FIRST(S) = { a }      FOLLOW(S) = { a, +, *, $ }

[Fig.: LR(1) automaton, states I0..I9; I1 accepts on $]

Fig.: CLR parsing table

    STATE   ACTION                   GOTO
            a     +     *     $      S
    0       s2                       1
    1       s4                acc    3
    2       r3                r3
    3       s4    s5    s6           7
    4       r3    r3    r3
    5       r1                r1
    6       r2                r2
    7       s4    s8    s9           7
    8       r1    r1    r1
    9       r2    r2    r2
CLR Parser

 Exercise : Construct the canonical LR sets of items (or LR(1) sets of


items) for the grammar S → AA
A → aA|b
CLR Parser

 Exercise : Construct the canonical LR sets of items (or LR(1) sets of


items) for the grammar S → aSb|ab
CLR Parser
 Exercise: Construct the canonical LR sets of items (or LR(1) sets of items)
for the grammar:
    (1) S → L=R   (2) S → R   (3) L → *R   (4) L → id   (5) R → L

    Nonterminal   FIRST    FOLLOW
    S             *, id    $
    L             *, id    =, $
    R             *, id    =, $

[Fig.: LR(1) automaton, states I0..I13]

Fig.: CLR/LR parsing table

    STATE   ACTION                       GOTO
            =     *     id    $          S   L   R
    0             s4    s5               1   2   3
    1                         acc
    2       s6                r5
    3                         r2
    4             s4    s5                   8   7
    5       r4                r4
    6             s11   s12                 10   9
    7       r3                r3
    8       r5                r5
    9                         r1
    10                        r5
    11            s11   s12                 10  13
    12                        r4
    13                        r3
CLR Parser
 Exercise (cont.): Show the actions of your parsing table on the input id=id.

    Stack               Input    Action
    $ 0                 id=id$   ACTION[0,id] = s5
    $ 0 id 5            =id$     ACTION[5,=] = r4 (L → id), GOTO[0,L] = 2
    $ 0 L 2             =id$     ACTION[2,=] = s6
    $ 0 L 2 = 6         id$      ACTION[6,id] = s12
    $ 0 L 2 = 6 id 12   $        ACTION[12,$] = r4 (L → id), GOTO[6,L] = 10
    $ 0 L 2 = 6 L 10    $        ACTION[10,$] = r5 (R → L), GOTO[6,R] = 9
    $ 0 L 2 = 6 R 9     $        ACTION[9,$] = r1 (S → L=R), GOTO[0,S] = 1
    $ 0 S 1             $        ACTION[1,$] = Accept
CLR Parser

 Canonical LR (CLR/CLR(1)/LR(1)/LR) grammars


▪ Every SLR(1) grammar is a Canonical LR(1) grammar, but the
canonical LR(1) parser may have more states than the SLR(1)
parser.

▪ An LR(1) grammar is not necessarily SLR(1); the grammar given below is an
example. Because an LR(1) parser splits states based on differing lookaheads,
it may avoid conflicts that would otherwise result from using the full FOLLOW
set.

S -> L=R | R
L -> *R | id LR(1), not SLR(1)
R -> L
LR Parsing

LALR Parsing
LALR Parser

 The "lookahead-LR" or "LALR" or "LALR(1)" method

❑ The LALR method has many fewer states than typical parsers based
on the LR(1) items.

❑ By carefully introducing lookaheads into the LR(0) items, we can

▪ handle many more grammars with the LALR method than with the
SLR method, and

▪ build parsing tables that are no bigger than the SLR tables.

❑ LALR is the method of choice in most situations.


LALR Parser
Constructing LALR Parsing Tables

 LALR method is often used in practice, because the tables obtained


by it are considerably smaller than the canonical LR tables.

 The most common syntactic constructs of programming languages


can be expressed conveniently by an LALR grammar.

 The same is almost true for SLR grammars, but there are a few
constructs that cannot be conveniently handled by SLR techniques
(see Example 4.48, for example).
LALR Parser
Constructing LALR Parsing Tables

 For a comparison of parser size, the SLR and LALR tables for a
grammar always have the same number of states.

 For a language like C:


▪ The SLR and LALR tables for a grammar typically have several hundred
states.

▪ The CLR table would typically have several thousand states for the same-
size language.

▪ Thus, it is much easier and more economical to construct SLR and LALR
tables than the canonical LR tables.
LALR Parser
Constructing LALR Parsing Tables

 The table produced by Algorithm 4.59 is called the LALR parsing


table for G.

 If there are no parsing action conflicts, then the given grammar is


said to be an LALR(1) grammar.

 The collection of sets of items constructed in step (3) is called the


LALR(1) collection.
LALR Parser
Constructing LALR Parsing Tables

 Example 4.60: Again consider grammar (4.55): (1) S → CC, (2) C → cC,
(3) C → d. In its LR(1) automaton there are three pairs of sets of items that
can be merged: I3/I6, I4/I7, and I8/I9. For instance, I3 and I6 are replaced
by their union I36:
    C → c·C, c/d/$
    C → ·cC, c/d/$
    C → ·d,  c/d/$

Note: find states having the same core, replace them by their union, and make
the transitions correspond.

[Fig.: LALR(1) automaton, states I0, I1, I2, I36, I47, I5, I89]

Notation: "s5" means shift and move to state 5 (the digit is a state number);
"r3" means reduce using production number 3.

Fig. 4.43: LALR parsing table for grammar (4.55)

    STATE   ACTION                  GOTO
            c      d      $         S   C
    0       s36    s47              1   2
    1                     acc
    2       s36    s47                  5
    36      s36    s47                  89
    47      r3     r3     r3
    5                     r1
    89      r2     r2     r2
LALR Parser
Constructing LALR Parsing Tables

 The LALR action and goto functions for the condensed sets of items
are shown in Fig. 4.43.
LALR Parser
 Exercise 4.7.1: Construct the LALR sets of items for S → SS+ | SS* | a.

    Productions:  (1) S → SS+   (2) S → SS*   (3) S → a
    FIRST(S) = { a }      FOLLOW(S) = { a, +, *, $ }

[Fig.: LR(1) automaton, states I0..I9, as in the CLR exercise above]

Merging the states with the same core (I2/I4 → I24, I3/I7 → I37, I5/I8 → I58,
I6/I9 → I69) gives the LALR(1) automaton.

Fig.: LALR parsing table

    STATE   ACTION                   GOTO
            a     +     *     $      S
    0       s24                      1
    1       s24               acc    37
    24      r3    r3    r3    r3
    37      s24   s58   s69          37
    58      r1    r1    r1    r1
    69      r2    r2    r2    r2

For comparison, the CLR table for the same grammar (shown earlier) has ten
states; the LALR table condenses them to six.
LALR Parser
 Specify whether the following grammar is in LALR or not

S -> A a | b A c | d c
A -> d
More Powerful LR Parsers
Exercise 4.7.2:

 Construct the
a) canonical LR, and

b) LALR

sets of items for the grammars


More Powerful LR Parsers
1) S -> A a 4) S -> b B a
LALR Parser 2) S -> b A c
3) S -> B c
5) A -> d
6) B -> d
Exercise 4.7.5: Show that the above grammar is LR(1) but not LALR(1).

[Fig.: LR(1) automaton, states I0..I12]

Fig.: CLR/LR parsing table

    STATE   ACTION                              GOTO
            a     b     c     d     $           S   A   B
    0             s3          s5                1   2   4
    1                               acc
    2       s6
    3                         s9                    7   8
    4                   s10
    5       r5          r6
    6                               r1
    7                   s11
    8       s12
    9       r6          r5
    10                              r3
    11                              r2
    12                              r4

In the LALR construction, I5 and I9 have the same core and are merged into
I59, whose ACTION entries on a and on c each become r5/r6:

    STATE   ACTION                              GOTO
            a       b     c       d     $       S   A   B
    0               s3            s59           1   2   4
    ...                                 acc
    59      r5/r6         r5/r6

These are r-r conflicts, so the grammar is not LALR(1).
LR Parsers

Table: Summary of the table-driven bottom-up parsers

    Parser                DFA used                      Reduce-move placement in ACTION
    LR(0)                 LR(0) automaton               entire row of the state containing
                          (states of LR(0) items)       the final item
    SLR (Simple LR)       LR(0) automaton               only under FOLLOW(LHS)
    LALR (LookAhead LR)   LR(1) automaton, merged       only under the lookaheads
                          (LR(1) item = LR(0) item
                          + lookahead)
    CLR (Canonical LR)    LR(1) automaton               only under the lookaheads

    Power:  LR(0) (least powerful) < SLR < LALR < CLR (most powerful)
    Number of states:  n(LR(0)) == n(SLR) == n(LALR) <= n(CLR)
LR Parsers
 Specify whether the following grammars are in SLR, LALR, LR or not
(Note: X = NO, Y = YES):

    1) S → A [A]                 Answer: SLR Y, LALR Y, LR Y
       A → (A) | ε

    2) S → Ac | bA | bc          Answer: SLR X, LALR Y, LR Y
       A → ε

    3) S → Aa | bAc | Bc | bBa   Answer: SLR X, LALR X, LR Y
       A → d
       B → d

    4) S → AS | a                Answer: SLR X, LALR X, LR X
       A → SA | b
LR Parsers
 Specify whether the following grammar is in LALR, LR or not

5) S -> A a | b A c | d c
A -> d

Answer: LALR Y, LL X

Note: CLR will not make any reduction if there is an error in


the string. LALR and SLR may make a few reductions before
declaring error.
LR Parsers

Exercise 01:
 Consider the following grammar:

S→AA

A→aA|b

Note that this grammar is in LR(0).

Make its CLR, LALR parsing tables and parse the string “abb”.
LR Parsers

Exercise 02:
 Consider the following grammar:

S→dA|aB

A→bA|c

B→bB|c

Note that this grammar is in LR(0).

Make its CLR, LALR parsing tables and parse the string “abc”.
LR Parsers

Exercise 03:
 Consider the following grammar:

E→E+T|T

T→T*F|F

F → id

Note that this grammar is in SLR.

Find out if the given grammar is in LR(0), LALR and CLR.


LR Parsers

Exercise 04:
 Consider the following grammar:

S→(X|E]|F)

X→E)|F]

E→A

F→A

A→λ (Null production)

Is the given grammar in LL, LR(0), SLR, LALR and LR?


LR Parsers

Exercise 05:

 Provide a production with the shortest RHS that introduces s/r


conflict in SLR parser for the given grammar:

S→Ab

A→cbAd

A→cAd

A→λ
LR Parsers

Exercise 06:

 Is the given grammar in LL, LR(0), SLR, LALR and LR?

S→Abbx|Bbby

A→x

B→x

Specify the value of i and j such that the grammar in LL(i) and LR(j).
LR Parsers
Why is LR(1) so powerful?

 Intuitively, for two reasons:

1) Lookahead makes handle-finding easier.


▪ The LR(0) automaton says whether there could be a handle later on
based on no right context.

▪ The LR(1) automaton can predict whether it needs to reduce based on


more information.

2) More states encode more information.


▪ LR(1) lookaheads are very good because there's a greater number of
states to be in.

▪ Goal: Incorporate lookahead without increasing the number of states.


LR Parsers

[Figure: the hierarchy of grammar classes, LR(0) ⊆ SLR(1) ⊆ LALR(1) ⊆ LR(1),
within the unambiguous grammars]

 Note that this diagram refers to grammars, not languages, e.g. there may be an
equivalent LR(1) grammar that accepts the same language as another non-LR(1)
grammar. No ambiguous grammar is LL(1) or LR(1), so we must either rewrite the
grammar to remove the ambiguity or resolve conflicts in the parser table or
implementation.

 The hierarchy of LR variants is clear: every LR(0) grammar is SLR(1) and every SLR(1) is
LALR(1) which in turn is LR(1). But there are grammars that don’t meet the
requirements for the weaker forms that can be parsed by the more powerful
variations.
LR Parsers
LL (1) v/s LALR (1)

 Error repair:
▪ Both LL(1) and LALR(1) parsers possess the valid/viable prefix property.
What is on the stack will always be a valid prefix of a sentential form.
Errors in both types of parsers can be detected at the earliest possible
point without pushing the next input symbol onto the stack.
▪ LL(1) parse stacks contain symbols that are predicted but not yet
matched. This information can be valuable in determining proper
repairs.
▪ LALR(1) parse stacks contain information about what has already been
seen, but do not have the same information about the right context that
is expected.
▪ This means deciding possible continuations is somewhat easier in an
LL(1) parser.
LR Parsers
LL (1) v/s LALR (1)

 Efficiency:
▪ Both require a stack of some sort to manage the input. That stack can
grow to a maximum depth of n, where n is the number of symbols in the
input.

▪ If you are using the runtime stack (i.e. function calls) rather than pushing
and popping on a data stack, you will probably pay some significant
overhead for that convenience (i.e. a recursive descent parser takes
that hit).

▪ If both parsers are using the same sort of stack, LL(1) and LALR(1) each
examine every non-terminal and terminal when building the parse tree,
and so parsing speeds tend to be comparable between the two.
LR Parsers - Exercises
 Is the given grammar in LL(1), SLR, LALR(1) , LR or not??

P → M *| ε
M → Q StarM| ε
StarM → (* M *)| ( Q * )
Q → o | ε

Some sentences generated by this grammar: {ε, *, (* *) *, ( * ) *,


( o * ) *, o ( * ) *, o (* *) *, o ( o * ) *, (* (* *) *) *, (* (
* ) *) *, (* o (* *) *) *, o (* ( * ) *) *, (* o ( * ) *) *, o (*
(* *) *) *, (* ( o * ) *) *, o (* ( o * ) *) *, (* o ( o * ) *) *,
o (* o ( * ) *) *, o (* o (* *) *) *, o (* o ( o * ) *) *}
LR Parsers - Exercises
 Is the given grammar in LL(1), LALR(1)??

S → [ X| E )| F [
X → E )| F ]
E → A
F → A
A → ε
LR Parsers - Exercises
 Is the given grammar in LL(1), SLR(1), LALR(1) , CLR(1) or
not??
L → V ( args )
| L equals Var ( )
V → Var + V
| id
Var → id

Some sentences generated by this grammar: {id ( args ), id + id (


args ), id + id + id ( args ), id( args ) equals id ( ), id + id +
id + id ( args ), id + id ( args ) equals id ( ), id + id + id +
id + id( args ), id + id + id ( args ) equals id ( ), id + id + id
+ id + id + id ( args )}
LR Parsers

The space of grammars

❑ LALR(1) is a subset of LR(1) and a superset of SLR(1).

❑ A grammar that is not LR(1) is definitely not LALR(1), since whatever


conflict occurred in the original LR(1) parser will still be present in the
LALR(1).

❑ A grammar that is LR(1) may or may not be LALR(1) depending on


whether merging introduces conflicts.
The space of grammars

❑ A grammar that is SLR(1) is definitely LALR(1). A grammar that is not


SLR(1) may or may not be LALR(1) depending on whether the more
precise lookaheads resolve the SLR(1) conflicts.

❑ LALR(1) has proven to be the most used variant of the LR family.


The space of grammars

❑ The weakness of the SLR(1) and LR(0) parsers mean they are only
capable of handling a small set of grammars.

❑ The expansive memory needs of LR(1) caused it to languish for


several years as a theoretically interesting but intractable approach.

❑ It was the advent of LALR(1) that offered a good balance between


the power of the specific lookaheads and table size.
The space of grammars

❑ The popular tools yacc and bison generate LALR(1) parsers and most
programming language constructs can be described with an LALR(1)
grammar.
Parser Generators

LAB Instructions
Parser Generators
The Parser Generator - Yacc

 GNU Bison, commonly known as Bison, is a parser generator that is


part of the GNU Project. Bison reads a specification in the BNF
notation (a context-free language), warns about any parsing
ambiguities, and generates a parser that reads sequences
of tokens and decides whether the sequence conforms to the
syntax specified by the grammar.

 The generated parsers are portable: they do not require any


specific compilers. Bison by default generates LALR(1) parsers but it
can also generate canonical LR, IELR(1) and GLR parsers.
Parser Generators
❑ Interaction between Lex and Yacc

$ lex lexer.l // generates lex.yy.c - contains definition of yylex()


$ yacc parser.y // generates (1) y.tab.c - contains definition of yyparse() (2) y.tab.h
- contains token definitions

 The parser drives the lexical analysis: it must know the function that
performs it. Hence, we must declare the yylex() function in the definitions
part of the yacc file.

 Similarly, since we expect the lex file to generate tokens, it must know
their definitions; hence, we must include the y.tab.h file in the definitions
part of the lex file.
Parser Generators
The Parser Generator - Yacc

❑ A Yacc source program has three parts:


▪ The Declarations Part
▪ The Translation Rules Part
▪ The Supporting C-Routines Part

%{
/* C includes */
%}
/* Other Declarations */

%%
/* Rules */

%%
/* user subroutines */
Parser Generators
The Parser Generator - Yacc
➢ Names(terminals) representing tokens must be declared; this is done by
writing
• %token name1 name2 . . .

➢ Every name not defined in the declarations section is assumed to represent


a nonterminal symbol.

➢ Every nonterminal symbol must appear on the left side of at least one rule.

➢ Any name not declared as a token in the declarations section is assumed


to be a nonterminal.

➢ By convention, tokens have uppercase names, although bison doesn’t


require it.

➢ If a symbol neither is a token nor appears on the left side of a rule, it’s like
an unreferenced variable in a C program. It doesn’t hurt anything, but it
probably means the programmer made a mistake.
Parser Generators
The Parser Generator - Yacc

 Start symbol : – may be declared, via:


▪ %start name

– if not declared explicitly, defaults to the nonterminal on the LHS of the first
grammar rule listed.
Parser Generators
The Parser Generator - Yacc

#define YYSTYPE double

 The type of yylval is determined by YYSTYPE; the default type is int, and it
can be overridden as above with #define YYSTYPE double.
Token values 0-255 are reserved for character values.

 Generated token values typically start around 258 because lex


reserves several values for end-of-file and error processing.
Parser Generators
The Parser Generator - Yacc

 The precedences and associativities are attached to tokens in the


declarations section.

 This is done by a series of lines beginning with the yacc keywords %left,
%right, or %nonassoc, followed by a list of tokens. All of the tokens on the
same line are assumed to have the same precedence level and
associativity; the lines are listed in order of increasing precedence or
binding strength. Thus:

• %left '+' '-'

• %left '*' '/'

 describes the precedence and associativity of the four arithmetic


operators. + and - are left associative and have lower precedence
than * and /, which are also left associative. The keyword %right is used to
describe right associative operators.
Parser Generators
The Parser Generator - Yacc

 Example 4.69 : To illustrate how to prepare a Yacc source program, let us


construct a simple desk calculator that reads an arithmetic expression,
evaluates it, and then prints its numeric value. We shall build the desk
calculator starting with the following grammar for arithmetic
expressions:

 The token digit is a single digit between 0 and 9. A Yacc desk calculator
program derived from this grammar is shown in Fig. 4.58.
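Figure 4.58 is not reproduced in these notes; the following is a sketch along its lines: a complete Yacc specification for the desk calculator, with a hand-written yylex() so that no separate Lex file is needed. Link with -ly (as in the Execution commands later) to pull in the library's default main() and yyerror().

%{
#include <ctype.h>
#include <stdio.h>
%}
%token DIGIT
%%
line  : expr '\n'         { printf("%d\n", $1); }
      ;
expr  : expr '+' term     { $$ = $1 + $3; }
      | term
      ;
term  : term '*' factor   { $$ = $1 * $3; }
      | factor
      ;
factor: '(' expr ')'      { $$ = $2; }
      | DIGIT
      ;
%%
int yylex(void)            /* returns DIGIT for 0-9, else the character itself */
{
    int c = getchar();
    if (isdigit(c)) {
        yylval = c - '0';  /* the token's attribute value: the digit */
        return DIGIT;
    }
    return c;
}

Because the grammar layers expr, term, and factor, precedence comes out right: typing 2*3+4 prints 10.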
Parser Generators
The Parser Generator - Yacc

main() repeatedly calls yyparse() until


the lexer’s input file runs out.
Parser Generators
The Parser Generator - Yacc

❑ The Declarations Part

 There are two sections in the declarations part of a Yacc program;


both are optional.

 In the first section, we put ordinary C declarations, delimited by %{ and


%}. Here we place declarations of any temporaries used by the
translation rules or procedures of the second and third sections.

 In Fig. 4.58, this section contains only the include-statement

#include <ctype.h>

that causes the C preprocessor to include the standard header file


<ctype.h> that contains the predicate isdigit.
Parser Generators
The Parser Generator - Yacc

 The Declarations Part

 Also in the declarations part are declarations of grammar tokens. In Fig.


4.58, the statement

%token DIGIT

declares DIGIT to be a token. Tokens declared in this section can then be


used in the second and third parts of the Yacc specification.

If Lex is used to create the lexical analyzer that passes token to the Yacc
parser, then these token declarations are also made available to the
analyzer generated by Lex.
Parser Generators
The Parser Generator - Yacc

❑ The Translation Rules Part

 In the part of the Yacc specification after the first %% pair, we put the
translation rules.

 Each rule consists of a grammar production and the associated semantic


action. A set of productions that we have been writing:
Parser Generators
The Parser Generator - Yacc

❑ The Translation Rules Part

 In a Yacc production, unquoted strings of letters and digits not declared to


be tokens are taken to be nonterminals.

 A quoted single character, e.g. 'c‘, is taken to be the terminal symbol c, as


well as the integer code for the token represented by that character (i.e.,
Lex would return the character code for ‘c’ to the parser, as an integer).

 Alternative bodies can be separated by a vertical bar, and a semicolon


follows each head with its alternatives and their semantic actions. The first
head is taken to be the start symbol.
Parser Generators
The Parser Generator - Yacc

❑ The Translation Rules Part

[Example translation rules with semantic actions — figures omitted]
Parser Generators
The Parser Generator - Yacc

❑ The Translation Rules Part

The rules section simply consists of a list of grammar rules. Since ASCII
keyboards don’t have a → key, we use a colon between the left- and
right-hand sides of a rule, and we put a semicolon at the end of each rule.
Parser Generators

Execution

❑ Linux/MacOS
$ lex lexer.l // generates lex.yy.c
$ yacc parser.y // generates y.tab.c, y.tab.h
$ gcc y.tab.c lex.yy.c -ll -ly // linking lex and yacc
$ ./a.out < input // run the executable

❑ Windows
$ bison -dy prog.y
$ flex hello.l
$ gcc y.tab.c lex.yy.c
$ a.exe < input
References

 Compilers: Principles, Techniques, and Tools, Alfred V. Aho, Monica S. Lam,
Ravi Sethi, Jeffrey D. Ullman, 2nd Edition.
LR Parsing: Simple LR
Use of the LR(0) Automaton

 The central idea behind "Simple LR," or SLR, parsing is the construction of
LR(0) automaton from the grammar.

 How can LR(0) automata help with shift-reduce decisions?

 Shift-reduce decisions can be made as follows.

 Suppose that the string γ of grammar symbols takes the LR(0) automaton
from the start state 0 to some state j.

▪ Then, shift on next input symbol a if state j has a transition on a.

▪ Otherwise, we choose to reduce; the items in state j will tell us which


production to use.
LR Parsing: Simple LR
Use of the LR(0) Automaton

Example: Figure 4.34 illustrates the actions of a shift-reduce parser on input
id * id, using the LR(0) automaton in Fig. 4.31. We use a stack to hold states.

Note: item sets with no outgoing arrows call for a reduce.
LR Parsing: Simple LR
The LR-Parsing

[Figures omitted: the LR-parsing model and its SLR table. In the table, "s4"
means shift and move to state 4, and "r1" means reduce using production
number 1.]
LR Parsing: Simple LR
Constructing SLR-Parsing Tables

 Example 4.48 : Every SLR(1) grammar is unambiguous, but there are many
unambiguous grammars that are not SLR(1). Consider the grammar with
productions

Think of L and R as standing for l-value and r-value, respectively, and * as


an operator indicating "contents of."
Syntax Error Handling
➢ A reason for emphasizing error recovery during parsing is that

▪ Many errors appear syntactic, whatever their cause, and are exposed

when parsing cannot continue.

▪ A few semantic errors, such as type mismatches, can also be detected

efficiently.

An accurate detection of semantic and logical errors at compile time is

in general a difficult task.