CSC-437 Chapter 4

CSC 437 (Compilers)
Syntax Analysis
Fardina Fathmiul Alam

Faculty
Department of Computer Science and Engineering
IUBAT - International University of Business Agriculture and Technology
01/26/2022 1
The Role of the Parser
In our compiler model, the parser gets a string of tokens

from the lexical analyzer.
It then verifies that the string of token names can be
generated by the grammar for the source language.
01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)

2 2/127
The Role of the Parser — continued
We expect the parser

to report any syntax errors
to recover from commonly occurring errors to continue
processing the remainder of the program.

3 3/127
Conceptually, for well-formed programs, the parser

constructs a parse tree and passes it to the rest of the
compiler for further processing.

4 3/127
There are three general types of parsers for grammars:

universal,
top-down, and
bottom-up.
Universal parsing methods such as the
Cocke-Younger-Kasami algorithm and Earley’s algorithm
can parse any grammar.
These general methods are, however, too inefficient to use
in production compilers.

5 4/127
The methods commonly used in compilers can be

classified as being either top-down or bottom-up.
Top-down methods build parse trees from the top (root)
to the bottom (leaves).
Bottom-up methods start from the leaves and work their
way up to the root.
In either case, the input to the parser is scanned from left
to right, one symbol at a time.

6 5/127
The most efficient top-down and bottom-up methods work

only for subclasses of grammars.
Among several of these subclasses, particularly, LL and LR
grammars, are use to describe most of the
syntactic constructs in modern programming languages.
Parsers implemented by hand often use LL grammars.
Parsers for the larger class of LR grammars are usually

constructed using automated tools.

7 6/127
Representative Grammars — continued
Associativity and precedence are captured in the following

grammar.
E represents expressions consisting of terms separated by
+ signs.
T represents terms consisting of factors separated by ∗
signs.
F represents factors that can be either parenthesized
expressions or identifiers:
E
→ E+T |T
T
→ T ∗F |F
F
→ (E )|id
8 8/127
E
T
→ E+T |T
→ T ∗F |F
F
→ (E )|id
The above grammar belongs to the class of LR grammars
that are suitable for bottom-up parsing.
However, it cannot be used for top-down parsing because

it is left recursive.

9 9/127
The following non-left-recursive variant of the expression

grammar will be used for top-down parsing:
E → TE ′
E′
→ +TE ′ | ϵ
T → FT ′
T′
→ ∗FT ′ | ϵ
F
→ (E )|id

1010 / 127
Writing a Grammar
Grammars are capable of describing most, but not all, of

the syntax of programming languages.
The sequences of tokens accepted by a parser

form a superset of the programming language.

1119 / 127
Elimination of Left Recursion
A grammar is left recursive if it has a nonterminal A such

+
that there is a derivation A == ⇒ Aa for some string a.
Top-down parsing methods cannot handle left-recursive
grammars, so a transformation that eliminates left
recursion is needed.
In simple left recursion there was one production of the
form A → Aα.
Here we study the general case.

1220 / 127
Elimination of Left Recursion — continued
Left-recursive pair of productions A → Aα | β can be

replaced by the non-left-recursive productions
A → βA′
A′
→ αA′ | ϵ
without changing the set of strings derivable from A.
This rule by itself suffices in many grammars.

1321 / 127
Example
Grammar for arithmetic

expressions,
E → E+T |T
T → T ∗F |F
F → (E )|id
A→Aα|β
Eliminating the immediate left
to be replaced by recursions we obtain,
E → TE ′
A → βA
E′ → +TE ′ | ϵ
′
T → FT ′
A′ → αA′ | ϵ
T′ →∗FT ′ | ϵ
F → E | (id)

1422 / 127
01/26/2022 15
01/26/2022 16
01/26/2022 17
01/26/2022 18
Top-Down Parsing
• The parse tree is created top to bottom.

• Top-down parser
– Recursive-Descent Parsing
• Backtracking is needed (If a choice of a production rule does not work, we backtrack to
try other alternatives.)
• It is a general parsing technique, but not widely used.
• Not efficient
– Predictive Parsing
• no backtracking
• efficient
• needs a special form of grammars (LL(1) grammars).
• Recursive Predictive Parsing is a special form of Recursive Descent parsing without
backtracking.
• Non-Recursive (Table Driven) Predictive Parser is also known as LL(1) parser.
01/26/2022 19
Recursive-Descent Parsing (uses Backtracking)
S
input: abc
a B c a B
c
b c b
fails, backtrack
01/26/2022 20
Predictive Parsing
• The goal of predictive parsing is to construct a top-down parser

that never backtracks. To do so, we must transform a grammar in
two ways:
• eliminate left recursion, and
• perform left factoring.
left factoring helps predict which rule to use without backtracking.
01/26/2022 21
01/26/2022 22
01/26/2022 23
01/26/2022 24
01/26/2022 25
01/26/2022 26
01/26/2022 27
01/26/2022 28
01/26/2022 29
01/26/2022 30
01/26/2022 31
01/26/2022 32
01/26/2022 33
01/26/2022 34
01/26/2022 35
01/26/2022 36
01/26/2022 37
01/26/2022 38
01/26/2022 39
01/26/2022 40
01/26/2022 41
01/26/2022 42
01/26/2022 43
Context Free Grammar
Definition :
That is used to specify the syntax of a language. A grammar naturally
describes the hierarchical structure of most programming language
constructs.
For example, an if-else statement in Java can have the form
if ( expression ) statement else statement
Using the variable expr to denote an expression and
the variable stmt to denote a statement, this structuring rule can be expressed as
Stmt if expr stmt else stmt

• in which the arrow may be read as “\can have the form."
• Such a rule is called a production In a production,
• lexical elements like the keyword if and the parentheses are called terminals
• Variables like expr and stmt represent sequences of terminals and are called
nonterminals.
01/26/2022 44
A context-free grammar has four components:
1. A set of terminal symbols, sometimes referred to as \tokens." The terminals are
the elementary symbols of the language defined by the grammar.
2. A set of nonterminals, sometimes called \syntactic variables." Each nonterminal
represents a set of strings of terminals.
3. A set of productions where each production consists of a nonterminal, called the
head or left side of the production, an arrow, and a sequence of terminals
and/or nonterminals, called the body or right side of the production.
4. A designation of one of the nonterminals as the start symbol.
01/26/2022 45
The grammar here defines simple arithmetic expressions. In this grammar,
• the terminal symbols are id + - * / ( ) .
• The nonterminal symbols are expression, term and factor,
• and expression is the start symbol
expression → expression + term
expression → expression - term
Expression → term
Term → term *factor
term→ term / factor
term →factor
factor →(expression)
factor→ id
Figure : Grammar for simple arithmetic expressions
01/26/2022 46
expression → expression + term
express → expression - term
Expression term
Term → term *factor
term→ term / factor
term →factor
factor →(expression)
factor→ id
Using these conventions, we can write down,

E→ E+ T | E- T | T
T →T* F | T / F | F
F→( E) | id
The notational conventions tell us that E T, and F are nonterminals, with E the start
symbol. The remaining symbols are terminals
01/26/2022 47
Derivation
For example, consider the following grammar, with a single nonterminal E which adds
a production :
E→ E +E | E *E | -E |( E )| id
2 types derivation can happen : Left Most & Right Most Derivation
Example : (Left Most)

Input: -(id+id)
E → -E →-( E ) →-(E+ E ) →- ( id +E ) →- (id + id )
TRY RIGHT MOST!!
01/26/2022 48
Ambiguity
• a grammar that produces more than one parse tree for some
sentence is said to be ambiguous .Put another way, an
ambiguous grammar is one that produces more than one
leftmost derivation or more than one rightmost derivation for
the same sentence
• : The arithmetic expression grammar permits two distinct
leftmost derivations for the sentence id+id*id
• E→ E *E E →E +E
→E+E*E , → id+E
→Id+E*E → id+E*E
→ id+id*E → id+id*E
→ Id+id*id → id+id*id
01/26/2022 49
Ambiguity
01/26/2022 50
LL(1) Grammer
• A predictive parser is a recursive descent parser that does not
require backtracking.
• Predictive parsing is possible only for the class of LL(k) grammars,
which are the context-free grammars for which there exists some
positive integer k that allows a recursive descent parser to decide
which production to use by examining only the next k tokens of input.
• The LL(k) grammars therefore exclude all ambiguous grammars, as
well as all grammars that contain left recursion.
01/26/2022 51
01/26/2022 52
01/26/2022 53
01/26/2022 54
01/26/2022 55
01/26/2022 56
Non-Recursive Predictive Parsing -- LL(1) Parser
• Non-Recursive predictive parsing is a table-driven parser.

• It is a top-down parser.
• It is also known as LL(1) Parser.
input buffer
stack Non-recursive output

Predictive Parser
Parsing Table
01/26/2022 57
Non-Recursive Predictive Parsing -- LL(1) Parser
01/26/2022 58
LL(1) Parser
input buffer
– our string to be parsed. We will assume that its end is marked with a special symbol $.
output
– a production rule representing a step of the derivation sequence (left-most derivation) of the
string in the input buffer.
stack
– contains the grammar symbols
– at the bottom of the stack, there is a special end marker symbol $.
– initially the stack contains only the symbol $ and the starting symbol S. $S  initial
stack
– when the stack is emptied (ie. only $ left in the stack), the parsing is completed.
parsing table
– a two-dimensional array M[A,a]
– each row is a non-terminal symbol
– each column is a terminal symbol or the special symbol $
– each entry holds a production rule.
01/26/2022 59
LL(1) Parser – Parser Actions
• The symbol at the top of the stack (say X) and the current symbol in the input
string (say a) determine the parser action.
• There are four possible parser actions.
1. If X and a are $  parser halts (successful completion)
2. If X and a are the same terminal symbol (different from $)
 parser pops X from the stack, and moves the next symbol in the input
buffer.
3. If X is a non-terminal
 parser looks at the parsing table entry M[X,a]. If M[X,a] holds a
production rule XY1Y2...Yk, it pops X from the stack and pushes Yk,Yk-
1,...,Y1 into the stack. The parser also outputs the production rule XY1Y2...Yk
to represent a step of the derivation.
4. none of the above  error
– all empty entries in the parsing table are errors.
– If X is a terminal symbol different from a, this is also an error case.
01/26/2022 60
01/26/2022 61
01/26/2022 62
01/26/2022 63
01/26/2022 64
The End
01/26/2022 65

CSC-437 Chapter 4

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

CSC-437 Chapter 4

Uploaded by

Copyright:

Available Formats

CSC 437 (Compilers)

Fardina Fathmiul Alam

In our compiler model, the parser gets a string of tokens

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)

We expect the parser

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)

Conceptually, for well-formed programs, the parser

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)

There are three general types of parsers for grammars:

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)

The methods commonly used in compilers can be

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)

The most efficient top-down and bottom-up methods work

Parsers for the larger class of LR grammars are usually

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)

Associativity and precedence are captured in the following

However, it cannot be used for top-down parsing because

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)

The following non-left-recursive variant of the expression

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)

Grammars are capable of describing most, but not all, of

The sequences of tokens accepted by a parser

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)

A grammar is left recursive if it has a nonterminal A such

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)

Left-recursive pair of productions A → Aα | β can be

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)

Grammar for arithmetic

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)

• The parse tree is created top to bottom.

• The goal of predictive parsing is to construct a top-down parser

left factoring helps predict which rule to use without backtracking.

Stmt if expr stmt else stmt

Figure : Grammar for simple arithmetic expressions

Using these conventions, we can write down,

Example : (Left Most)

E → -E →-( E ) →-(E+ E ) →- ( id +E ) →- (id + id )

TRY RIGHT MOST!!

• Non-Recursive predictive parsing is a table-driven parser.

stack Non-recursive output

You might also like