You are on page 1of 65

CSC 437 (Compilers)

Syntax Analysis

Fardina Fathmiul Alam


Faculty
Department of Computer Science and Engineering
IUBAT - International University of Business Agriculture and Technology

01/26/2022 1
The Role of the Parser

In our compiler model, the parser gets a string of tokens


from the lexical analyzer.
It then verifies that the string of token names can be
generated by the grammar for the source language.

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)


2 2/127
The Role of the Parser — continued

We expect the parser


to report any syntax errors
to recover from commonly occurring errors to continue
processing the remainder of the program.

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)


3 3/127
The Role of the Parser — continued

Conceptually, for well-formed programs, the parser


constructs a parse tree and passes it to the rest of the
compiler for further processing.

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)


4 3/127
The Role of the Parser — continued

There are three general types of parsers for grammars:


universal,
top-down, and
bottom-up.
Universal parsing methods such as the
Cocke-Younger-Kasami algorithm and Earley’s algorithm
can parse any grammar.
These general methods are, however, too inefficient to use
in production compilers.

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)


5 4/127
The Role of the Parser — continued

The methods commonly used in compilers can be


classified as being either top-down or bottom-up.
Top-down methods build parse trees from the top (root)
to the bottom (leaves).
Bottom-up methods start from the leaves and work their
way up to the root.
In either case, the input to the parser is scanned from left
to right, one symbol at a time.

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)


6 5/127
The Role of the Parser — continued

The most efficient top-down and bottom-up methods work


only for subclasses of grammars.
Among several of these subclasses, particularly, LL and LR
grammars, are use to describe most of the
syntactic constructs in modern programming languages.
Parsers implemented by hand often use LL grammars.

Parsers for the larger class of LR grammars are usually


constructed using automated tools.

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)


7 6/127
Representative Grammars — continued

Associativity and precedence are captured in the following


grammar.
E represents expressions consisting of terms separated by
+ signs.
T represents terms consisting of factors separated by ∗
signs.
F represents factors that can be either parenthesized
expressions or identifiers:

E
→ E+T |T
T
→ T ∗F |F
F
→ (E )|id
01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)
8 8/127
Representative Grammars — continued

E
T
→ E+T |T
→ T ∗F |F
F
→ (E )|id
The above grammar belongs to the class of LR grammars
that are suitable for bottom-up parsing.

However, it cannot be used for top-down parsing because


it is left recursive.

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)


9 9/127
Representative Grammars — continued

The following non-left-recursive variant of the expression


grammar will be used for top-down parsing:
E → TE ′
E′
→ +TE ′ | ϵ
T → FT ′
T′
→ ∗FT ′ | ϵ
F
→ (E )|id

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)


1010 / 127
Writing a Grammar

Grammars are capable of describing most, but not all, of


the syntax of programming languages.

The sequences of tokens accepted by a parser


form a superset of the programming language.

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)


1119 / 127
Elimination of Left Recursion

A grammar is left recursive if it has a nonterminal A such


+
that there is a derivation A == ⇒ Aa for some string a.
Top-down parsing methods cannot handle left-recursive
grammars, so a transformation that eliminates left
recursion is needed.
In simple left recursion there was one production of the
form A → Aα.
Here we study the general case.

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)


1220 / 127
Elimination of Left Recursion — continued

Left-recursive pair of productions A → Aα | β can be


replaced by the non-left-recursive productions
A → βA′
A′
→ αA′ | ϵ
without changing the set of strings derivable from A.
This rule by itself suffices in many grammars.

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)


1321 / 127
Example

Grammar for arithmetic


expressions,
E → E+T |T
T → T ∗F |F
F → (E )|id
A→Aα|β
Eliminating the immediate left
to be replaced by recursions we obtain,
E → TE ′
A → βA
E′ → +TE ′ | ϵ

T → FT ′
A′ → αA′ | ϵ
T′ →∗FT ′ | ϵ
F → E | (id)

01/26/2022 Dr. Muhammad Masroor Ali CSE 303 (Compilers)


1422 / 127
01/26/2022 15
01/26/2022 16
01/26/2022 17
01/26/2022 18
Top-Down Parsing

• The parse tree is created top to bottom.


• Top-down parser
– Recursive-Descent Parsing
• Backtracking is needed (If a choice of a production rule does not work, we backtrack to
try other alternatives.)
• It is a general parsing technique, but not widely used.
• Not efficient

– Predictive Parsing
• no backtracking
• efficient
• needs a special form of grammars (LL(1) grammars).
• Recursive Predictive Parsing is a special form of Recursive Descent parsing without
backtracking.
• Non-Recursive (Table Driven) Predictive Parser is also known as LL(1) parser.

01/26/2022 19
Recursive-Descent Parsing (uses Backtracking)

S
input: abc
a B c a B
c

b c b

fails, backtrack

01/26/2022 20
Predictive Parsing

• The goal of predictive parsing is to construct a top-down parser


that never backtracks. To do so, we must transform a grammar in
two ways:
• eliminate left recursion, and
• perform left factoring.

left factoring  helps predict which rule to use without backtracking. 

01/26/2022 21
01/26/2022 22
01/26/2022 23
01/26/2022 24
01/26/2022 25
01/26/2022 26
01/26/2022 27
01/26/2022 28
01/26/2022 29
01/26/2022 30
01/26/2022 31
01/26/2022 32
01/26/2022 33
01/26/2022 34
01/26/2022 35
01/26/2022 36
01/26/2022 37
01/26/2022 38
01/26/2022 39
01/26/2022 40
01/26/2022 41
01/26/2022 42
01/26/2022 43
Context Free Grammar
Definition :
That is used to specify the syntax of a language. A grammar naturally
describes the hierarchical structure of most programming language
constructs.
For example, an if-else statement in Java can have the form
if ( expression ) statement else statement
Using the variable expr to denote an expression and
the variable stmt to denote a statement, this structuring rule can be expressed as

Stmt if expr stmt else stmt


• in which the arrow may be read as “\can have the form."
• Such a rule is called a production In a production,
• lexical elements like the keyword if and the parentheses are called terminals
• Variables like expr and stmt represent sequences of terminals and are called
nonterminals.

01/26/2022 44
Context Free Grammar
A context-free grammar has four components:
1. A set of terminal symbols, sometimes referred to as \tokens." The terminals are
the elementary symbols of the language defined by the grammar.
2. A set of nonterminals, sometimes called \syntactic variables." Each nonterminal
represents a set of strings of terminals.
3. A set of productions where each production consists of a nonterminal, called the
head or left side of the production, an arrow, and a sequence of terminals
and/or nonterminals, called the body or right side of the production.
4. A designation of one of the nonterminals as the start symbol.

01/26/2022 45
Context Free Grammar
The grammar here defines simple arithmetic expressions. In this grammar,
• the terminal symbols are id + - * / ( ) .
• The nonterminal symbols are expression, term and factor,
• and expression is the start symbol
expression → expression + term
expression → expression - term
Expression → term
Term → term *factor
term→ term / factor
term →factor
factor →(expression)
factor→ id

Figure : Grammar for simple arithmetic expressions

01/26/2022 46
Context Free Grammar
expression → expression + term
express → expression - term
Expression term
Term → term *factor
term→ term / factor
term →factor
factor →(expression)
factor→ id

Using these conventions, we can write down,


E→ E+ T | E- T | T
T →T* F | T / F | F
F→( E) | id
The notational conventions tell us that E T, and F are nonterminals, with E the start
symbol. The remaining symbols are terminals

01/26/2022 47
Derivation
For example, consider the following grammar, with a single nonterminal E which adds
a production :
E→ E +E | E *E | -E |( E )| id

2 types derivation can happen : Left Most & Right Most Derivation

Example : (Left Most)


Input: -(id+id)

E → -E →-( E ) →-(E+ E ) →- ( id +E ) →- (id + id )

TRY RIGHT MOST!!

01/26/2022 48
Ambiguity
• a grammar that produces more than one parse tree for some
sentence is said to be ambiguous .Put another way, an
ambiguous grammar is one that produces more than one
leftmost derivation or more than one rightmost derivation for
the same sentence
• : The arithmetic expression grammar permits two distinct
leftmost derivations for the sentence id+id*id
• E→ E *E E →E +E
→E+E*E , → id+E
→Id+E*E → id+E*E
→ id+id*E → id+id*E
→ Id+id*id → id+id*id

01/26/2022 49
Ambiguity

01/26/2022 50
LL(1) Grammer
• A predictive parser is a recursive descent parser that does not
require backtracking.
• Predictive parsing is possible only for the class of LL(k) grammars,
which are the context-free grammars for which there exists some
positive integer k that allows a recursive descent parser to decide
which production to use by examining only the next k tokens of input.
• The LL(k) grammars therefore exclude all ambiguous grammars, as
well as all grammars that contain left recursion.

01/26/2022 51
01/26/2022 52
01/26/2022 53
01/26/2022 54
01/26/2022 55
01/26/2022 56
Non-Recursive Predictive Parsing -- LL(1) Parser

• Non-Recursive predictive parsing is a table-driven parser.


• It is a top-down parser.
• It is also known as LL(1) Parser.

input buffer

stack Non-recursive output


Predictive Parser

Parsing Table

01/26/2022 57
Non-Recursive Predictive Parsing -- LL(1) Parser

01/26/2022 58
LL(1) Parser
input buffer
– our string to be parsed. We will assume that its end is marked with a special symbol $.

output
– a production rule representing a step of the derivation sequence (left-most derivation) of the
string in the input buffer.

stack
– contains the grammar symbols
– at the bottom of the stack, there is a special end marker symbol $.
– initially the stack contains only the symbol $ and the starting symbol S. $S  initial
stack
– when the stack is emptied (ie. only $ left in the stack), the parsing is completed.

parsing table
– a two-dimensional array M[A,a]
– each row is a non-terminal symbol
– each column is a terminal symbol or the special symbol $
– each entry holds a production rule.

01/26/2022 59
LL(1) Parser – Parser Actions
• The symbol at the top of the stack (say X) and the current symbol in the input
string (say a) determine the parser action.
• There are four possible parser actions.
1. If X and a are $  parser halts (successful completion)
2. If X and a are the same terminal symbol (different from $)
 parser pops X from the stack, and moves the next symbol in the input
buffer.
3. If X is a non-terminal
 parser looks at the parsing table entry M[X,a]. If M[X,a] holds a
production rule XY1Y2...Yk, it pops X from the stack and pushes Yk,Yk-
1,...,Y1 into the stack. The parser also outputs the production rule XY1Y2...Yk
to represent a step of the derivation.
4. none of the above  error
– all empty entries in the parsing table are errors.
– If X is a terminal symbol different from a, this is also an error case.

01/26/2022 60
01/26/2022 61
01/26/2022 62
01/26/2022 63
01/26/2022 64
The End

01/26/2022 65

You might also like