
21CS51 – AUTOMATA THEORY AND COMPILER DESIGN

MODULE - 3
Context-Free Grammars (CFG)
Syntax Analysis: Phases of Compilers, Part 1

Rewrite System

A rewrite system (also called a production system or rule-based system):

• is defined as a list of rules and an algorithm for applying them.
• Each rule has a left-hand side and a right-hand side.
Grammar
• If G is a grammar, let L(G) be the language that G generates.
• Like every rewrite system, every grammar contains a list of rules.
• Also, like every rewrite system, every grammar works with an
alphabet, which we can call V.
• In the case of grammars, we will divide V into two subsets:

• a terminal alphabet, generally called Σ, which contains the symbols that make up the strings in L(G), and
• a nonterminal alphabet, the elements of which will function as working symbols. These symbols will disappear by the time the grammar finishes its job and generates a string.
Grammar
• We will use the symbol → to indicate steps in a derivation.
• So, for example, suppose that G has the start symbol S and the rules S → aSb, S → bSa, and S → ε.

Then a derivation could begin with: S → aSb → aaSbb → …

Context-Free Grammars and Languages
Regular grammar:
Every rule must:
• have a left-hand side that is a single nonterminal, and
• have a right-hand side that is ε, a single terminal, or a single terminal followed by a single nonterminal.

Context-free grammar (CFG):
Every rule must:
• have a left-hand side that is a single nonterminal, and
• have a right-hand side, which may be any string over V (including ε).
Context-Free Grammars and Languages
All of the following are allowable context-free grammar rules:
S → aSb
S → ε
T → T
S → aSbbTT
The following are not allowable context-free grammar rules:
ST → aSb
a → aSb
ε → a
Context-Free Grammars and Languages
Definition of a context-free grammar (CFG):
A context-free grammar G is a quadruple (V, Σ, R, S), where:
• V is the rule alphabet, which contains nonterminals (symbols that are used in the grammar but that do not appear in strings in the language) and terminals,
• Σ (the set of terminals) is a subset of V,
• R (the set of rules) is a finite subset of (V − Σ) × V*, and
• S (the start symbol) can be any element of V − Σ.
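As an illustration (not part of the original slides), such a quadruple can be written down directly in code. The sketch below, in Python, encodes the grammar with rules S → aSb and S → ε:

# A minimal sketch of a CFG as the quadruple (V, Sigma, R, S).
# The example grammar is S -> aSb | epsilon, which generates {a^n b^n : n >= 0}.

Sigma = {"a", "b"}                # terminal alphabet
V = Sigma | {"S"}                 # rule alphabet: terminals plus nonterminals
R = {
    "S": [("a", "S", "b"),        # S -> aSb
          ()],                    # S -> epsilon (an empty right-hand side)
}
S = "S"                           # start symbol, an element of V - Sigma

assert S in V - Sigma and Sigma <= V and all(A in V - Sigma for A in R)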
Context-Free Grammars and Languages
• A language L is context-free iff it is generated by some context-free grammar G.
• The context-free languages (CFLs) are a proper superset of the regular languages.
Context-Free Grammars and Languages
• Example: the grammar with rules S → aSb and S → ε generates the language {aⁿbⁿ : n ≥ 0}, which is context-free but not regular.
Designing Context-Free Grammars
The most important rule to remember in designing a context-free
grammar to generate a language L is the following:
• If L has the property that every string in it has two regions and those
regions must bear some relationship to each other (such as being of
the same length), then the two regions must be generated in tandem.
Otherwise, there is no way to enforce the necessary constraint.
• To generate a string with multiple regions that must occur in some
fixed order but do not have to correspond to each other, use a rule of
the form:
A → BC …
Designing Context-Free Grammars
Example:
Let L = {aⁿbⁿcᵐ : n, m ≥ 0}.
So let G = ({S, N, C, a, b, c}, {a, b, c}, R, S), where:
R = { S → NC    /* generate the two independent portions */
      N → aNb   /* generate the aⁿbⁿ portion, from the outside in */
      N → ε
      C → cC    /* generate the cᵐ portion */
      C → ε }.
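A quick way to sanity-check such a design is to expand the start symbol mechanically. The sketch below (an illustration, not from the slides; Python) performs random leftmost derivations from R:

import random

# Rules map each nonterminal to its list of alternative right-hand sides.
R = {"S": [("N", "C")],
     "N": [("a", "N", "b"), ()],
     "C": [("c", "C"), ()]}

def derive(start="S"):
    sentential = [start]
    while any(sym in R for sym in sentential):
        i = next(i for i, sym in enumerate(sentential) if sym in R)  # leftmost nonterminal
        rhs = random.choice(R[sentential[i]])
        sentential[i:i + 1] = list(rhs)   # replace it by one of its bodies
    return "".join(sentential)

print(derive())   # e.g. 'aabbccc' -- always of the form aⁿbⁿcᵐ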
Derivations and Parse Trees
The grammatical structure of a string is captured by a parse tree, which
records which rules were applied to which nonterminals during the
string’s derivation.

A parse tree, derived by a grammar G = (V, Σ, R, S), is a rooted, ordered


tree in which:
• Every leaf node is labeled with an element of Σ ∪ {ε},
• The root node is labeled S,
• Every other node is labeled with some element of V − Σ, and
• If m is a nonleaf node labeled X and the children of m are labeled x1, x2, …, xn, then R contains the rule X → x1 x2 … xn.
Derivations and Parse Trees
Leftmost Derivation:
A derivation is said to be a leftmost derivation if, at each step in the derivation, the leftmost variable is replaced using one of its productions.
Eg:
S → aAS | a
A → ab
Derivation:
S → aAS (applying S → aAS)
  → aabS (the leftmost variable A is replaced using A → ab)
  → aaba (S is replaced using S → a)
Derivations and Parse Trees
Rightmost Derivation:
A derivation is said to be a rightmost derivation if, at each step in the derivation, the rightmost variable is replaced using one of its productions.
Eg:
S → aAS | a
A → ab
Derivation:
S → aAS (applying S → aAS)
  → aAa (the rightmost variable S is replaced using S → a)
  → aaba (A is replaced using A → ab)
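The two orders can be made concrete in code. The sketch below (illustrative, not from the slides; Python, assuming single-character nonterminals) replays a derivation by replacing either the leftmost or the rightmost nonterminal at each step:

# Grammar: S -> aAS | a, A -> ab
R = {"S": ["aAS", "a"], "A": ["ab"]}

def derive(choices, leftmost=True):
    form = "S"
    print(form)
    for nt, alt in choices:                       # (nonterminal, alternative index)
        pos = form.find(nt) if leftmost else form.rfind(nt)
        form = form[:pos] + R[nt][alt] + form[pos + 1:]
        print("=>", form)
    return form

derive([("S", 0), ("A", 0), ("S", 1)])                   # leftmost:  S, aAS, aabS, aaba
derive([("S", 0), ("S", 1), ("A", 0)], leftmost=False)   # rightmost: S, aAS, aAa, aaba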
Ambiguity

• Sometimes a grammar may produce more than one parse tree for
some (or all) of the strings it generates.
• When this happens we say that the grammar is ambiguous.
• More precisely, a grammar G is ambiguous if there is at least one
string in L(G) for which G produces more than one parse tree.
Ambiguity
Techniques for Reducing Ambiguity

Remove:
• ε-rules like S → ε,
• recursive rules whose right-hand sides are symmetric and contain at least two copies of the nonterminal on the left-hand side, such as S → SS or E → E + E, and
• rule sets that lead to ambiguous attachment of optional postfixes.
Ambiguous Grammars

How to check:
1. Obtain a string w by a leftmost derivation, and obtain the same string w by a rightmost derivation. Construct the parse tree for each. If the two parse trees differ, the grammar is ambiguous.
2. Obtain the string w by two different leftmost derivations and construct a parse tree for each. If the two parse trees differ, the grammar is ambiguous.
3. Obtain the string w by two different rightmost derivations and construct a parse tree for each. If the two parse trees differ, the grammar is ambiguous.
Ambiguous Grammars

Ex: Production rules: E → E + E | E − E | E * E | E / E | id
Input: id + id * id

Derivation 1:
E → E + E → id + E → id + E * E → id + id * E → id + id * id

Derivation 2:
E → E * E → E + E * E → id + E * E → id + id * E → id + id * id
Ambiguous Grammars
Is the following grammar ambiguous?
S → aS | X
X → aX | a

Leftmost Derivation 1: S → aS → aaS → aaaS → aaaX → aaaa
Leftmost Derivation 2: S → X → aX → aaX → aaaX → aaaa

Two different leftmost derivations exist for the string aaaa, so the grammar is ambiguous.
Ambiguous Grammars
Check whether the following grammar is ambiguous, using the sentence aabbab:

S → aB | bA
A → aS | bAA | a
B → bS | aBB | b

Leftmost derivation 1:

S → aB      (applying S → aB)
  → aaBB    (applying B → aBB)
  → aabSB   (applying B → bS)
  → aabbAB  (applying S → bA)
  → aabbaB  (applying A → a)
  → aabbab  (applying B → b)

The same string aabbab can also be obtained by a second leftmost derivation:

S → aB      (applying S → aB)
  → aaBB    (applying B → aBB)
  → aabB    (applying B → b)
  → aabbS   (applying B → bS)
  → aabbaB  (applying S → aB)
  → aabbab  (applying B → b)

Since two different leftmost derivations exist for aabbab, the grammar is ambiguous.
Syntax Analysis: Phases of Compilers, Part 1
Outline
1. Introduction – Role of the parser
2. Top-down parsing

⚫ In a compiler, the parser obtains a string of tokens from the lexical analyser, as shown in the figure below, and verifies that the string of token names can be generated by the grammar of the source language.

⚫ We expect the parser to report any syntax error in an intelligible fashion and to recover from commonly occurring errors so that it can continue processing the remainder of the program.

⚫ The parser constructs a parse tree and passes it to the rest of the compiler for further processing.
The Role of a Parser
[Figure: tokens flow from Lexical Analysis into the PARSER and on to the rest of the front end; all components consult the Symbol Table.]

• The parser continues the process of translation and validation started by the Lexical Analyzer by:
1. Analyzing the phrase structure of the program,
2. Adding hierarchical (tree) structure to the flat stream of tokens produced by the Lexical Analyzer,
3. Outputting a data structure conveying the program's syntax to subsequent compilation phases,
4. Reporting errors when the program does not match the syntactic structure of the source language.
⚫ A parser takes as input tokens from the lexical analyser and treats the token names as terminal symbols of a context-free grammar.

⚫ The parser then constructs a parse tree for its input sequence of tokens; the parse tree may be constructed through the corresponding derivation steps.

⚫ There are three types of parsing techniques:

1. Universal parser

2. Top-down parser

3. Bottom-up parser
⚫ Universal parser: Methods like the Cocke–Younger–Kasami (CYK) algorithm and Earley's algorithm can parse any grammar, but they are too inefficient to use in production compilers.

⚫ Top-down parser: This method builds parse trees from the top (root) to the bottom (leaves).

⚫ Bottom-up parser: This method builds parse trees from the leaves and works its way up to the root.

In either case, the input to the parser is scanned from left to right, one symbol at a time.
Parsing Methods
• Three types of parsing methods in vogue:
1. Universal parsing method
• Too inefficient to use in any practical compiler (hence not discussed any further)
• Ex: Cocke–Younger–Kasami (CYK) algorithm
2. Top-down parsing
• Can be generated automatically or written manually
• Ex: Left-to-right, top-down parser (a.k.a. LL parser)
3. Bottom-up parsing
• Can only be generated automatically
• Ex: Left-to-right, bottom-up parser (a.k.a. LR parser)
Representation of grammars
Arithmetic parenthesized expressions can be represented by the grammar:
E → E + T | T
T → T * F | F
F → (E) | id
The above grammar is an LR grammar and is suitable for a bottom-up parser. It is not suitable for a top-down parser because it is left recursive. After the left recursion is removed, it looks like the grammar below:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Syntax Error Handling
⚫ Planning the error handling right from the start can both simplify the structure of a compiler and improve its error handling.
⚫ Common programming errors can occur at many different levels:
⚫ Lexical errors
⚫ Syntactic errors
⚫ Semantic errors
⚫ Logical errors
Lexical error: includes misspellings of identifiers, keywords or operators.
Ex: man → main
Syntactic error: includes misplaced semicolons or extra or missing braces.
Ex: for(i=0,;i<10;i++), int a:
Semantic error: includes type mismatches between operators and operands.
Ex: a return statement with a result in a Java method whose result type is void.
Logical error: can be anything from incorrect reasoning on the part of the programmer to the use of the wrong construct in a program.
Ex: writing if(a=b) instead of if(a==b)
Error handler goals
⚫ Report the presence of errors clearly and accurately.
⚫ Recover from each error quickly enough to detect subsequent errors.
⚫ Add minimal overhead to the processing of correct programs.
⚫ Error handling should not slow down the process of compilation.
⚫ Error messages should point out the location of the error in the source program.
⚫ They should be easily understandable by the user.
⚫ They should be specific and localize the problem.
⚫ They should not be redundant.
Error-recovery strategies
Four error-recovery strategies are used to recover from errors:

⚫ Panic mode recovery

⚫ Phrase level recovery

⚫ Error productions

⚫ Global correction
Panic Mode Recovery
⚫ With this method, on discovering an error the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found.

⚫ The synchronizing tokens are usually delimiters such as semicolons or }, whose role in the source program is clear and unambiguous.

⚫ The compiler designer must select the synchronizing tokens appropriate for the source language.

⚫ While panic-mode correction often skips a considerable amount of input without checking it for additional errors, it has the advantage of simplicity, and it is guaranteed not to go into an infinite loop.
Phrase Level Recovery
⚫ Mainly used in top-down parsing.

⚫ On discovering an error, a parser may perform local correction on the remaining input, i.e., it may replace a prefix of the remaining input by some string that allows the parser to continue.

⚫ A typical local correction is to replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon.

⚫ The choice of the local correction is left to the compiler designer.

⚫ We must be careful to choose replacements that do not lead to infinite loops.

⚫ Its major drawback is the difficulty it has in coping with situations in which the actual error occurred before the point of detection.
Error Productions
⚫ By anticipating the common errors that might be encountered, we can augment the grammar for the language at hand with productions that generate the erroneous constructs.

⚫ A parser constructed from a grammar augmented by these error productions detects the anticipated errors when an error production is used during parsing.

⚫ The parser can then generate appropriate error diagnostics about the erroneous construct that has been recognized in the input.
Global Correction
⚫ Ideally, a compiler should make as few changes as possible in processing an incorrect input string.
⚫ There are algorithms for choosing a minimal sequence of changes to obtain a globally least-cost correction.
⚫ Given an incorrect input string x and grammar G, these algorithms will find a parse tree for a related string y such that the number of insertions, deletions and changes of tokens required to transform x into y is as small as possible.
⚫ These methods are in general too costly to implement in terms of time and space, so these techniques are currently only of theoretical interest.
Context Free Grammars
⚫ A context-free grammar is a set of recursive rules used to generate patterns of strings.

⚫ A context-free grammar can describe all regular languages and more, but it cannot describe all possible languages.

⚫ Context-free grammars are studied in the fields of theoretical computer science, compiler design, and linguistics.
Types of Grammar
Type 0   Unrestricted grammar        Turing machine
Type 1   Context-sensitive grammar   Linear-bounded automaton
Type 2   Context-free grammar        Pushdown automaton
Type 3   Regular grammar             Finite state automaton

⚫ Terminals
⚫ Non-terminals
⚫ Start symbol
⚫ Productions

Example grammar:
expression -> expression + term
expression -> expression - term
expression -> term
term -> term * factor
term -> term / factor
term -> factor
factor -> (expression)
factor -> id | num
Terminals:

⚫ These are the basic symbols from which strings are formed.

⚫ Terminals are also called token names.

⚫ Terminals are the first components of the tokens output by the lexical analyser.

Ex: if, else, (, ), id, num

Non-terminals:

⚫ These are syntactic variables that denote sets of strings.

⚫ The sets of strings denoted by non-terminals help define the language generated by the grammar.

Ex: term, factor, expression

Start symbol:
The start symbol is also a non-terminal, and the productions for the start symbol are listed first.
Ex: expression.
Production:
⚫ The productions of a grammar specify the manner in which the terminals and non-terminals can be combined to form strings.
⚫ Each production consists of:

❖ A non-terminal called the head or left side of the production; the production defines some of the strings denoted by the head.

❖ The symbol →, or sometimes ::=

❖ A body or right side consisting of zero or more terminals and non-terminals. The components of the body describe one way in which strings of the non-terminal at the head can be constructed.
Notational Conventions
⚫ These symbols are terminals:
➢ Lowercase letters early in the alphabet, such as a, b, c.
➢ Operator symbols such as +, *, etc.
➢ Punctuation symbols such as parentheses, comma, etc.
➢ The digits 0, 1, …, 9.
➢ Boldface strings such as if, else or id.
➢ Lowercase letters late in the alphabet, such as u, v, …, z, represent strings of terminals.
⚫ These symbols are non-terminals:
➢ Uppercase letters early in the alphabet, such as A, B, C.
➢ The letter S, which, when it appears, is usually the start symbol.
➢ Lowercase italic names such as expr or stmt.
➢ Expression – E, Term – T, Factor – F.
➢ Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols (either terminals or non-terminals).
➢ Lowercase Greek letters α, β, γ represent strings of grammar symbols.
➢ A → α, where A is the head and α is the body.
➢ A → α1 | α2 | α3 | … | αk; here α1, α2, …, αk are the alternatives of A.
➢ Unless stated otherwise, the head of the first production is the start symbol.
Derivations
⚫ The process of starting with the start non-terminal of a grammar and successively replacing non-terminals by the body of one of their productions is called a derivation.

⚫ The construction of a parse tree can be made precise by taking a derivational view, in which productions are treated as rewriting rules used to generate a string.

⚫ There are two types of derivation:

❖ Leftmost derivations

❖ Rightmost derivations
Leftmost derivation:
⚫ In this, the leftmost non-terminal in each sentential form is always chosen (replaced).

⚫ E -> E + E | E * E | -E | (E) | id

⚫ Leftmost derivation for -(id+id):

⚫ E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)

Rightmost derivation (canonical derivation):

In this, the rightmost non-terminal is always replaced.

⚫ Rightmost derivation for -(id+id):

⚫ E => -E => -(E) => -(E+E) => -(E+id) => -(id+id)
Parse trees
⚫ A parse tree is a graphical (pictorial) representation of a derivation, in which there is a node for each non-terminal that appears in the derivation.
⚫ Each interior node of a parse tree represents the application of a production.
⚫ The interior node is labelled with the non-terminal A in the head of the production.
⚫ The children of the node are labelled, from left to right, by the symbols in the body of the production by which this A was replaced during the derivation.
⚫ The leaves of a parse tree are labelled by terminals or non-terminals and, read from left to right, constitute a sentential form, called the yield or frontier of the tree.
Construct Parse tree for input –(id+id)
⚫ -(id+id)
⚫ E => -E => -(E) => -(E+E) => -(id+E)=>-(id+id)
⚫ Depending on the number of derivation trees, CFGs are subdivided into 2 types:
⚫ Ambiguous grammars
⚫ Unambiguous grammars
For ambiguous grammars,
⚫ More than one leftmost derivation and more than one rightmost derivation exist for at least one string.
⚫ The leftmost derivation and rightmost derivation represent different parse trees.
For unambiguous grammars,
⚫ A unique leftmost derivation and a unique rightmost derivation exist for all the strings.
⚫ The leftmost derivation and rightmost derivation represent the same parse tree.
Ambiguity
⚫ A grammar for which some terminal string has two or more different parse trees, or more than one leftmost derivation, or more than one rightmost derivation, is said to be ambiguous.

Ex: The arithmetic expression grammar permits two distinct leftmost derivations for the sentence id+id*id:

E → E+E | E-E | E*E | E/E | (E) | -E | id

Input: id+id*id
LMD 1:
E→ E+E
→id+E
→id+E*E
→id+id*E
→id+id*id
LMD 2:
E→E*E
→E+E*E
→id+E*E
→id+id*E
→id+id*id
⚫ The above grammar has two different leftmost derivations for the same input id+id*id, so the given grammar is ambiguous.

⚫ * has higher precedence than +; correspondingly, we evaluate an expression like a+b*c as a+(b*c) rather than (a+b)*c.

⚫ For most parsers, it is desirable that the grammar be made unambiguous; if it is not, we cannot uniquely determine which parse tree to select for a sentence.
⚫ 1. Check whether the given grammar is ambiguous or not, for w = abba.
⚫ S → SS
⚫ S → a
⚫ S → b

⚫ We get two different parse trees for the given input, so the grammar is ambiguous.
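For small grammars like this one, ambiguity can also be demonstrated mechanically by counting leftmost derivations. The sketch below (illustrative, not from the slides; Python, assuming single-letter symbols and no ε-rules or unit cycles) finds more than one derivation of abba:

def count_lm_derivations(R, w, form="S"):
    i = next((k for k, c in enumerate(form) if c in R), None)
    if i is None:
        return int(form == w)         # no nonterminals left: one derivation if it equals w
    if len(form) > len(w) or form[:i] != w[:i]:
        return 0                      # prune: too long, or terminal prefix already differs
    return sum(count_lm_derivations(R, w, form[:i] + alt + form[i + 1:])
               for alt in R[form[i]])

R = {"S": ["SS", "a", "b"]}
print(count_lm_derivations(R, "abba"))   # 5 -- more than one, so the grammar is ambiguous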
⚫ 2. Check whether the given grammar is ambiguous or not, for w = ab.
⚫ S → A / B
⚫ A → aAb / ab
⚫ B → abB / ε

⚫ Since two different parse trees exist for string w, the given grammar is ambiguous.
3. Check whether the given grammar is ambiguous or not, for w = aabbccdd.
⚫ S → AB / C
⚫ A → aAb / ab
⚫ B → cBd / cd
⚫ C → aCd / aDd
⚫ D → bDc / bc
⚫ Since two different parse trees exist for string w, the given grammar is ambiguous.
⚫ 4. Check whether the given grammar is ambiguous or not, for w = aab.
⚫ S → AB / aaB
⚫ A → a / Aa
⚫ B → b

⚫ Since two different parse trees exist for string w, the given grammar is ambiguous.
⚫ 5. Check whether the given grammar is ambiguous or not, for w = abababb.
⚫ S → a / abSb / aAb
⚫ A → bS / aAAb

⚫ Since two different parse trees exist for string w, the given grammar is ambiguous.
6. Check whether the given grammar is ambiguous or not, for w = id+id*id.
⚫ E → E + T / T
⚫ T → T x F / F
⚫ F → id

Result: Since a unique parse tree exists for the string, the given grammar is unambiguous.
7. Check whether the given grammar is ambiguous or not, for w = abab.
⚫ S → aSbS / bSaS / ε

⚫ Since two different parse trees exist for string w, the given grammar is ambiguous.
8. Check whether the given grammar is ambiguous or not, for w = ab + a.
⚫ R → R + R / R . R / R* / a / b

⚫ Since two different parse trees exist for string w, the given grammar is ambiguous.
Construction of parse trees
[Figure: the two different parse trees.]
Derivation Problem
⚫ Grammar
E → E + E | E * E | (E) | int
⚫ String
int * int + int
Derivation in Detail

The derivation and its parse tree are built up step by step:

E
→ E + E
→ E * E + E
→ int * E + E
→ int * int + E
→ int * int + int

[Figure: at each step the parse tree grows to mirror the sentential form, ending with E at the root, children E + E, the left E expanded to E * E, and int at the leaves.]
The Grammar has Ambiguity

The string int + int + int has two parse trees:

[Figure: one tree groups the left two ints, (int + int) + int; the other groups the right two, int + (int + int).]

+ is left-associative
Eliminating Ambiguity
⚫ There exists no general algorithm to remove ambiguity from a grammar.
⚫ To check a grammar for ambiguity, we try to find a string that has more than one parse tree.
⚫ If any such string exists, then the grammar is ambiguous; otherwise it is not.
⚫ Converting an Ambiguous Grammar into an Unambiguous Grammar:
⚫ Causes such as left recursion, common prefixes, etc. can make a grammar ambiguous.
⚫ The removal of these causes may convert the grammar into an unambiguous grammar.
⚫ However, this is not always guaranteed.
Methods To Remove Ambiguity
⚫ The ambiguity may be removed from a grammar using the following methods.
Removing Ambiguity by Precedence & Associativity Rules:
⚫ An ambiguous grammar may be converted into an unambiguous grammar by implementing:

⚫ Precedence constraints

⚫ Associativity constraints

⚫ These constraints are implemented using the following rules.

Rule-01:
The precedence constraint is implemented using the following rules:
⚫ The level at which a production is present defines the priority of the operator contained in it.

⚫ The higher the level of the production, the lower the priority of the operator.

⚫ The lower the level of the production, the higher the priority of the operator.
Rule-02:
The associativity constraint is implemented using the following rules:

⚫ If the operator is left associative, induce left recursion in its production.

⚫ If the operator is right associative, induce right recursion in its production.
⚫ Problem-01:
⚫ Convert the following ambiguous grammar into an unambiguous grammar:
⚫ R → R + R / R . R / R* / a / b
⚫ where * is Kleene closure and . is concatenation.
⚫ Solution:
⚫ To convert the given grammar into its corresponding unambiguous grammar, we implement the precedence and associativity constraints.
⚫ We have:
⚫ The given grammar consists of the following operators: + , . , *
⚫ The given grammar consists of the following operands: a , b
⚫ The priority order is:
(a , b) > * > . > +
where:
⚫ the . operator is left associative
⚫ the + operator is left associative
Using the precedence and associativity rules, we write the corresponding unambiguous grammar as:
⚫ E → E + T / T
⚫ T → T . F / F
⚫ F → F* / G
⚫ G → a / b
or
⚫ E → E + T / T
⚫ T → T . F / F
⚫ F → F* / a / b
⚫ Unambiguous Grammar
2. Convert the following ambiguous grammar into an unambiguous grammar.
⚫ Grammar
E → E + E | E * E | (E) | int
⚫ String
int * int + int
Dealing with Ambiguity

⚫ There are several ways to handle ambiguity.

⚫ The most direct method is to rewrite the grammar unambiguously with precedence and associativity:
⚫ Enforces precedence of * over +
⚫ Enforces left-associativity of + and *
E → E + T | T
T → T * int | int | ( E )
Without Ambiguity
The string int * int + int has only one parse tree now.

[Figure: the unique parse tree, with E → E + T at the root and the left E deriving int * int via T → T * int.]
Ambiguity
⚫ Impossible to convert automatically an ambiguous grammar to an unambiguous
one
⚫ Used with care, ambiguity can simplify the grammar
⚫ Sometimes allows more natural definitions
⚫ But we need disambiguation mechanisms
⚫ Instead of rewriting the grammar
⚫ Use the more natural (ambiguous) grammar
⚫ Along with disambiguating declarations
⚫ Most tools allow precedence and associativity declarations to disambiguate
grammars
⚫ Examples …
Associativity Declarations (operators of the same precedence)
⚫ Consider the grammar E → E + E | int
⚫ Ambiguous: two parse trees for int + int + int

[Figure: one tree groups (int + int) + int, the other int + (int + int).]

• Left-associativity declaration: %left '+'

3. Convert the following ambiguous grammar into an unambiguous grammar:
bexp → bexp or bexp / bexp and bexp / not bexp / T / F
where bexp represents a Boolean expression, T represents True and F represents False.

Solution:
⚫ To convert the given grammar into its corresponding unambiguous grammar, we implement the precedence and associativity constraints.
⚫ We have:
⚫ The given grammar consists of the following operators: or, and, not
⚫ The given grammar consists of the following operands: T, F
⚫ The priority order is:
(T, F) > not > and > or
where:
⚫ the and operator is left associative
⚫ the or operator is left associative
⚫ Using the precedence and associativity rules, we write the corresponding unambiguous grammar as:
⚫ bexp → bexp or M / M
⚫ M → M and N / N
⚫ N → not N / G
⚫ G → T / F
⚫ Unambiguous Grammar
Elimination of Ambiguity – Dangling Else
⚫ w = if E1 then if E2 then S1 else S2
⚫ Idea:
⚫ A statement appearing between a then and an else must be matched.
Left-Recursion

1. Left-recursion: S → Sa / ε
2. Right-recursion: S → aS / ε
3. General-recursion: S → aSb / ε
A production of a grammar is said to have left recursion if the leftmost variable of its RHS is the same as the variable on its LHS.
A grammar containing a production having left recursion is called a Left Recursive Grammar.
Elimination of left recursion
⚫ A grammar is left recursive if it has a non-terminal A such that there is a derivation A ⇒⁺ Aα.
⚫ Top-down parsing methods can't handle left-recursive grammars.
⚫ A simple rule for direct left recursion elimination:
⚫ For a rule like:
⚫ A → Aα | β
⚫ we may replace it with
⚫ A → βA'
⚫ A' → αA' | ε
General rule for eliminating left recursion
⚫ Immediate left recursion can be eliminated by the following technique, which works for any number of A-productions.
⚫ Group the productions as
⚫ A → Aα1 | Aα2 | Aα3 | … | Aαm | β1 | β2 | β3 | … | βn, where no βi begins with an A.
⚫ Then replace the A-productions by
⚫ A → β1A' | β2A' | … | βnA'
⚫ A' → α1A' | α2A' | α3A' | … | αmA' | ε
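The transformation above can be written compactly in code. Below is an illustrative sketch (Python; productions are lists of symbols, with [] playing the role of ε) for a single nonterminal A:

# Eliminate *immediate* left recursion for one nonterminal A.
def eliminate_left_recursion(A, productions):
    alphas = [p[1:] for p in productions if p and p[0] == A]     # A -> A alpha_i
    betas  = [p     for p in productions if not p or p[0] != A]  # A -> beta_j
    if not alphas:
        return {A: productions}                 # no left recursion: nothing to do
    A1 = A + "'"                                # fresh nonterminal A'
    return {
        A:  [b + [A1] for b in betas],          # A  -> beta_j A'
        A1: [a + [A1] for a in alphas] + [[]],  # A' -> alpha_i A' | epsilon
    }

# Example: E -> E + T | T   becomes   E -> T E',  E' -> + T E' | epsilon
print(eliminate_left_recursion("E", [["E", "+", "T"], ["T"]]))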
Left-Recursion Problems
Problem-1: Consider the following grammar and eliminate left recursion:
A → AAα / β
Solution:
The grammar after eliminating left recursion is:
A → βA'
A' → AαA' / ε
Problem-2: Consider the following grammar and eliminate left recursion:
S → S0S1S / 01
Solution:
The grammar after eliminating left recursion is:
S → 01A
A → 0S1SA / ε
Problem-3: Consider the following grammar and eliminate left recursion:
A → ABd / Aa / a
B → Be / b
Solution:
The grammar after eliminating left recursion is:
A → aA'
A' → BdA' / aA' / ε
B → bB'
B' → eB' / ε
Problem-4: Consider the following grammar and eliminate left recursion:
E → E + E / E x E / a
Solution:
The grammar after eliminating left recursion is:
E → aA
A → +EA / xEA / ε
5. Eliminate the left recursion for the following grammar.
A → Aa | b
B → Bm | Bn | c
6. Eliminate the left recursion for the following grammar.
L → L,S | S
S → (L) | a
7. Eliminate the left recursion for the following grammar.
S → Ra | Aa | a
R → ab
A → AR | AR | b
T → Tb | a
8. Eliminate the left recursion for the following grammar.
E → E + T | T
T → T * F | F
F → (E) | id

9. Eliminate the left recursion for the following grammar.

S → S+S | SS | (S) | S* | a

10. Eliminate the left recursion for the following grammar.

E → EAE | (E) | id
A → + | - | * | ^
Left factoring
⚫ Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive or top-down parsing.
⚫ Consider the following grammar:
⚫ stmt → if expr then stmt else stmt
⚫      | if expr then stmt
⚫ On seeing the input token if, it is not clear to the parser which production to use.
⚫ We can easily perform left factoring:
⚫ If we have A → αβ1 | αβ2, then we replace it with
⚫ A → αA'
⚫ A' → β1 | β2
Algorithm for Left factoring
⚫ Input: Grammar G
⚫ Output: An equivalent left-factored grammar.
⚫ Method: For each non-terminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, then replace all of the A-productions
⚫ A → αβ1 | αβ2 | … | αβn | γ by
⚫ A → αA' | γ
⚫ A' → β1 | β2 | … | βn
⚫ Example:
⚫ S → iEtS | iEtSeS | a
⚫ E → b
⚫ Problem-01:

⚫ Do left factoring in the following grammar:
⚫ S → iEtS / iEtSeS / a
⚫ E → b

⚫ Solution:

⚫ The left-factored grammar is:

⚫ S → iEtSS' / a
⚫ S' → eS / ε
⚫ E → b
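One pass of the left-factoring algorithm can be sketched in code. Below is an illustrative Python version (alternatives are tuples of symbols, with () as ε; a full algorithm would repeat the pass until no common prefixes remain):

def common_prefix(xs, ys):
    n = 0
    while n < min(len(xs), len(ys)) and xs[n] == ys[n]:
        n += 1
    return xs[:n]

def left_factor(A, alts):
    # Find the longest prefix alpha common to two or more alternatives.
    best = ()
    for i in range(len(alts)):
        for j in range(i + 1, len(alts)):
            p = common_prefix(alts[i], alts[j])
            if len(p) > len(best):
                best = p
    if not best:
        return {A: alts}                          # nothing to factor
    A1 = A + "'"
    factored   = [a[len(best):] for a in alts if a[:len(best)] == best]
    unaffected = [a for a in alts if a[:len(best)] != best]
    # A -> alpha A' | gamma...,   A' -> beta1 | beta2 | ...
    return {A: [best + (A1,)] + unaffected, A1: factored}

# Problem-01: S -> iEtS / iEtSeS / a
print(left_factor("S", [tuple("iEtS"), tuple("iEtSeS"), ("a",)]))
# -> S -> iEtS S' | a,   S' -> epsilon | eS   (matching the solution above)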
⚫ Problem-02:
⚫ Do left factoring in the following grammar:
⚫ A → aAB / aBc / aAc
⚫ Solution:
⚫ Step-01:
⚫ A → aA'
⚫ A' → AB / Bc / Ac
⚫ Again, this is a grammar with common prefixes.
⚫ Step-02:
⚫ A → aA'
⚫ A' → AD / Bc
⚫ D → B / c
⚫ This is a left-factored grammar.
⚫ Problem-03:
⚫ Do left factoring in the following grammar:
⚫ S → a / ab / abc / abcd
⚫ Solution:
⚫ Step-01:
⚫ S → aS'
⚫ S' → b / bc / bcd / ε
⚫ Again, this is a grammar with common prefixes.
⚫ Step-02:
⚫ S → aS'
⚫ S' → bA / ε
⚫ A → c / cd / ε
⚫ Again, this is a grammar with common prefixes.
⚫ Step-03:
⚫ S → aS'
⚫ S' → bA / ε
⚫ A → cB / ε
⚫ B → d / ε
⚫ This is a left-factored grammar.

⚫ Problem-04:
⚫ Do left factoring in the following grammar:
⚫ S → aAd / aB
⚫ A → a / ab
⚫ B → ccd / ddc
⚫ Solution:
⚫ The left-factored grammar is:
⚫ S → aS'
⚫ S' → Ad / B
⚫ A → aA'
⚫ A' → b / ε
⚫ B → ccd / ddc
⚫ Problems
⚫ Do left factoring in the following grammars:
1. E → E sub E sup E | E sub E | E sup E | {E} | C
2. E → I := e | I (e) | a
3. E → T + E | T | a
   T → id * T | id
4. S → ad | a | ab | abc | b
5. S → SS+ | SS- | SS* | a
6. S → bAd | bAe | ed
   A → e | bA
7. S → 0S1 | 01
8. lexp → atom | list
   atom → number | identifier
   list → (lexp-seq)
   lexp-seq → lexp, lexp-seq | lexp
9. declaration → type var-list
   type → int | float
   var-list → identifier, var-list | identifier
10. statement → assign_stmt | call_stmt | other
    assign_stmt → identifier := exp
    call_stmt → identifier (exp-list)
PARSING
• Constructs the syntax tree for a given sequence of tokens using grammar rules and productions such that:
1. The top node is labeled with the start symbol of the grammar.
2. Leaf nodes are labeled with terminals and inner nodes with nonterminals.
3. The children of an inner node labeled N correspond to the members of an alternative of N, in the same order as they occur in that alternative.
4. The terminals labeling the leaf nodes correspond to the sequence of tokens, in the same order they occur in the input.
Top-Down & Bottom-Up Methods
• While they are efficient,
• they can work only on restricted classes of grammars (such as LL and LR grammars),
BUT
• these classes are expressive enough to describe most syntactic constructs in programming languages.
SO
• Care must be taken while defining the grammar for the language, to make sure that the grammar is unambiguous.
TOP-DOWN Parsing
• An attempt to construct a parse tree for an input string starting from the root and creating the nodes in PREORDER (i.e., a node is constructed before its children).
• Can also be viewed as an attempt to find a leftmost derivation for an input string.
• The way it works:
1. The process starts at the root node, say N.
2. Repeat until the fringe of the parse tree matches the input string:
   1. Determine the correct alternative for N (the key step).
   2. The parser then proceeds to construct the left child of N.
   3. The process of determining the correct alternative for the leftmost child continues till the leftmost child is a terminal.
Top-Down parsing example
String to be parsed: array [ num dotdot num ] of integer

Consider the grammar:
type → simple | ↑ id | array [ simple ] of type
simple → integer
       | char
       | num dotdot num

(The token dotdot ".." is used to stress that the character sequence is treated as a unit.)

[Figure: the parse tree for the input, with type at the root expanding to array [ simple ] of type, simple expanding to num dotdot num, and the trailing type expanding to simple → integer.]
Top-Down Parsing – Another Case
• Consider the grammar: S → cAd, A → ab | a
• String to be parsed: w = cad
• Parse tree construction would be:
1. Start with a single node S.
2. Use the first production to expand S, giving children c, A, d.
3. Expand the nonterminal A using the first alternative from rule 2 (A → ab), as we have a match for the input token a.
4. But b does not match d.
5. We have to BACKTRACK to step 3 and try the second alternative from rule 2 (A → a).

[Figure: the tree S → c A d, first with A expanded to a b, then, after backtracking, with A expanded to a.]
Top-Down Parsing – Observations
• In general,
1. The selection of a production for a nonterminal may involve trial-and-error.
2. We may have to try a production and backtrack to try another production if the first is found to be unsuitable.
(A production is unsuitable if, after using the production, we cannot complete the tree to match the input string.)
• If we can make the correct choice by looking at just the next input symbol, we can build a Predictive Parser that can perform a top-down parse without backtracking.
Predictive Parsing
• The most common technique used in parsers (particularly true in the case of parsers written manually).
• Uses grammars that are carefully written to eliminate left recursion.
• Further, if many alternatives exist for a nonterminal, the grammar is so defined that the input token uniquely determines one and only one alternative, which leads either to correct parsing or to an error.
• There is no question of backtracking and trying another alternative.
Predictive Parsing Example
Consider the grammar:
stmt → if expr then stmt else stmt
     | while expr do stmt
     | begin stmt_list end

Based only on the first token, the parser knows which rule to use to derive a statement, because each of the three alternatives begins with a unique token. Therefore this is called a Predictive Parser.

A parser for this grammar can be written with the following simple structure:

switch (gettoken()) {
    case if:
        ....
        break;
    case while:
        ....
        break;
    case begin:
        ....
        break;
    default:
        reject input;
}
Parsing Table
• Represents a grammar as a two-dimensional array M[A, α], where:
• A is a nonterminal and
• α is an input symbol.
• Used in the parser as the reference table to decide the possible values (and thereby the choice of production to be used).

Grammar:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → ( E ) | id

Parsing Table:

NONTERMINAL |   id    |     +     |     *     |    (    |    )   |   $
E           | E → TE' |           |           | E → TE' |        |
E'          |         | E' → +TE' |           |         | E' → ε | E' → ε
T           | T → FT' |           |           | T → FT' |        |
T'          |         | T' → ε    | T' → *FT' |         | T' → ε | T' → ε
F           | F → id  |           |           | F → (E) |        |
A Table-Driven Predictive Parser
The parser uses:
• an input buffer (containing the string to be parsed, with $ at the end),
• a stack (containing a sequence of grammar symbols, with $ at the bottom),
• a parsing table, and
• an output stream.

INPUT: id + id * id $

Initially the stack holds $E. With lookahead id, the parser consults M[E, id] = E → TE', pops E and pushes E'T, leaving the stack $E'T; the output begins with the parse-tree fragment for E → T E'. (The parsing table is the one shown above.)
A Predictive Parser
▪ The working of the table-driven predictive parser on the input id + id * id:
▪ When the top of the stack is a terminal equal to the input symbol (≠ $), the parser pops the stack and advances the input; when the top of the stack is ε, it simply pops it; when the top is a nonterminal X and the input is a, it replaces X by the body of the production in M[X, a].

STACK       INPUT            OUTPUT (action)
$E          id + id * id $
$E'T        id + id * id $   E → TE'
$E'T'F      id + id * id $   T → FT'
$E'T'id     id + id * id $   F → id
$E'T'       + id * id $      (match id)
$E'         + id * id $      T' → ε
$E'T+       + id * id $      E' → +TE'
$E'T        id * id $        (match +)
$E'T'F      id * id $        T → FT'
$E'T'id     id * id $        F → id
$E'T'       * id $           (match id)
$E'T'F*     * id $           T' → *FT'
$E'T'F      id $             (match *)
$E'T'id     id $             F → id
$E'T'       $                (match id)
$E'         $                T' → ε
$           $                E' → ε

The predictive parser thus proceeds emitting the productions E → TE', T → FT', F → id, T' → ε, E' → +TE', T → FT', F → id, T' → *FT', F → id, T' → ε, E' → ε, which together define the parse tree for the input.

When Top(Stack) = input = $, the parser halts and accepts the input string (id + id * id).
Algorithm for Predictive Parser
• The program execution is controlled by TWO inputs:
• the input symbol a and the symbol on the top of the stack X.
• There are THREE possibilities for the parser:
• If X = a = $: halt and announce the successful completion of parsing.
• If X = a ≠ $: pop X off the stack and advance the input pointer to the next input symbol.
• If X is a nonterminal: consult entry M[X, a] in the parsing table M and replace X on top of the stack with the RHS of the production.
• If M[X, a] is error, call the error routine.
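This control loop translates almost line for line into code. Below is an illustrative Python sketch (the table M hand-encodes the expression-grammar parsing table built above, with [] standing for an ε body):

M = {("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
     ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
     ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
     ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
     ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"]}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    stack, i = ["$", "E"], 0
    while True:
        X, a = stack[-1], tokens[i]
        if X == a == "$":
            return True                        # halt: input accepted
        if X == a:
            stack.pop(); i += 1                # match: pop and advance the input
        elif X in NONTERMINALS and (X, a) in M:
            stack.pop()
            stack.extend(reversed(M[X, a]))    # replace X by the RHS (pushed reversed)
        else:
            raise SyntaxError(f"error at token {a!r}")

print(parse(["id", "+", "id", "*", "id", "$"]))   # True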
FIRST & FOLLOW
• The two functions yield sets of terminal symbols that aid us in constructing a predictive parser, by helping us fill the parsing table with valid entries.
• The set of terminal symbols yielded by the FOLLOW function can also be used in error recovery.
• FIRST – if α is a string of grammar symbols, then FIRST(α) is the set of terminals that begin the strings derived from α. (If α ⇒* ε, then ε is also in FIRST(α).)
• FOLLOW(A) – the set of terminals a that can appear immediately to the right of A, i.e., such that there exists a derivation of the form S ⇒* αAaβ for some α and β.
FIRST & FOLLOW (Contd.)
• The set of terminal symbols (including ε) that can appear at the far left of any parse tree derived from a particular nonterminal is the FIRST set of that nonterminal.
• The set of terminal symbols (including the endmarker $) that can follow a nonterminal in some derivation or other is called the FOLLOW set of that nonterminal.
• A set of rules is followed to compute the FIRST and FOLLOW sets.
• These sets will be used in creating the parsing table.
Rules to Create FIRST
FIRST rules:
1. If X is a terminal, FIRST(X) = {X}.
2. If X → ε, then ε ∈ FIRST(X).
3. If X → aABC (the body begins with terminal a), then a ∈ FIRST(X).
4. If X → ABCD (the body begins with nonterminal A), then FIRST(A) ⊆ FIRST(X).
4a. Further, if A → ε in the above production, then FIRST(B) ⊆ FIRST(X) ... [and so on, recursively].

GRAMMAR:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → ( E ) | id

FIRST SETS:
FIRST(id) = {id}     FIRST(E') = {+, ε}
FIRST(ε) = {ε}       FIRST(T') = {*, ε}
FIRST(+) = {+}       FIRST(F) = {(, id}
FIRST(() = {(}       FIRST(T) = FIRST(F) = {(, id}
FIRST()) = {)}       FIRST(E) = FIRST(T) = {(, id}
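The FIRST rules amount to a fixed-point computation. Below is an illustrative Python sketch ("" plays the role of ε; G encodes the grammar above):

G = {"E": [["T", "E'"]], "E'": [["+", "T", "E'"], [""]],
     "T": [["F", "T'"]], "T'": [["*", "F", "T'"], [""]],
     "F": [["(", "E", ")"], ["id"]]}

def first_sets(G):
    FIRST = {A: set() for A in G}
    changed = True
    while changed:
        changed = False
        for A, bodies in G.items():
            for body in bodies:
                for X in body:
                    f = FIRST[X] if X in G else {X}      # terminal: FIRST(X) = {X}
                    before = len(FIRST[A])
                    FIRST[A] |= f - {""}
                    changed |= len(FIRST[A]) != before
                    if "" not in f:
                        break                            # X cannot vanish; stop here
                else:
                    if "" not in FIRST[A]:               # whole body can derive epsilon
                        FIRST[A].add(""); changed = True
    return FIRST

print(first_sets(G))   # e.g. FIRST(E') = {'+', ''}, FIRST(E) = {'(', 'id'}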
Rules to Create FOLLOW
FOLLOW rules (A and B are non-terminals; α and β are strings of grammar symbols):

1. If S is the start symbol, then $ ∈ FOLLOW(S).
2. If there is a production A → αBβ, then EVERYTHING in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A → αB, then EVERYTHING in FOLLOW(A) is in FOLLOW(B).
3a. If there is a production A → αBβ where FIRST(β) contains ε (i.e., β ⇒* ε), then EVERYTHING in FOLLOW(A) is in FOLLOW(B).

GRAMMAR:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → ( E ) | id

FIRST SETS (from above):
FIRST(E') = {+, ε}   FIRST(T') = {*, ε}   FIRST(F) = FIRST(T) = FIRST(E) = {(, id}

FOLLOW SETS:
FOLLOW(E) = {), $}
FOLLOW(E') = {), $}
FOLLOW(T) = {+, ), $}
FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}
Rules to Build the Parsing Table

GRAMMAR:           FIRST SETS:           FOLLOW SETS:
E → TE'            FIRST(E) = {(, id}    FOLLOW(E) = {), $}
E' → +TE' | ε      FIRST(E') = {+, ε}    FOLLOW(E') = {), $}
T → FT'            FIRST(T) = {(, id}    FOLLOW(T) = {+, ), $}
T' → *FT' | ε      FIRST(T') = {*, ε}    FOLLOW(T') = {+, ), $}
F → ( E ) | id     FIRST(F) = {(, id}    FOLLOW(F) = {+, *, ), $}

Rules:
1. If A → α: for each a ∈ FIRST(α), add A → α to M[A, a].
2. If A → α and ε ∈ FIRST(α): add A → α to M[A, b] for each terminal b ∈ FOLLOW(A).
3. If A → α, ε ∈ FIRST(α), and $ ∈ FOLLOW(A): add A → α to M[A, $].

PARSING TABLE:

NONTERMINAL |   id    |     +     |     *     |    (    |    )   |   $
E           | E → TE' |           |           | E → TE' |        |
E'          |         | E' → +TE' |           |         | E' → ε | E' → ε
T           | T → FT' |           |           | T → FT' |        |
T'          |         | T' → ε    | T' → *FT' |         | T' → ε | T' → ε
F           | F → id  |           |           | F → (E) |        |
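Rules 1-3 translate directly into code. An illustrative sketch (Python; reuses G, first_sets and follow_sets from the sketches above):

def build_table(G, start="E"):
    FIRST, FOLLOW = first_sets(G), follow_sets(G, start)
    def first_of(seq):
        out = set()
        for X in seq:
            f = FIRST[X] if X in G else {X}
            out |= f - {""}
            if "" not in f:
                return out
        return out | {""}
    M = {}
    for A, bodies in G.items():
        for body in bodies:
            f = first_of([s for s in body if s != ""])
            for a in f - {""}:
                M[A, a] = body                 # rule 1
            if "" in f:
                for b in FOLLOW[A]:            # rules 2 and 3 ($ is already in FOLLOW)
                    M[A, b] = body
    return M

M = build_table(G)
print(M["E'", "+"])   # ['+', 'T', "E'"]
print(M["T'", "$"])   # [''] i.e. T' -> epsilon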
Predictive Parsing
• A top-down method of parsing (recursive descent parsing) in which a set of procedures is executed recursively to process the input.
• The speciality of predictive parsing is that the lookahead symbol unambiguously determines the procedure selected for each nonterminal.
(No prediction, in fact!)
• Hence:
The sequence of procedures called implicitly defines a parse tree for the input.
Algorithm for Predictive Parsing
procedure match(c : token);
{
    if (lookahead == c) then lookahead := nexttoken;
    else error;
}

procedure type;
{
    if (lookahead is in { integer, char, num }) then simple;
    else if (lookahead == '↑') { match('↑'); match(id); }
    else if (lookahead == array)
        then { match(array); match('['); simple;
               match(']'); match(of); type; }
    else error;
}
procedure simple;
{
    if (lookahead == integer) match(integer);
    else if (lookahead == char) match(char);
    else if (lookahead == num)
        { match(num); match(dotdot); match(num); }
    else error;
}
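For reference, the same procedures can be written as runnable code. Below is an illustrative Python sketch (tokens are plain strings; '^' stands in for the pointer symbol ↑ used above):

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens + ["$"], 0
    @property
    def lookahead(self):
        return self.tokens[self.pos]
    def match(self, t):
        if self.lookahead == t:
            self.pos += 1
        else:
            raise SyntaxError(f"expected {t!r}, got {self.lookahead!r}")
    def type(self):
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "^":            # pointer type: ^ id
            self.match("^"); self.match("id")
        elif self.lookahead == "array":        # array [ simple ] of type
            self.match("array"); self.match("["); self.simple()
            self.match("]"); self.match("of"); self.type()
        else:
            raise SyntaxError(f"unexpected {self.lookahead!r}")
    def simple(self):
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":          # num dotdot num
            self.match("num"); self.match("dotdot"); self.match("num")
        else:
            raise SyntaxError(f"unexpected {self.lookahead!r}")

Parser(["array", "[", "num", "dotdot", "num", "]", "of", "integer"]).type()  # parses OK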
Error Recovery in Predictive Parsing
• An error is detected by a predictive parser when:
• the terminal on top of the stack does not match the next input symbol, or
• the nonterminal A is on top of the stack, a is the input symbol, and the parsing table entry M[A, a] is empty.
• Two methods used for recovery:
1. Panic-mode error recovery
• Skip symbols on the input until a token in a selected set of synchronizing tokens appears.
• Effectiveness depends on the set of synchronizing tokens chosen.
2. Phrase-level recovery
• Fill the empty slots of the parsing table with pointers to error routines.
