
Compiler Construction

CS-4207

Instructor Name: Atif Ishaq


Lecture 8-9
Today’s Lecture

 Role of Parser

Role of Parser

[Figure: the lexical analyzer reads the source program and returns a token each time the parser calls getNextToken; the parser builds the parse tree and passes it to the rest of the front end, which produces the intermediate representation; both phases consult the symbol table.]

Role of Parser

 A grammar gives a precise, yet easy to understand, syntactic specification of a
programming language

 The role of the parsing process is to reconstruct a derivation by which a given
Context Free Grammar can generate a given input string

 Equivalently, to construct the parse tree which represents the derivation

The Role of Parser

 A Parser implements a Context Free Grammar as a recognizer of strings


 The role of the parser in a compiler is twofold:
 To check syntax (act as a string recognizer)
 Report any syntax errors accurately
 Recover from commonly occurring errors to continue processing
 To invoke semantic actions
 For static semantic checking, e.g. type checking of expressions, functions,
etc.
 For syntax-directed translation of the source code to an intermediate
representation

Syntax Directed Translation

 One of the major roles of the parser is to produce an intermediate representation
(IR) of the source program using syntax-directed translation
 Possible IR outputs:
 Abstract Syntax Tree (AST) – a small sketch follows below
 Control Flow Graphs (CFGs) with triples, three-address code, or register
transfer list notation
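
As a small illustration (not part of the slides), an abstract syntax tree for an expression such as a + b * c could be represented with hypothetical Python node classes like the following sketch:

# Hypothetical AST node classes for expressions; the input  a + b * c  would be
# represented as BinOp('+', Name('a'), BinOp('*', Name('b'), Name('c'))).
from dataclasses import dataclass

@dataclass
class Name:           # identifier leaf
    ident: str

@dataclass
class BinOp:          # binary-operator node
    op: str           # e.g. '+' or '*'
    left: object      # Name or BinOp
    right: object     # Name or BinOp

tree = BinOp('+', Name('a'), BinOp('*', Name('b'), Name('c')))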

Representative Grammar

 Constructs that begin with keywords are easy to parse

 Expressions present more challenges, because of the associativity and precedence
of operators
Recap of Grammar
 A Context Free Grammar is a 4-tuple G = (V, T, P, S)
1. T is a finite set of tokens (terminal symbols)
2. V is a finite set of nonterminals
3. P is a finite set of productions of the form α → β,
where α ∈ V and β ∈ (V ∪ T)*
4. S ∈ V is a designated start symbol

Notation Conventions
 Terminals
a, b, c, … ∈ T
specific terminals: 0, 1, id, +

 Nonterminals
A, B, C, … ∈ V
specific nonterminals: expr, term, stmt

 Grammar symbols
X, Y, Z ∈ (V ∪ T)

 Strings of terminals
u, v, w, x, y, z ∈ T*

 Strings of grammar symbols
α, β, γ ∈ (V ∪ T)*

Chomsky Hierarchy : Language Classification
 A grammar G is said to be
 Regular if it is right linear, where each production is of the form

A → wB or A → w

or left linear, where each production is of the form

A → Bw or A → w

 Context free if each production is of the form

A → α
where A ∈ V and α ∈ (V ∪ T)*

 Context sensitive if each production is of the form

αAβ → αγβ
where A ∈ V and α, β, γ ∈ (V ∪ T)*, |γ| > 0

 Unrestricted otherwise
Chomsky Hierarchy : Language Classification

L(regular) ⊆ L(context free) ⊆ L(context sensitive) ⊆ L(unrestricted)

where L(T) = { L(G) | G is of type T }

that is: the set of all languages
generated by grammars G of type T

Examples:
Every finite language is regular! (construct an FSA for the strings in L(G))
L1 = { aⁿbⁿ | n ≥ 1 } is context free
L2 = { aⁿbⁿcⁿ | n ≥ 1 } is context sensitive
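For instance (a standard example, not on the slide), the context-free grammar S → aSb | ab generates L1, while no finite automaton can count the matching a's and b's, so L1 is not regular.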

Chomsky Hierarchy : Language Classification

Venn diagram of grammar types (each class contains the ones below it):

Type 0 – Phrase-structure (Unrestricted)
  ⊃ Type 1 – Context-Sensitive
    ⊃ Type 2 – Context-Free
      ⊃ Type 3 – Regular

Derivation

 The one-step derivation ⇒ is defined by

α A β ⇒ α γ β

where A → γ is a production in the grammar

 In addition, we define

 ⇒ is leftmost (⇒lm) if α does not contain a nonterminal

 ⇒ is rightmost (⇒rm) if β does not contain a nonterminal

 Transitive closure ⇒* (zero or more steps)

 Positive closure ⇒+ (one or more steps)

 The language generated by G is defined by

L(G) = { w ∈ T* | S ⇒+ w }

Derivation - Example

Grammar G = ({E}, {+, *, (, ), -, id}, P, E) with

productions P = E → E + E
               E → E * E
               E → (E)
               E → -E
               E → id

Example derivations:

E ⇒ -E ⇒ -id

E ⇒rm E + E ⇒rm E + id ⇒rm id + id

E * E

E * id + id

E + id * id + id 13
Recursive Descent Parsing

 Top-down parsing builds the parse tree from top to bottom

 Each non-terminal corresponds to one recursive procedure

 The procedure recognizes a prefix of the input that can be generated by the
corresponding non-terminal, consumes that prefix, and returns a parse tree for
the non-terminal

General Structure

 Each right-hand side of a production provides part of the body of the function

 Each non-terminal on the right-hand side is translated into a call to the function
(procedure) that recognizes that non-terminal

 Each terminal on the right-hand side is translated into a call to the lexical
scanner; parsing fails if the resulting token is not the expected terminal

 Each recognizing function returns a tree fragment, as sketched below
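
As a concrete sketch of this structure (not from the slides), the following Python fragment parses the hypothetical grammar E → T ('+' T)*, T → id | '(' E ')'; the grammar, the token format, and all function names are assumptions made for this example. Each non-terminal becomes one function, each terminal is matched against the current token, and each function returns a tree fragment.

# Recursive-descent sketch for the assumed grammar
#   E -> T ( '+' T )*      T -> 'id' | '(' E ')'
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def match(expected):                  # consume one expected terminal
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {peek()!r}")
        pos += 1
        return expected

    def E():                              # E -> T ( '+' T )*
        node = T()
        while peek() == '+':
            match('+')
            node = ('+', node, T())       # build a tree fragment
        return node

    def T():                              # T -> 'id' | '(' E ')'
        if peek() == 'id':
            return match('id')
        match('(')
        node = E()
        match(')')
        return node

    tree = E()
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing input: {peek()!r}")
    return tree

# Example: parse(['id', '+', '(', 'id', '+', 'id', ')'])
# returns ('+', 'id', ('+', 'id', 'id'))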

Complication : Left Recursion

 The grammar must not be left-recursive

 Otherwise the scheme above leads to an infinite loop: a left-recursive grammar is
inappropriate for recursive descent

Eliminating Left Recursion

Left Recursion

 Productions of the form

A → Aα | β | γ

are left recursive
 When one of the productions in a grammar is left recursive then a
predictive parser loops forever on certain inputs

 A grammar is left recursive if it has a non-terminal A such that there is a
derivation A ⇒+ Aα for some string α

 Top-down parsing methods cannot handle left-recursive grammars

 A simple rule for direct left recursion elimination:
For a rule like
A → Aα | β
we may replace it with
A → β A'
A' → α A' | ε
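For example (a standard textbook case, not shown on this slide): E → E + T | T becomes E → T E' with E' → + T E' | ε.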

Left Recursion

General form (with left recursion):

A → Aα1 | Aα2 | ... | Aαn | β1 | β2 | ... | βm

After transformation:
A → β1 A' | β2 A' | ... | βm A'
A' → α1 A' | α2 A' | ... | αn A' | ε

What about left recursion that arises only through derivations of more than one step?
e.g.,  S → Aa | b    A → Ac | Sd | ε
Substituting S in A → Sd gives:
S → Aa | b
A → Ac | Aad | bd | ε
After transformation:
S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε

Left Recursion Elimination Method

Input: Grammar G with no cycles or ε-productions

Arrange the nonterminals in some order A1, A2, …, An
for i = 1, …, n do
    for j = 1, …, i-1 do
        replace each production of the form
            Ai → Aj γ
        with
            Ai → δ1 γ | δ2 γ | … | δk γ
        where
            Aj → δ1 | δ2 | … | δk
    enddo
    eliminate the immediate left recursion among the Ai productions
enddo
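
A minimal Python sketch (an assumption for illustration, not part of the slides) of the "eliminate the immediate left recursion" step used inside the loop above; productions are represented as lists of right-hand sides, each right-hand side as a list of symbols, with the empty list standing for ε.

def eliminate_immediate(nonterminal, rhss):
    # Rewrite  A -> A a1 | ... | A an | b1 | ... | bm
    # as       A -> b1 A' | ... | bm A'   and   A' -> a1 A' | ... | an A' | epsilon
    recursive, others = [], []
    for rhs in rhss:
        if rhs and rhs[0] == nonterminal:
            recursive.append(rhs[1:])      # the alpha part of A -> A alpha
        else:
            others.append(rhs)             # the beta part
    if not recursive:                      # no immediate left recursion to remove
        return {nonterminal: rhss}
    new_nt = nonterminal + "'"
    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],   # [] is epsilon
    }

# The example from the previous slide, A -> Ac | Aad | bd | epsilon:
# eliminate_immediate('A', [['A', 'c'], ['A', 'a', 'd'], ['b', 'd'], []])
# => {'A': [['b', 'd', "A'"], ["A'"]], "A'": [['c', "A'"], ['a', 'd', "A'"], []]}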

Left Recursion Involving Several Non-Terminals

Further Complications

Error Handling

 A good compiler should assist in identifying and locating errors


1. Lexical Error : misspelling of identifiers, keywords or operators – e.g. the
use of the identifier spel instead of spell – and missing quotes around text
intended as a string
2. Syntactic Error : misplaced semicolons or extra or missing braces
3. Static Semantic Error : type mismatch between operators and operands –
returning a value from a function with a void return type also falls in this category
4. Dynamic Semantic Error : hard or impossible to detect at compile time;
runtime checks are required
5. Logical Error : can be anything arising from incorrect reasoning on the part of the
programmer – e.g. use of the assignment operator in place of the comparison operator

Error Recovery Strategies

Panic-Mode Recovery
 Discards input symbols one at a time until one of a designated set of
synchronizing tokens (such as a semicolon or }) is found – a small sketch follows below
 May skip a considerable amount of input without checking it for additional
errors
Phrase-Level Recovery
 Performs local correction on the remaining input – may replace a prefix of the
remaining input with some string that allows the parser to continue
 Typical local corrections are to replace a comma by a semicolon, delete an
extraneous semicolon, or insert a missing semicolon
Error Productions
 Augment (expand) the grammar with productions for common erroneous constructs
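
A minimal sketch of panic-mode recovery (the token list, error position, and synchronizing set here are assumptions made for illustration):

# On a syntax error, discard input symbols until a synchronizing token
# (e.g. ';' or '}') is found, then resume parsing after it.
SYNC_TOKENS = {';', '}'}

def recover(tokens, pos):
    # Skip from position `pos` up to and including the next synchronizing token;
    # return the position at which parsing resumes.
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1                          # discard one symbol at a time
    return min(pos + 1, len(tokens))      # step past the synchronizing token

# Example: recover(['x', '=', '@', '3', ';', 'y'], 2) returns 5,
# so parsing resumes at the token 'y'.
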
Error Recovery Strategies

Global Correction

 Choose a minimal sequence of changes to obtain a globally least-cost correction

 Given an incorrect input string x for grammar G, try to find a parse tree for a
related string y – i.e. transform x into y

 Unfortunately, these methods are in general too costly to implement in terms
of time and space, so these techniques are currently only of theoretical
interest.

Types of Parser

 Three general types of parsers

1. Universal
• Can parse any grammar, but too inefficient for use in production compilers
2. Top-down
• Builds the parse tree from the top (root) to the bottom (leaves)
• Recursive descent (predictive parsing)
• LL (scan Left to right, Leftmost derivation) methods
3. Bottom-up
• Starts from the leaves and works up to the root
• Operator-precedence parsing
• LR (scan Left to right, Rightmost derivation in reverse) methods
Importantly, in all cases the input to the parser is scanned from left to right,
one symbol at a time

Top Down Parsing

 A top-down parser starts with the root of the parse tree. The root node is

labeled with the goal (start) symbol of the grammar. The top-down parsing

algorithm proceeds as follows:

1. Construct the root node of the parse tree

2. Repeat until the fringe of the parse tree matches the input string

a. At a node labeled A, select a production with A on its lhs

b. For each symbol on its rhs, construct the appropriate child

c. When a terminal symbol is added to the fringe and it does not match
the corresponding input symbol, backtrack


Top Down Parsing

 LL methods (Left-to-right scan, Leftmost derivation) and recursive descent parsing

Grammar:                  Leftmost derivation:

E → T + T                 E ⇒lm T + T
T → (E)                     ⇒lm id + T
T → -E                      ⇒lm id + id
T → id

Top Down Parsing : Backtracking Problem

1. Left recursion

 Can cause a top-down parser to go into an infinite loop

 A grammar is said to be left recursive if it has a nonterminal A such that

there is a derivation A ⇒+ Aα for some string α

2. Backtracking

 Backtracking must undo not only the moves made on the input but also the
semantic entries made in the symbol table

3. The order in which the alternatives are tried matters

Ambiguity

 More than one parse tree (for some input strings)

 More than one leftmost derivation

 More than one rightmost derivation

 Example : id + id * id has two leftmost derivations:

E ⇒ E + E                 E ⇒ E * E
  ⇒ id + E                  ⇒ E + E * E
  ⇒ id + E * E              ⇒ id + E * E
  ⇒ id + id * E             ⇒ id + id * E
  ⇒ id + id * id            ⇒ id + id * id
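(One standard way to remove this particular ambiguity, not shown on this slide, is to stratify the grammar by precedence and associativity: E → E + T | T, T → T * F | F, F → ( E ) | id, which gives id + id * id a unique parse tree.)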

Eliminating Ambiguity

Eliminating Ambiguity

 The grammar is ambiguous, as the given string generates two different parse trees

[Figure: the two different parse trees; one of them is the preferred parse]

Eliminating Ambiguity

 General idea to eliminate ambiguity

 A statement appearing between a then and an else must be matched

