
Compiler Construction

CS-4207

Instructor Name: Atif Ishaq


Lecture 8-9
Today’s Lecture

 Role of Parser

Role of Parser

[Figure: the lexical analyzer reads the source program and returns a token each time the parser calls getNextToken; the parser builds the parse tree and passes it to the rest of the front end, which produces the intermediate representation; both phases consult the symbol table.]

Role of Parser

 A grammar gives a precise, yet easy to understand, syntactic specification of a
programming language

 The role of the parsing process is to reconstruct a derivation by which a given
Context Free Grammar can generate a given input string

 Equivalently, to construct the parse tree which represents the derivation

The Role of Parser

 A Parser implements a Context Free Grammar as a recognizer of strings


 The role of the parser in a compiler is twofold:
 To check syntax (act as a string recognizer)
 Report any syntax errors accurately
 Recover from commonly occurring errors to continue processing
 To invoke semantic actions
 For static semantic checking, e.g. type checking of expressions, functions,
etc.
 For syntax-directed translation of the source code to an intermediate
representation

Syntax Directed Translation

 One of the major roles of the parser is to produce an intermediate representation
(IR) of the source program using syntax-directed translation
 Possible IR outputs:
 Abstract Syntax Tree (AST) – a small sketch follows below
 Control Flow Graphs (CFGs) with triples, three-address code, or register
transfer list notation
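
As a small illustration (not part of the slides), an abstract syntax tree for an expression such as a + b * c could be represented with hypothetical Python node classes like the following sketch:

# Hypothetical AST node classes for expressions; the input  a + b * c  would be
# represented as BinOp('+', Name('a'), BinOp('*', Name('b'), Name('c'))).
from dataclasses import dataclass

@dataclass
class Name:           # identifier leaf
    ident: str

@dataclass
class BinOp:          # binary-operator node
    op: str           # e.g. '+' or '*'
    left: object      # Name or BinOp
    right: object     # Name or BinOp

tree = BinOp('+', Name('a'), BinOp('*', Name('b'), Name('c')))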

Representative Grammar

 Constructs that begin with keywords are easy to parse

 Expressions present more challenges, because of the associativity and precedence
of operators
Recap of Grammar
 A Context Free Grammar is a 4-tuple G = (V, T, P, S)
1. T is a finite set of tokens (terminal symbols)
2. V is a finite set of nonterminals
3. P is a finite set of productions of the form α → β,
where α ∈ V and β ∈ (V ∪ T)*
4. S ∈ V is a designated start symbol

Notation Conventions
 Terminals
a, b, c, … ∈ T
specific terminals: 0, 1, id, +

 Nonterminals
A, B, C, … ∈ V
specific nonterminals: expr, term, stmt

 Grammar symbols
X, Y, Z ∈ (V ∪ T)

 Strings of terminals
u, v, w, x, y, z ∈ T*

 Strings of grammar symbols
α, β, γ ∈ (V ∪ T)*

Chomsky Hierarchy : Language Classification
 A grammar G is said to be
 Regular if it is right linear, where each production is of the form

A → wB or A → w

or left linear, where each production is of the form

A → Bw or A → w

 Context free if each production is of the form

A → α
where A ∈ V and α ∈ (V ∪ T)*

 Context sensitive if each production is of the form

αAβ → αγβ
where A ∈ V and α, β, γ ∈ (V ∪ T)*, |γ| > 0

 Unrestricted otherwise
Chomsky Hierarchy : Language Classification

L(regular) ⊆ L(context free) ⊆ L(context sensitive) ⊆ L(unrestricted)

where L(T) = { L(G) | G is of type T }

that is: the set of all languages
generated by grammars G of type T

Examples:
Every finite language is regular! (construct an FSA for the strings in L(G))
L1 = { aⁿbⁿ | n ≥ 1 } is context free
L2 = { aⁿbⁿcⁿ | n ≥ 1 } is context sensitive
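For instance (a standard example, not on the slide), the context-free grammar S → aSb | ab generates L1, while no finite automaton can count the matching a's and b's, so L1 is not regular.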

Chomsky Hierarchy : Language Classification

Venn diagram of grammar types (each class contains the ones below it):

Type 0 – Phrase-structure (Unrestricted)
  ⊃ Type 1 – Context-Sensitive
    ⊃ Type 2 – Context-Free
      ⊃ Type 3 – Regular

Derivation

 The one-step derivation ⇒ is defined by

α A β ⇒ α γ β

where A → γ is a production in the grammar

 In addition, we define

 ⇒ is leftmost (⇒lm) if α does not contain a nonterminal

 ⇒ is rightmost (⇒rm) if β does not contain a nonterminal

 Transitive closure ⇒* (zero or more steps)

 Positive closure ⇒+ (one or more steps)

 The language generated by G is defined by

L(G) = { w ∈ T* | S ⇒+ w }

Derivation - Example

Grammar G = ({E}, {+, *, (, ), -, id}, P, E) with

productions P = E → E + E
               E → E * E
               E → (E)
               E → -E
               E → id

Example derivations:

E ⇒ -E ⇒ -id

E ⇒rm E + E ⇒rm E + id ⇒rm id + id

E * E

E * id + id

E + id * id + id 13
Recursive Descent Parsing

 Top-down parsing builds the parse tree from top to bottom

 Each non-terminal corresponds to one recursive procedure

 The procedure recognizes a prefix of the input that can be generated by the
corresponding non-terminal, consumes that prefix, and returns a parse tree for
the non-terminal

General Structure

 Each right-hand side of a production provides part of the body of the function

 Each non-terminal on the right-hand side is translated into a call to the function
(procedure) that recognizes that non-terminal

 Each terminal on the right-hand side is translated into a call to the lexical
scanner; parsing fails if the resulting token is not the expected terminal

 Each recognizing function returns a tree fragment, as sketched below
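
As a concrete sketch of this structure (not from the slides), the following Python fragment parses the hypothetical grammar E → T ('+' T)*, T → id | '(' E ')'; the grammar, the token format, and all function names are assumptions made for this example. Each non-terminal becomes one function, each terminal is matched against the current token, and each function returns a tree fragment.

# Recursive-descent sketch for the assumed grammar
#   E -> T ( '+' T )*      T -> 'id' | '(' E ')'
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def match(expected):                  # consume one expected terminal
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {peek()!r}")
        pos += 1
        return expected

    def E():                              # E -> T ( '+' T )*
        node = T()
        while peek() == '+':
            match('+')
            node = ('+', node, T())       # build a tree fragment
        return node

    def T():                              # T -> 'id' | '(' E ')'
        if peek() == 'id':
            return match('id')
        match('(')
        node = E()
        match(')')
        return node

    tree = E()
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing input: {peek()!r}")
    return tree

# Example: parse(['id', '+', '(', 'id', '+', 'id', ')'])
# returns ('+', 'id', ('+', 'id', 'id'))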

Complication : Left Recursion

 The grammar must not be left-recursive

 Otherwise the scheme above leads to an infinite loop: a left-recursive grammar is
inappropriate for recursive descent

Eliminating Left Recursion

Left Recursion

 Productions of the form

A → Aα | β | γ

are left recursive
 When one of the productions in a grammar is left recursive then a
predictive parser loops forever on certain inputs

 A grammar is left recursive if it has a non-terminal A such that there is a
derivation A ⇒+ Aα for some string α

 Top-down parsing methods cannot handle left-recursive grammars

 A simple rule for direct left recursion elimination:
For a rule like
A → Aα | β
we may replace it with
A → β A'
A' → α A' | ε
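For example (a standard textbook case, not shown on this slide): E → E + T | T becomes E → T E' with E' → + T E' | ε.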

Left Recursion

General form (with left recursion):

A → Aα1 | Aα2 | ... | Aαn | β1 | β2 | ... | βm

After transformation:
A → β1 A' | β2 A' | ... | βm A'
A' → α1 A' | α2 A' | ... | αn A' | ε

What about left recursion that arises only through derivations of more than one step?
e.g.,  S → Aa | b    A → Ac | Sd | ε
Substituting S in A → Sd gives:
S → Aa | b
A → Ac | Aad | bd | ε
After transformation:
S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε

Left Recursion Elimination Method

Input: Grammar G with no cycles or ε-productions

Arrange the nonterminals in some order A1, A2, …, An
for i = 1, …, n do
    for j = 1, …, i-1 do
        replace each production of the form
            Ai → Aj γ
        with
            Ai → δ1 γ | δ2 γ | … | δk γ
        where
            Aj → δ1 | δ2 | … | δk
    enddo
    eliminate the immediate left recursion among the Ai productions
enddo
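
A minimal Python sketch (an assumption for illustration, not part of the slides) of the "eliminate the immediate left recursion" step used inside the loop above; productions are represented as lists of right-hand sides, each right-hand side as a list of symbols, with the empty list standing for ε.

def eliminate_immediate(nonterminal, rhss):
    # Rewrite  A -> A a1 | ... | A an | b1 | ... | bm
    # as       A -> b1 A' | ... | bm A'   and   A' -> a1 A' | ... | an A' | epsilon
    recursive, others = [], []
    for rhs in rhss:
        if rhs and rhs[0] == nonterminal:
            recursive.append(rhs[1:])      # the alpha part of A -> A alpha
        else:
            others.append(rhs)             # the beta part
    if not recursive:                      # no immediate left recursion to remove
        return {nonterminal: rhss}
    new_nt = nonterminal + "'"
    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],   # [] is epsilon
    }

# The example from the previous slide, A -> Ac | Aad | bd | epsilon:
# eliminate_immediate('A', [['A', 'c'], ['A', 'a', 'd'], ['b', 'd'], []])
# => {'A': [['b', 'd', "A'"], ["A'"]], "A'": [['c', "A'"], ['a', 'd', "A'"], []]}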

Left Recursion Involving Several Non-Terminals

Further Complications

Error Handling

 A good compiler should assist in identifying and locating errors


1. Lexical Error : misspelling of identifiers, keywords or operators – e.g. the
use of the identifier spel instead of spell – and missing quotes around text
intended as a string
2. Syntactic Error : misplaced semicolons or extra or missing braces
3. Static Semantic Error : type mismatch between operators and operands –
returning a value from a function with a void return type also falls in this category
4. Dynamic Semantic Error : hard or impossible to detect at compile time;
runtime checks are required
5. Logical Error : can be anything arising from incorrect reasoning on the part of the
programmer – e.g. use of the assignment operator in place of the comparison operator

Error Recovery Strategies

Panic-Mode Recovery
 Discards input symbols one at a time until one of a designated set of
synchronizing tokens (such as a semicolon or }) is found – a small sketch follows below
 May skip a considerable amount of input without checking it for additional
errors
Phrase-Level Recovery
 Performs local correction on the remaining input – may replace a prefix of the
remaining input with some string that allows the parser to continue
 Typical local corrections are to replace a comma by a semicolon, delete an
extraneous semicolon, or insert a missing semicolon
Error Productions
 Augment (expand) the grammar with productions for common erroneous constructs
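
A minimal sketch of panic-mode recovery (the token list, error position, and synchronizing set here are assumptions made for illustration):

# On a syntax error, discard input symbols until a synchronizing token
# (e.g. ';' or '}') is found, then resume parsing after it.
SYNC_TOKENS = {';', '}'}

def recover(tokens, pos):
    # Skip from position `pos` up to and including the next synchronizing token;
    # return the position at which parsing resumes.
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1                          # discard one symbol at a time
    return min(pos + 1, len(tokens))      # step past the synchronizing token

# Example: recover(['x', '=', '@', '3', ';', 'y'], 2) returns 5,
# so parsing resumes at the token 'y'.
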
Error Recovery Strategies

Global Correction

 Choose a minimal sequence of changes to obtain a globally least-cost correction

 Given an incorrect input string x for grammar G, try to find a parse tree for a
related string y – i.e. transform x into y

 Unfortunately, these methods are in general too costly to implement in terms
of time and space, so these techniques are currently only of theoretical
interest.

Types of Parser

 Three general types of parsers

1. Universal
• Can parse any grammar, but too inefficient for use in production compilers
2. Top-down
• Builds the parse tree from the top (root) to the bottom (leaves)
• Recursive descent (predictive parsing)
• LL (scan Left to right, Leftmost derivation) methods
3. Bottom-up
• Starts from the leaves and works up to the root
• Operator-precedence parsing
• LR (scan Left to right, Rightmost derivation in reverse) methods
Importantly, in all cases the input to the parser is scanned from left to right,
one symbol at a time

Top Down Parsing

 A top-down parser starts with the root of the parse tree. The root node is

labeled with the goal (start) symbol of the grammar. The top-down parsing

algorithm proceeds as follows:

1. Construct the root node of the parse tree

2. Repeat until the fringe of the parse tree matches the input string

a. At a node labeled A, select a production with A on its lhs

b. For each symbol on its rhs, construct the appropriate child

c. When a terminal symbol is added to the fringe and it does not match
the corresponding input symbol, backtrack


Top Down Parsing

 LL methods (Left-to-right scan, Leftmost derivation) and recursive descent parsing

Grammar:                  Leftmost derivation:

E → T + T                 E ⇒lm T + T
T → (E)                     ⇒lm id + T
T → -E                      ⇒lm id + id
T → id

Top Down Parsing : Backtracking Problem

1. Left recursion

 Can cause a top-down parser to go into an infinite loop

 A grammar is said to be left recursive if it has a nonterminal A such that

there is a derivation A ⇒+ Aα for some string α

2. Backtracking

 Backtracking must undo not only the moves made on the input but also the
semantic entries made in the symbol table

3. The order in which the alternatives are tried matters

Ambiguity

 More than one parse tree (for some input strings)

 More than one leftmost derivation

 More than one rightmost derivation

 Example : id + id * id has two leftmost derivations:

E ⇒ E + E                 E ⇒ E * E
  ⇒ id + E                  ⇒ E + E * E
  ⇒ id + E * E              ⇒ id + E * E
  ⇒ id + id * E             ⇒ id + id * E
  ⇒ id + id * id            ⇒ id + id * id
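(One standard way to remove this particular ambiguity, not shown on this slide, is to stratify the grammar by precedence and associativity: E → E + T | T, T → T * F | F, F → ( E ) | id, which gives id + id * id a unique parse tree.)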

Eliminating Ambiguity

Eliminating Ambiguity

 The grammar is ambiguous, as the given string generates two different parse trees

[Figure: the two different parse trees; one of them is the preferred parse]

Eliminating Ambiguity

 General idea to eliminate ambiguity

 A statement appearing between a then and an else must be matched

