
1

A SIMPLE SYNTAX-DIRECTED TRANSLATOR

Chapter 2
2

Building a Simple Compiler


• Building our compiler involves:
– Defining the syntax of a programming language
– Developing a source-code parser: for our compiler we will use predictive parsing
– Implementing syntax-directed translation to generate intermediate code
3

Syntax Definition
• Context-free grammar is a 4-tuple with
– A set of tokens (terminal symbols)
– A set of nonterminals
– A set of productions
– A designated start symbol
4

Example Grammar

Context-free grammar for simple expressions:

G = <{list,digit}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, list>


with productions P =

list  → list + digit
list  → list - digit
list  → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
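As a side note (not from the slides), the 4-tuple view maps directly onto a data structure. A minimal C++ sketch representing grammar G, with assumed type names:

#include <string>
#include <vector>

// Sketch: a context-free grammar as a 4-tuple
// (tokens/terminals, nonterminals, productions, start symbol).
struct Production {
    std::string head;                  // a nonterminal
    std::vector<std::string> body;     // a sequence of terminals and nonterminals
};

struct Grammar {
    std::vector<std::string> terminals;
    std::vector<std::string> nonterminals;
    std::vector<Production>  productions;
    std::string start;
};

// Grammar G = <{list,digit}, {+,-,0..9}, P, list>; only some productions are shown.
Grammar G = {
    {"+", "-", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"},
    {"list", "digit"},
    {
        {"list",  {"list", "+", "digit"}},
        {"list",  {"list", "-", "digit"}},
        {"list",  {"digit"}},
        {"digit", {"9"}},              // one of the ten digit alternatives
    },
    "list",
};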
5

Derivation
• Given a CF grammar we can determine the
set of all strings (sequences of tokens)
generated by the grammar using derivation
– We begin with the start symbol
– In each step, we replace one nonterminal in the current sentential form with the right-hand side of one of that nonterminal's productions
6

Derivation for the Example Grammar

list
⇒ list + digit
⇒ list - digit + digit
⇒ digit - digit + digit
⇒ 9 - digit + digit
⇒ 9 - 5 + digit
⇒ 9 - 5 + 2

This is an example of a leftmost derivation, because in each step we replaced the leftmost nonterminal in the sentential form.
7

Rightmost Derivation for the Example Grammar

Likewise, a rightmost derivation replaces the rightmost nonterminal in each step.

P =
list  → digit + list
list  → digit - list
list  → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

list
⇒ digit - list
⇒ digit - digit + list
⇒ digit - digit + digit
⇒ digit - digit + 2
⇒ digit - 5 + 2
⇒ 9 - 5 + 2
8

Parse Trees
• The root of the tree is labeled by the start symbol
• Each leaf of the tree is labeled by a terminal (= token) or ε
• Each interior node is labeled by a nonterminal
• If A → X1 X2 … Xn is a production, then node A has immediate children X1, X2, …, Xn where each Xi is a (non)terminal or ε (ε denotes the empty string)
9

Parse Tree for the Example Grammar

Parse tree of the string 9-5+2 using grammar G:

list
├─ list
│  ├─ list
│  │  └─ digit → 9
│  ├─ -
│  └─ digit → 5
├─ +
└─ digit → 2

The sequence of leaves, read from left to right (9 - 5 + 2), is called the yield of the parse tree.
The Two Derivations for x – 2 * y

Leftmost derivation
Rule  Sentential Form
—     Expr
1     Expr Op Expr
3     <id,x> Op Expr
5     <id,x> – Expr
1     <id,x> – Expr Op Expr
2     <id,x> – <num,2> Op Expr
6     <id,x> – <num,2> * Expr
3     <id,x> – <num,2> * <id,y>

Rightmost derivation
Rule  Sentential Form
—     Expr
1     Expr Op Expr
3     Expr Op <id,y>
6     Expr * <id,y>
1     Expr Op Expr * <id,y>
2     Expr Op <num,2> * <id,y>
5     Expr – <num,2> * <id,y>
3     <id,x> – <num,2> * <id,y>

In both cases, Expr ⇒* id – num * id


• The two derivations produce different parse trees
• The parse trees imply different evaluation orders!
Derivations and Parse Trees

Leftmost derivation
Rule  Sentential Form
—     Expr
1     Expr Op Expr
3     <id,x> Op Expr
5     <id,x> – Expr
1     <id,x> – Expr Op Expr
2     <id,x> – <num,2> Op Expr
6     <id,x> – <num,2> * Expr
3     <id,x> – <num,2> * <id,y>

Corresponding parse tree:
Expr
├─ Expr → <id,x>
├─ Op → –
└─ Expr
   ├─ Expr → <num,2>
   ├─ Op → *
   └─ Expr → <id,y>

This evaluates as x – ( 2 * y )
Derivations and Parse Trees

Rightmost derivation
Rule  Sentential Form
—     Expr
1     Expr Op Expr
3     Expr Op <id,y>
6     Expr * <id,y>
1     Expr Op Expr * <id,y>
2     Expr Op <num,2> * <id,y>
5     Expr – <num,2> * <id,y>
3     <id,x> – <num,2> * <id,y>

Corresponding parse tree:
Expr
├─ Expr
│  ├─ Expr → <id,x>
│  ├─ Op → –
│  └─ Expr → <num,2>
├─ Op → *
└─ Expr → <id,y>

This evaluates as ( x – 2 ) * y
13

Ambiguity

Consider the following context-free grammar:

G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string>

with productions P =

string → string + string | string - string | 0 | 1 | … | 9

This grammar is ambiguous, because more than one parse tree represents the string 9-5+2.
14

Ambiguity (cont’d)

The two parse trees for 9-5+2:

First parse tree (groups (9-5)+2):
string
├─ string
│  ├─ string → 9
│  ├─ -
│  └─ string → 5
├─ +
└─ string → 2

Second parse tree (groups 9-(5+2)):
string
├─ string → 9
├─ -
└─ string
   ├─ string → 5
   ├─ +
   └─ string → 2
15

Associativity of Operators
Left-associative operators have left-recursive productions
left → left + term | term
String a+b+c has the same meaning as (a+b)+c

Right-associative operators have right-recursive productions
right → term = right | term
String a=b=c has the same meaning as a=(b=c)

Operators on the same line have the same associativity and precedence:
left-associative: + -
left-associative: * /
16

Precedence of Operators
Operators with higher precedence “bind more tightly”
expr   → expr + term | expr – term | term
term   → term * factor | term / factor | factor
factor → number | ( expr )
number → 0 | 1 | 2 | … | 9
String 2+3*5 has the same meaning as 2+(3*5)
Parse tree of 2+3*5:
expr
├─ expr
│  └─ term
│     └─ factor
│        └─ number → 2
├─ +
└─ term
   ├─ term
   │  └─ factor
   │     └─ number → 3
   ├─ *
   └─ factor
      └─ number → 5
17

Syntax of Statements

stmt → id := expr
     | if expr then stmt
     | if expr then stmt else stmt
     | while expr do stmt
     | begin opt_stmts end
opt_stmts → stmt ; opt_stmts
     | ε
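As a sketch of how such a grammar is handled by the predictive parser introduced later in this chapter, the routines below (the token names and the helpers lookahead, match(), expr() and error() are assumptions, not part of the slides) choose a production for stmt from the lookahead token; opt_stmts uses a loop in place of its tail recursion, and any token outside FIRST(stmt) selects the ε-production:

enum Token { ID, ASSIGN, IF, THEN, ELSE, WHILE, DO, BEGIN_, END_, SEMI /* ... */ };
extern Token lookahead;            // next input token, supplied by the lexer
void match(Token t);               // consume t or report a syntax error
void expr();                       // parses an expression
void error(const char* msg);
void opt_stmts();                  // forward declaration

void stmt() {
    switch (lookahead) {
    case ID:                       // stmt -> id := expr
        match(ID); match(ASSIGN); expr(); break;
    case IF:                       // stmt -> if expr then stmt [ else stmt ]
        match(IF); expr(); match(THEN); stmt();
        if (lookahead == ELSE) { match(ELSE); stmt(); }  // match else with the nearest if
        break;
    case WHILE:                    // stmt -> while expr do stmt
        match(WHILE); expr(); match(DO); stmt(); break;
    case BEGIN_:                   // stmt -> begin opt_stmts end
        match(BEGIN_); opt_stmts(); match(END_); break;
    default:
        error("unexpected token at start of statement");
    }
}

void opt_stmts() {                 // opt_stmts -> stmt ; opt_stmts | ε
    while (lookahead == ID || lookahead == IF ||
           lookahead == WHILE || lookahead == BEGIN_) {
        stmt(); match(SEMI);       // the loop replaces the tail recursion
    }                              // any other token selects the ε-production
}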
18

Syntax-Directed Translation
• Uses a CF grammar to specify the syntactic
structure of the language
• AND associates a set of attributes with the
terminals and nonterminals of the grammar
• AND associates with each production a set of
semantic rules to compute values of attributes
• A parse tree is traversed and semantic rules
applied: after the tree traversal(s) are completed,
the attribute values on the nonterminals contain
the translated form of the input
19

Synthesized and Inherited Attributes
• An attribute is said to be …
– synthesized if its value at a parse-tree node is
determined from the attribute values at the children of
the node
• Suppose a node N in a parse tree is labeled by the grammar
symbol X . We write X.a to denote the value of attribute a of X
at that node.
– inherited if its value at a parse-tree node is determined by the semantic rules of the parent's production (so it may depend on the parent and siblings)
20

Example Attribute Grammar

Syntax-directed definition for infix to postfix translation
(// is the string concatenation operator)

Production            Semantic Rule
expr → expr1 + term   expr.t := expr1.t // term.t // “+”
expr → expr1 - term   expr.t := expr1.t // term.t // “-”
expr → term           expr.t := term.t
term → 0              term.t := “0”
term → 1              term.t := “1”
…                     …
term → 9              term.t := “9”
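These semantic rules can be evaluated in one bottom-up pass over the parse tree. A minimal C++ sketch (the Node fields and evalT are assumed names for illustration):

#include <string>
#include <vector>

struct Node {                          // a parse-tree node
    std::string label;                 // "expr", "term", or a terminal such as "+", "9"
    std::vector<Node*> children;       // in left-to-right order
    std::string t;                     // synthesized attribute: the postfix translation
};

// Compute t bottom-up, applying the semantic rules of the definition above.
void evalT(Node* n) {
    for (Node* c : n->children) evalT(c);      // children first: t is synthesized
    if (n->label == "term") {
        n->t = n->children[0]->label;          // term -> digit        : term.t := "digit"
    } else if (n->label == "expr") {
        if (n->children.size() == 1)
            n->t = n->children[0]->t;          // expr -> term         : expr.t := term.t
        else                                   // expr -> expr op term : expr.t := expr1.t // term.t // op
            n->t = n->children[0]->t + n->children[2]->t + n->children[1]->label;
    }
}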
21

Example Annotated Parse Tree

Annotated parse tree for 9-5+2:

expr.t = “95-2+”
├─ expr.t = “95-”
│  ├─ expr.t = “9”
│  │  └─ term.t = “9”   (leaf 9)
│  ├─ -
│  └─ term.t = “5”      (leaf 5)
├─ +
└─ term.t = “2”         (leaf 2)
22

Depth-First Traversals
procedure visit(n : node);
begin
for each child m of n, from left to right do
visit(m);
evaluate semantic rules at node n
end
23

Depth-First Traversals (Example)

[The same annotated parse tree for 9-5+2 as on the Example Annotated Parse Tree slide, with expr.t = “95-2+” at the root, evaluated by a depth-first traversal.]

Note: all attributes here are of the synthesized type.
24

Translation Schemes
• A translation scheme is a CF grammar with embedded semantic actions
• When drawing a parse tree for a translation scheme,
we indicate an action by constructing an extra child
for it, connected by a dashed line to the node that
corresponds to the head of the production.

rest → + term { print(“+”) } rest

Parse-tree fragment for this production, with the embedded semantic action as an extra (dashed) child:

rest
├─ +
├─ term
├─ { print(“+”) }    ← embedded semantic action
└─ rest
25

Example Translation Scheme

expr → expr + term   { print(“+”) }
expr → expr - term   { print(“-”) }
expr → term
term → 0             { print(“0”) }
term → 1             { print(“1”) }
…                    …
term → 9             { print(“9”) }
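A translation scheme like this can be turned into a running translator. Note that expr → expr + term is left-recursive, so a procedure that calls itself first would never terminate; the sketch below (assumed helper names, single-digit operands, input read from stdin, not the slides' own code) uses the usual trick of parsing one term and then looping over the remaining + and - operators, printing each operator after its right operand so that the postfix order is preserved:

#include <cctype>
#include <cstdio>
#include <cstdlib>

// Recursive-descent translator for the scheme above: reads an expression
// such as 9-5+2 and prints its postfix form 95-2+.
static int lookahead;

static void error() { std::fprintf(stderr, "syntax error\n"); std::exit(1); }

static void match(int t) {
    if (lookahead == t) lookahead = std::getchar();
    else error();
}

static void term() {
    if (std::isdigit(lookahead)) {
        int d = lookahead;
        match(lookahead);
        std::putchar(d);               // action { print(digit) }
    } else error();
}

static void expr() {
    term();                            // first operand
    while (lookahead == '+' || lookahead == '-') {
        int op = lookahead;
        match(op);
        term();
        std::putchar(op);              // action { print("+") } or { print("-") }
    }
}

int main() {
    lookahead = std::getchar();
    expr();
    std::putchar('\n');
    return 0;
}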
26

Example Translation Scheme (cont’d)

Parse tree for 9-5+2 with the semantic actions attached as extra children:

expr
├─ expr
│  ├─ expr
│  │  └─ term
│  │     ├─ 9
│  │     └─ { print(“9”) }
│  ├─ -
│  ├─ term
│  │  ├─ 5
│  │  └─ { print(“5”) }
│  └─ { print(“-”) }
├─ +
├─ term
│  ├─ 2
│  └─ { print(“2”) }
└─ { print(“+”) }

A depth-first traversal executes the actions in the order 9, 5, -, 2, +:
it translates 9-5+2 into postfix 95-2+
27

2.4 Parsing
• Parsing = the process of determining whether a string of tokens can be generated by a grammar.
• The input program is a string of characters, which is tokenized into tokens.
• For any CF grammar there is a parser (e.g. the CYK algorithm) that takes at most O(n³) time to parse a string of n tokens.
– The n³ comes from a three-level nested loop: such a parser considers the O(n²) substrings of the input and, for each one, O(n) ways to split it according to a production body.
• Top-down parsing “constructs” a parse tree from root
(start symbol) to leaves.
• Bottom-up parsing “constructs” a parse tree from leaves
to root.
28

Parser as a program
• The parser is an executable program that takes an input string
• and checks whether a parse tree can be constructed for that string.
• Actually building the parse tree as a data structure is not important or required in this chapter; we will construct a real tree data structure in later chapters.
• For now the software just checks whether a parse tree could be constructed.
• For this purpose, the parser contains string-processing routines that:
• check whether a substring (from the i-th to the j-th token) is correctly derived from the body of a single production.
• Such bodies may be recursive, so these procedures are mostly recursive as well.
29

Recursive routines
• Such body may be recursive, so such procedures are
also recursive.
• bool Exp() {
      if (base_case)              // the expression was correct and has been
          return true;            // completely divided into subexpressions, terms, etc.
      else {
          // Exp -> Exp + Term  (note: calling Exp() first is left recursion;
          // a predictive parser rewrites this, e.g. as Term() followed by a loop)
          Exp(); match('+'); Term();
      }
      // if there was an error at any stage return false, else return true
  }
• If the initial (outermost) call returns successfully, the program has been parsed and is syntactically correct.
30

Parser (Syntax) Errors
• The parser can generate several kinds of errors, but:
– the most common error is encountered when a substring does not correspond to any nonterminal of the language;
– in other words, the pattern of the substring is such that the parser cannot match it with any well-defined programming construct
– (if-else, loops, declarations, compound statements, expressions, procedure call/return, array indexing, pointer operations, statement structure, nesting, etc. are common constructs of the C++ language).
31

Multiple errors
• Interestingly, the parser does not stop at the very first parse error.
• C++ (and most other languages) have very well-defined, structured statements.
• Even if a statement is syntactically incorrect, the parser can find the end of that statement or the start of the next one.
• The parser then restarts from the next statement.
32

• Such continued parsing may itself be incorrect, but:
– once the first syntax error is detected, the compiler will only parse, and not compile, the incorrect program;
– the goal is to detect as many further errors as possible, and then stop;
– this later error detection may also be inaccurate (because of the first error) but is still useful for debugging and correction.
33

Parser routines
• Parsing now means writing procedures that handle parsing of the input string and (logically or actually) generate the parse tree, or verify that such a tree can be constructed.
• Normally there is more than one such subroutine, each corresponding to one nonterminal.
• Each subroutine detects some pattern of a programming construct and decides which nonterminal is relevant to that pattern.
• It can then create child nodes for that nonterminal and attach them to it.
• A bottom-up parser first detects patterns in the input string and then calls the subroutine for the relevant nonterminal.
• A top-down parser starts from the start symbol and calls the subroutine for that symbol; in its body, it scans the string from the left, detects the other nonterminals involved, and calls the subroutines for those nonterminals.
34

Top-Down parsing
• Start from the production of the start symbol.
– (Normally there is a unique start symbol, and its production is also unique or can be identified separately.)
– i.e. call the subroutine for the start symbol.
– stmts, expr, main, etc. are typical start symbols, since they represent the whole program / expression / set of statements.
– This subroutine then checks the symbols in the program body and matches them against the remaining nonterminals and terminals.
35

• At node N, labeled with nonterminal A, select one of the productions for A and construct children at N for the symbols in the production body.
• Find the next node at which a subtree is to
be constructed, typically the leftmost
unexpanded nonterminal of the tree.
36

Predictive Parsing
• Recursive descent parsing is a top-down parsing method
– one (recursive) procedure is designed for each nonterminal; this procedure is responsible for parsing that nonterminal's syntactic category of input tokens
– When a nonterminal has multiple productions, each production
is implemented in a branch of a selection statement based on
input look-ahead information
• Predictive parsing is a special form of recursive descent
parsing where we use one lookahead token to
unambiguously determine the parse operations.
– The first token of each construct predicts which production applies to the rest of that construct.
37

Example Predictive Parser (Grammar)

type   → simple
       | ^ id
       | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
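A predictive parser for this grammar has one procedure per nonterminal, and the lookahead token selects the production. The sketch below (the token names and the helpers lookahead, match() and error() are assumptions) corresponds to the calls shown in the execution steps that follow:

enum Token { INTEGER, CHAR, NUM, DOTDOT, ARRAY, CARET, ID, LBRACKET, RBRACKET, OF /* ... */ };
extern Token lookahead;               // next input token
void match(Token t);                  // consume t or report a syntax error
void error(const char* msg);
void simple();                        // forward declaration

void type() {
    if (lookahead == INTEGER || lookahead == CHAR || lookahead == NUM) {
        simple();                     // type -> simple
    } else if (lookahead == CARET) {
        match(CARET); match(ID);      // type -> ^ id
    } else if (lookahead == ARRAY) {  // type -> array [ simple ] of type
        match(ARRAY); match(LBRACKET); simple();
        match(RBRACKET); match(OF); type();
    } else {
        error("expected a type");
    }
}

void simple() {
    if (lookahead == INTEGER)      match(INTEGER);    // simple -> integer
    else if (lookahead == CHAR)    match(CHAR);       // simple -> char
    else if (lookahead == NUM) {                      // simple -> num dotdot num
        match(NUM); match(DOTDOT); match(NUM);
    } else error("expected a simple type");
}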
38

Example Predictive Parser (Execution Step 1)

type()                      (check lookahead and call match)
└─ match(‘array’)

Input: array [ num dotdot num ] of integer
lookahead
39

Example Predictive Parser (Execution Step 2)

type()
├─ match(‘array’)
└─ match(‘[’)

Input: array [ num dotdot num ] of integer
lookahead
40

Example Predictive Parser (Execution Step 3)

type()
├─ match(‘array’)
├─ match(‘[’)
└─ simple()
   └─ match(‘num’)

Input: array [ num dotdot num ] of integer
lookahead
41

Example Predictive Parser (Execution Step 4)

type()
├─ match(‘array’)
├─ match(‘[’)
└─ simple()
   ├─ match(‘num’)
   └─ match(‘dotdot’)

Input: array [ num dotdot num ] of integer
lookahead
42

Example Predictive Parser (Execution Step 5)

type()
├─ match(‘array’)
├─ match(‘[’)
└─ simple()
   ├─ match(‘num’)
   ├─ match(‘dotdot’)
   └─ match(‘num’)

Input: array [ num dotdot num ] of integer
lookahead
43

Example Predictive Parser (Execution Step 6)

type()
├─ match(‘array’)
├─ match(‘[’)
├─ simple()
│  ├─ match(‘num’)
│  ├─ match(‘dotdot’)
│  └─ match(‘num’)
└─ match(‘]’)

Input: array [ num dotdot num ] of integer
lookahead
44

Example Predictive Parser (Execution Step 7)

type()
├─ match(‘array’)
├─ match(‘[’)
├─ simple()
│  ├─ match(‘num’)
│  ├─ match(‘dotdot’)
│  └─ match(‘num’)
├─ match(‘]’)
└─ match(‘of’)

Input: array [ num dotdot num ] of integer
lookahead
45

Example Predictive Parser (Execution Step 8)

type()
├─ match(‘array’)
├─ match(‘[’)
├─ simple()
│  ├─ match(‘num’)
│  ├─ match(‘dotdot’)
│  └─ match(‘num’)
├─ match(‘]’)
├─ match(‘of’)
└─ type()
   └─ simple()
      └─ match(‘integer’)

Input: array [ num dotdot num ] of integer
lookahead
46

Recursive Descent Example

E → id = n | { L }
L → E ; L | ε

(E stands for expression, L for list, as in the previous topics.)

Input: {x=3;{y=4;};}

lookahead
47

Writing subroutines for the above parser

bool E() {                         // assume E is also the start symbol
    // lookahead holds the current token from the lexer; match(t) consumes it
    if (lookahead == ID) {         // E -> id = n
        match(ID); match('='); match(NUM);
        return true;
    } else if (lookahead == '{') { // E -> { L }
        match('{'); L(); match('}');
        return true;
    }
    return false;                  // no production of E applies: syntax error
}
48

bool L() {
    // FIRST(E) = { id, '{' }: these tokens select L -> E ; L
    if (lookahead == ID || lookahead == '{') {
        E(); match(';'); L();
        return true;
    }
    return true;                   // otherwise choose L -> ε
}
49

Adding a Lexical Analyzer

• Typical tasks of the lexical analyzer:
– Remove white space and comments
– Encode constants as tokens
– Recognize keywords
– Recognize identifiers and store identifier names in a global symbol table
– The table is pre-filled with C++ keywords and library symbol names (main, getch, cout, etc.)
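A minimal sketch of such a lexer in C++ (the names Token, lexan, tokenval and symtable are assumptions for illustration; comment handling is omitted). It skips white space, turns digit runs into NUM tokens, and enters identifiers into a symbol table that is pre-filled with a few keywords:

#include <cctype>
#include <cstdio>
#include <map>
#include <string>

enum Token { NUM = 256, ID, END_OF_INPUT };
int tokenval;                                          // attribute of the last token
std::map<std::string, int> symtable =                  // pre-filled (example entries)
    { {"if", 1}, {"while", 2}, {"main", 3} };

int lexan() {
    int c;
    while ((c = std::getchar()) != EOF) {
        if (std::isspace(c)) continue;                 // remove white space
        if (std::isdigit(c)) {                         // encode a numeric constant
            tokenval = c - '0';
            while (std::isdigit(c = std::getchar()))
                tokenval = 10 * tokenval + (c - '0');
            std::ungetc(c, stdin);
            return NUM;
        }
        if (std::isalpha(c) || c == '_') {             // keyword or identifier
            std::string name;
            while (std::isalnum(c) || c == '_') { name += (char)c; c = std::getchar(); }
            std::ungetc(c, stdin);
            auto it = symtable.find(name);
            if (it == symtable.end())                  // new identifier: store its name
                it = symtable.emplace(name, (int)symtable.size() + 1).first;
            tokenval = it->second;
            return ID;
        }
        return c;                                      // any other character is its own token
    }
    return END_OF_INPUT;
}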
50

Pre-Mid Assignment (individual):
– Construct a lexical analyzer/tokenizer and submit it in my room (a viva will be taken).
– Those who cannot do it themselves should kindly not submit another person's work or copies from the internet; it is better not to submit than to get an F for cheating.
51

The Lexical Analyzer “lexer”

Input: y := 31 + 28*x

The lexical analyzer lexan() produces the token stream
<id, “y”> <assign, > <num, 31> <‘+’, > <num, 28> <‘*’, > <id, “x”>

and hands the parser parse() one token at a time as its lookahead,
with tokenval carrying the token attribute.
