
1

A SIMPLE SYNTAX-DIRECTED TRANSLATOR

Chapter 2
2

Building a Simple Compiler


• Building our compiler involves:
– Defining the syntax of a programming language
– Developing a source-code parser: for our compiler we will use predictive parsing
– Implementing syntax-directed translation to generate intermediate code
3

Syntax Definition
• Context-free grammar is a 4-tuple with
– A set of tokens (terminal symbols)
– A set of nonterminals
– A set of productions
– A designated start symbol
4

Example Grammar

Context-free grammar for simple expressions:

G = <{list,digit}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, list>


with productions P =

list  → list + digit
list  → list - digit
list  → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
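As a side note (not from the slides), the 4-tuple view maps directly onto a data structure. A minimal C++ sketch representing grammar G, with assumed type names:

#include <string>
#include <vector>

// Sketch: a context-free grammar as a 4-tuple
// (tokens/terminals, nonterminals, productions, start symbol).
struct Production {
    std::string head;                  // a nonterminal
    std::vector<std::string> body;     // a sequence of terminals and nonterminals
};

struct Grammar {
    std::vector<std::string> terminals;
    std::vector<std::string> nonterminals;
    std::vector<Production>  productions;
    std::string start;
};

// Grammar G = <{list,digit}, {+,-,0..9}, P, list>; only some productions are shown.
Grammar G = {
    {"+", "-", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"},
    {"list", "digit"},
    {
        {"list",  {"list", "+", "digit"}},
        {"list",  {"list", "-", "digit"}},
        {"list",  {"digit"}},
        {"digit", {"9"}},              // one of the ten digit alternatives
    },
    "list",
};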
5

Derivation
• Given a CF grammar we can determine the
set of all strings (sequences of tokens)
generated by the grammar using derivation
– We begin with the start symbol
– In each step, we replace one nonterminal in the current sentential form with the right-hand side of one of that nonterminal's productions
6

Derivation for the Example Grammar

list
⇒ list + digit
⇒ list - digit + digit
⇒ digit - digit + digit
⇒ 9 - digit + digit
⇒ 9 - 5 + digit
⇒ 9 - 5 + 2

This is an example of a leftmost derivation, because in each step we replaced the leftmost nonterminal in the sentential form.
7

Rightmost Derivation for the Example Grammar

Likewise, a rightmost derivation replaces the rightmost nonterminal in each step.

P =
list  → digit + list
list  → digit - list
list  → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

list
⇒ digit - list
⇒ digit - digit + list
⇒ digit - digit + digit
⇒ digit - digit + 2
⇒ digit - 5 + 2
⇒ 9 - 5 + 2
8

Parse Trees
• The root of the tree is labeled by the start symbol
• Each leaf of the tree is labeled by a terminal (= token) or ε
• Each interior node is labeled by a nonterminal
• If A → X1 X2 … Xn is a production, then node A has immediate children X1, X2, …, Xn where each Xi is a (non)terminal or ε (ε denotes the empty string)
9

Parse Tree for the Example Grammar

Parse tree of the string 9-5+2 using grammar G:

list
├─ list
│  ├─ list
│  │  └─ digit → 9
│  ├─ -
│  └─ digit → 5
├─ +
└─ digit → 2

The sequence of leaves, read from left to right (9 - 5 + 2), is called the yield of the parse tree.
The Two Derivations for x – 2 * y

Leftmost derivation
Rule  Sentential Form
—     Expr
1     Expr Op Expr
3     <id,x> Op Expr
5     <id,x> – Expr
1     <id,x> – Expr Op Expr
2     <id,x> – <num,2> Op Expr
6     <id,x> – <num,2> * Expr
3     <id,x> – <num,2> * <id,y>

Rightmost derivation
Rule  Sentential Form
—     Expr
1     Expr Op Expr
3     Expr Op <id,y>
6     Expr * <id,y>
1     Expr Op Expr * <id,y>
2     Expr Op <num,2> * <id,y>
5     Expr – <num,2> * <id,y>
3     <id,x> – <num,2> * <id,y>

In both cases, Expr ⇒* id – num * id


• The two derivations produce different parse trees
• The parse trees imply different evaluation orders!
Derivations and Parse Trees

Leftmost derivation
Rule  Sentential Form
—     Expr
1     Expr Op Expr
3     <id,x> Op Expr
5     <id,x> – Expr
1     <id,x> – Expr Op Expr
2     <id,x> – <num,2> Op Expr
6     <id,x> – <num,2> * Expr
3     <id,x> – <num,2> * <id,y>

Corresponding parse tree:
Expr
├─ Expr → <id,x>
├─ Op → –
└─ Expr
   ├─ Expr → <num,2>
   ├─ Op → *
   └─ Expr → <id,y>

This evaluates as x – ( 2 * y )
Derivations and Parse Trees

Rightmost derivation
Rule  Sentential Form
—     Expr
1     Expr Op Expr
3     Expr Op <id,y>
6     Expr * <id,y>
1     Expr Op Expr * <id,y>
2     Expr Op <num,2> * <id,y>
5     Expr – <num,2> * <id,y>
3     <id,x> – <num,2> * <id,y>

Corresponding parse tree:
Expr
├─ Expr
│  ├─ Expr → <id,x>
│  ├─ Op → –
│  └─ Expr → <num,2>
├─ Op → *
└─ Expr → <id,y>

This evaluates as ( x – 2 ) * y
13

Ambiguity

Consider the following context-free grammar:

G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string>

with productions P =

string → string + string | string - string | 0 | 1 | … | 9

This grammar is ambiguous, because more than one parse tree represents the string 9-5+2.
14

Ambiguity (cont’d)

The two parse trees for 9-5+2:

First parse tree (groups (9-5)+2):
string
├─ string
│  ├─ string → 9
│  ├─ -
│  └─ string → 5
├─ +
└─ string → 2

Second parse tree (groups 9-(5+2)):
string
├─ string → 9
├─ -
└─ string
   ├─ string → 5
   ├─ +
   └─ string → 2
15

Associativity of Operators
Left-associative operators have left-recursive productions
left → left + term | term
String a+b+c has the same meaning as (a+b)+c

Right-associative operators have right-recursive productions
right → term = right | term
String a=b=c has the same meaning as a=(b=c)

Operators on the same line have the same associativity and precedence:
left-associative: + -
left-associative: * /
16

Precedence of Operators
Operators with higher precedence “bind more tightly”
expr   → expr + term | expr – term | term
term   → term * factor | term / factor | factor
factor → number | ( expr )
number → 0 | 1 | 2 | … | 9
String 2+3*5 has the same meaning as 2+(3*5)
Parse tree of 2+3*5:
expr
├─ expr
│  └─ term
│     └─ factor
│        └─ number → 2
├─ +
└─ term
   ├─ term
   │  └─ factor
   │     └─ number → 3
   ├─ *
   └─ factor
      └─ number → 5
17

Syntax of Statements

stmt → id := expr
     | if expr then stmt
     | if expr then stmt else stmt
     | while expr do stmt
     | begin opt_stmts end
opt_stmts → stmt ; opt_stmts
     | ε
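As a sketch of how such a grammar is handled by the predictive parser introduced later in this chapter, the routines below (the token names and the helpers lookahead, match(), expr() and error() are assumptions, not part of the slides) choose a production for stmt from the lookahead token; opt_stmts uses a loop in place of its tail recursion, and any token outside FIRST(stmt) selects the ε-production:

enum Token { ID, ASSIGN, IF, THEN, ELSE, WHILE, DO, BEGIN_, END_, SEMI /* ... */ };
extern Token lookahead;            // next input token, supplied by the lexer
void match(Token t);               // consume t or report a syntax error
void expr();                       // parses an expression
void error(const char* msg);
void opt_stmts();                  // forward declaration

void stmt() {
    switch (lookahead) {
    case ID:                       // stmt -> id := expr
        match(ID); match(ASSIGN); expr(); break;
    case IF:                       // stmt -> if expr then stmt [ else stmt ]
        match(IF); expr(); match(THEN); stmt();
        if (lookahead == ELSE) { match(ELSE); stmt(); }  // match else with the nearest if
        break;
    case WHILE:                    // stmt -> while expr do stmt
        match(WHILE); expr(); match(DO); stmt(); break;
    case BEGIN_:                   // stmt -> begin opt_stmts end
        match(BEGIN_); opt_stmts(); match(END_); break;
    default:
        error("unexpected token at start of statement");
    }
}

void opt_stmts() {                 // opt_stmts -> stmt ; opt_stmts | ε
    while (lookahead == ID || lookahead == IF ||
           lookahead == WHILE || lookahead == BEGIN_) {
        stmt(); match(SEMI);       // the loop replaces the tail recursion
    }                              // any other token selects the ε-production
}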
18

Syntax-Directed Translation
• Uses a CF grammar to specify the syntactic
structure of the language
• AND associates a set of attributes with the
terminals and nonterminals of the grammar
• AND associates with each production a set of
semantic rules to compute values of attributes
• A parse tree is traversed and semantic rules
applied: after the tree traversal(s) are completed,
the attribute values on the nonterminals contain
the translated form of the input
19

Synthesized and Inherited Attributes
• An attribute is said to be …
– synthesized if its value at a parse-tree node is
determined from the attribute values at the children of
the node
• Suppose a node N in a parse tree is labeled by the grammar
symbol X . We write X.a to denote the value of attribute a of X
at that node.
– inherited if its value at a parse-tree node is determined by the semantic rules of the parent's production (so it may depend on the parent and siblings)
20

Example Attribute Grammar

Syntax-directed definition for infix to postfix translation
(// is the string concatenation operator)

Production            Semantic Rule
expr → expr1 + term   expr.t := expr1.t // term.t // “+”
expr → expr1 - term   expr.t := expr1.t // term.t // “-”
expr → term           expr.t := term.t
term → 0              term.t := “0”
term → 1              term.t := “1”
…                     …
term → 9              term.t := “9”
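These semantic rules can be evaluated in one bottom-up pass over the parse tree. A minimal C++ sketch (the Node fields and evalT are assumed names for illustration):

#include <string>
#include <vector>

struct Node {                          // a parse-tree node
    std::string label;                 // "expr", "term", or a terminal such as "+", "9"
    std::vector<Node*> children;       // in left-to-right order
    std::string t;                     // synthesized attribute: the postfix translation
};

// Compute t bottom-up, applying the semantic rules of the definition above.
void evalT(Node* n) {
    for (Node* c : n->children) evalT(c);      // children first: t is synthesized
    if (n->label == "term") {
        n->t = n->children[0]->label;          // term -> digit        : term.t := "digit"
    } else if (n->label == "expr") {
        if (n->children.size() == 1)
            n->t = n->children[0]->t;          // expr -> term         : expr.t := term.t
        else                                   // expr -> expr op term : expr.t := expr1.t // term.t // op
            n->t = n->children[0]->t + n->children[2]->t + n->children[1]->label;
    }
}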
21

Example Annotated Parse Tree

Annotated parse tree for 9-5+2:

expr.t = “95-2+”
├─ expr.t = “95-”
│  ├─ expr.t = “9”
│  │  └─ term.t = “9”   (leaf 9)
│  ├─ -
│  └─ term.t = “5”      (leaf 5)
├─ +
└─ term.t = “2”         (leaf 2)
22

Depth-First Traversals
procedure visit(n : node);
begin
for each child m of n, from left to right do
visit(m);
evaluate semantic rules at node n
end
23

Depth-First Traversals (Example)

[The same annotated parse tree for 9-5+2 as on the Example Annotated Parse Tree slide, with expr.t = “95-2+” at the root, evaluated by a depth-first traversal.]

Note: all attributes here are of the synthesized type.
24

Translation Schemes
• A translation scheme is a CF grammar with embedded semantic actions
• When drawing a parse tree for a translation scheme,
we indicate an action by constructing an extra child
for it, connected by a dashed line to the node that
corresponds to the head of the production.

rest → + term { print(“+”) } rest

Parse-tree fragment for this production, with the embedded semantic action as an extra (dashed) child:

rest
├─ +
├─ term
├─ { print(“+”) }    ← embedded semantic action
└─ rest
25

Example Translation Scheme

expr → expr + term   { print(“+”) }
expr → expr - term   { print(“-”) }
expr → term
term → 0             { print(“0”) }
term → 1             { print(“1”) }
…                    …
term → 9             { print(“9”) }
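A translation scheme like this can be turned into a running translator. Note that expr → expr + term is left-recursive, so a procedure that calls itself first would never terminate; the sketch below (assumed helper names, single-digit operands, input read from stdin, not the slides' own code) uses the usual trick of parsing one term and then looping over the remaining + and - operators, printing each operator after its right operand so that the postfix order is preserved:

#include <cctype>
#include <cstdio>
#include <cstdlib>

// Recursive-descent translator for the scheme above: reads an expression
// such as 9-5+2 and prints its postfix form 95-2+.
static int lookahead;

static void error() { std::fprintf(stderr, "syntax error\n"); std::exit(1); }

static void match(int t) {
    if (lookahead == t) lookahead = std::getchar();
    else error();
}

static void term() {
    if (std::isdigit(lookahead)) {
        int d = lookahead;
        match(lookahead);
        std::putchar(d);               // action { print(digit) }
    } else error();
}

static void expr() {
    term();                            // first operand
    while (lookahead == '+' || lookahead == '-') {
        int op = lookahead;
        match(op);
        term();
        std::putchar(op);              // action { print("+") } or { print("-") }
    }
}

int main() {
    lookahead = std::getchar();
    expr();
    std::putchar('\n');
    return 0;
}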
26

Example Translation Scheme (cont’d)

Parse tree for 9-5+2 with the semantic actions attached as extra children:

expr
├─ expr
│  ├─ expr
│  │  └─ term
│  │     ├─ 9
│  │     └─ { print(“9”) }
│  ├─ -
│  ├─ term
│  │  ├─ 5
│  │  └─ { print(“5”) }
│  └─ { print(“-”) }
├─ +
├─ term
│  ├─ 2
│  └─ { print(“2”) }
└─ { print(“+”) }

A depth-first traversal executes the actions in the order 9, 5, -, 2, +:
it translates 9-5+2 into postfix 95-2+
27

2.4 Parsing
• Parsing = the process of determining whether a string of tokens can be generated by a grammar.
• The input program is a string of characters, which is tokenized into tokens.
• For any CF grammar there is a parser (e.g. the CYK algorithm) that takes at most O(n³) time to parse a string of n tokens.
– The n³ comes from a three-level nested loop: such a parser considers the O(n²) substrings of the input and, for each one, O(n) ways to split it according to a production body.
• Top-down parsing “constructs” a parse tree from root
(start symbol) to leaves.
• Bottom-up parsing “constructs” a parse tree from leaves
to root.
28

Parser as a program
• The parser is an executable program that takes an input string
• and checks whether a parse tree can be constructed for that string.
• Actually building the parse tree as a data structure is not important or required in this chapter; we will construct a real tree data structure in later chapters.
• For now the software just checks whether a parse tree could be constructed.
• For this purpose, the parser contains string-processing routines that:
• check whether a substring (from the i-th to the j-th token) is correctly derived from the body of a single production.
• Such bodies may be recursive, so these procedures are mostly recursive as well.
29

Recursive routines
• Such body may be recursive, so such procedures are
also recursive.
• bool Exp() {
      if (base_case)              // the expression was correct and has been
          return true;            // completely divided into subexpressions, terms, etc.
      else {
          // Exp -> Exp + Term  (note: calling Exp() first is left recursion;
          // a predictive parser rewrites this, e.g. as Term() followed by a loop)
          Exp(); match('+'); Term();
      }
      // if there was an error at any stage return false, else return true
  }
• If the initial (outermost) call returns successfully, the program has been parsed and is syntactically correct.
30

Parser (Syntax) Errors
• The parser can generate several kinds of errors, but:
– the most common error is encountered when a substring does not correspond to any nonterminal of the language;
– in other words, the pattern of the substring is such that the parser cannot match it with any well-defined programming construct
– (if-else, loops, declarations, compound statements, expressions, procedure call/return, array indexing, pointer operations, statement structure, nesting, etc. are common constructs of the C++ language).
31

Multiple errors
• Interestingly, the parser does not stop at the very first parse error.
• C++ (and most other languages) have very well-defined, structured statements.
• Even if a statement is syntactically incorrect, the parser can find the end of that statement or the start of the next one.
• The parser then restarts from the next statement.
32

• Such continued parsing may itself be incorrect, but:
– once the first syntax error is detected, the compiler will only parse, and not compile, the incorrect program;
– the goal is to detect as many further errors as possible, and then stop;
– this later error detection may also be inaccurate (because of the first error) but is still useful for debugging and correction.
33

Parser routines
• Parsing now means writing procedures that handle parsing of the input string and (logically or actually) generate the parse tree, or verify that such a tree can be constructed.
• Normally there is more than one such subroutine, each corresponding to one nonterminal.
• Each subroutine detects some pattern of a programming construct and decides which nonterminal is relevant to that pattern.
• It can then create child nodes for that nonterminal and attach them to it.
• A bottom-up parser first detects patterns in the input string and then calls the subroutine for the relevant nonterminal.
• A top-down parser starts from the start symbol and calls the subroutine for that symbol; in its body, it scans the string from the left, detects the other nonterminals involved, and calls the subroutines for those nonterminals.
34

Top-Down parsing
• Start from the production of the start symbol.
– (Normally there is a unique start symbol, and its production is also unique or can be identified separately.)
– i.e. call the subroutine for the start symbol.
– stmts, expr, main, etc. are typical start symbols, since they represent the whole program / expression / set of statements.
– This subroutine then checks the symbols in the program body and matches them against the remaining nonterminals and terminals.
35

• At node N, labeled with nonterminal A, select one of the productions for A and construct children at N for the symbols in the production body.
• Find the next node at which a subtree is to
be constructed, typically the leftmost
unexpanded nonterminal of the tree.
36

Predictive Parsing
• Recursive descent parsing is a top-down parsing method
– one (recursive) procedure is designed for each nonterminal; this procedure is responsible for parsing that nonterminal's syntactic category of input tokens
– When a nonterminal has multiple productions, each production
is implemented in a branch of a selection statement based on
input look-ahead information
• Predictive parsing is a special form of recursive descent
parsing where we use one lookahead token to
unambiguously determine the parse operations.
– The first token of each construct predicts which production applies to the rest of that construct.
37

Example Predictive Parser (Grammar)

type   → simple
       | ^ id
       | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
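A predictive parser for this grammar has one procedure per nonterminal, and the lookahead token selects the production. The sketch below (the token names and the helpers lookahead, match() and error() are assumptions) corresponds to the calls shown in the execution steps that follow:

enum Token { INTEGER, CHAR, NUM, DOTDOT, ARRAY, CARET, ID, LBRACKET, RBRACKET, OF /* ... */ };
extern Token lookahead;               // next input token
void match(Token t);                  // consume t or report a syntax error
void error(const char* msg);
void simple();                        // forward declaration

void type() {
    if (lookahead == INTEGER || lookahead == CHAR || lookahead == NUM) {
        simple();                     // type -> simple
    } else if (lookahead == CARET) {
        match(CARET); match(ID);      // type -> ^ id
    } else if (lookahead == ARRAY) {  // type -> array [ simple ] of type
        match(ARRAY); match(LBRACKET); simple();
        match(RBRACKET); match(OF); type();
    } else {
        error("expected a type");
    }
}

void simple() {
    if (lookahead == INTEGER)      match(INTEGER);    // simple -> integer
    else if (lookahead == CHAR)    match(CHAR);       // simple -> char
    else if (lookahead == NUM) {                      // simple -> num dotdot num
        match(NUM); match(DOTDOT); match(NUM);
    } else error("expected a simple type");
}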
38

Example Predictive Parser (Execution Step 1)

type()                      (check lookahead and call match)
└─ match(‘array’)

Input: array [ num dotdot num ] of integer
lookahead
39

Example Predictive Parser (Execution Step 2)

type()
├─ match(‘array’)
└─ match(‘[’)

Input: array [ num dotdot num ] of integer
lookahead
40

Example Predictive Parser (Execution Step 3)

type()
├─ match(‘array’)
├─ match(‘[’)
└─ simple()
   └─ match(‘num’)

Input: array [ num dotdot num ] of integer
lookahead
41

Example Predictive Parser (Execution Step 4)

type()
├─ match(‘array’)
├─ match(‘[’)
└─ simple()
   ├─ match(‘num’)
   └─ match(‘dotdot’)

Input: array [ num dotdot num ] of integer
lookahead
42

Example Predictive Parser (Execution Step 5)

type()
├─ match(‘array’)
├─ match(‘[’)
└─ simple()
   ├─ match(‘num’)
   ├─ match(‘dotdot’)
   └─ match(‘num’)

Input: array [ num dotdot num ] of integer
lookahead
43

Example Predictive Parser (Execution Step 6)

type()
├─ match(‘array’)
├─ match(‘[’)
├─ simple()
│  ├─ match(‘num’)
│  ├─ match(‘dotdot’)
│  └─ match(‘num’)
└─ match(‘]’)

Input: array [ num dotdot num ] of integer
lookahead
44

Example Predictive Parser (Execution Step 7)

type()
├─ match(‘array’)
├─ match(‘[’)
├─ simple()
│  ├─ match(‘num’)
│  ├─ match(‘dotdot’)
│  └─ match(‘num’)
├─ match(‘]’)
└─ match(‘of’)

Input: array [ num dotdot num ] of integer
lookahead
45

Example Predictive Parser (Execution Step 8)

type()
├─ match(‘array’)
├─ match(‘[’)
├─ simple()
│  ├─ match(‘num’)
│  ├─ match(‘dotdot’)
│  └─ match(‘num’)
├─ match(‘]’)
├─ match(‘of’)
└─ type()
   └─ simple()
      └─ match(‘integer’)

Input: array [ num dotdot num ] of integer
lookahead
46

Recursive Descent Example

E → id = n | { L }
L → E ; L | ε

(E stands for expression, L for list, as in the previous topics.)

Input: {x=3;{y=4;};}

lookahead
47

Writing subroutines for the above parser

bool E() {                         // assume E is also the start symbol
    // lookahead holds the current token from the lexer; match(t) consumes it
    if (lookahead == ID) {         // E -> id = n
        match(ID); match('='); match(NUM);
        return true;
    } else if (lookahead == '{') { // E -> { L }
        match('{'); L(); match('}');
        return true;
    }
    return false;                  // no production of E applies: syntax error
}
48

bool L() {
    // FIRST(E) = { id, '{' }: these tokens select L -> E ; L
    if (lookahead == ID || lookahead == '{') {
        E(); match(';'); L();
        return true;
    }
    return true;                   // otherwise choose L -> ε
}
49

Adding a Lexical Analyzer

• Typical tasks of the lexical analyzer:
– Remove white space and comments
– Encode constants as tokens
– Recognize keywords
– Recognize identifiers and store identifier names in a global symbol table
– The table is pre-filled with C++ keywords and library symbol names (main, getch, cout, etc.)
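A minimal sketch of such a lexer in C++ (the names Token, lexan, tokenval and symtable are assumptions for illustration; comment handling is omitted). It skips white space, turns digit runs into NUM tokens, and enters identifiers into a symbol table that is pre-filled with a few keywords:

#include <cctype>
#include <cstdio>
#include <map>
#include <string>

enum Token { NUM = 256, ID, END_OF_INPUT };
int tokenval;                                          // attribute of the last token
std::map<std::string, int> symtable =                  // pre-filled (example entries)
    { {"if", 1}, {"while", 2}, {"main", 3} };

int lexan() {
    int c;
    while ((c = std::getchar()) != EOF) {
        if (std::isspace(c)) continue;                 // remove white space
        if (std::isdigit(c)) {                         // encode a numeric constant
            tokenval = c - '0';
            while (std::isdigit(c = std::getchar()))
                tokenval = 10 * tokenval + (c - '0');
            std::ungetc(c, stdin);
            return NUM;
        }
        if (std::isalpha(c) || c == '_') {             // keyword or identifier
            std::string name;
            while (std::isalnum(c) || c == '_') { name += (char)c; c = std::getchar(); }
            std::ungetc(c, stdin);
            auto it = symtable.find(name);
            if (it == symtable.end())                  // new identifier: store its name
                it = symtable.emplace(name, (int)symtable.size() + 1).first;
            tokenval = it->second;
            return ID;
        }
        return c;                                      // any other character is its own token
    }
    return END_OF_INPUT;
}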
50

Pre-Mid Assignment (individual):
– Construct a lexical analyzer/tokenizer and submit it in my room (a viva will be taken).
– Those who cannot do it themselves should kindly not submit another person's work or copies from the internet; it is better not to submit than to get an F for cheating.
51

The Lexical Analyzer “lexer”

Input: y := 31 + 28*x

The lexical analyzer lexan() produces the token stream
<id, “y”> <assign, > <num, 31> <‘+’, > <num, 28> <‘*’, > <id, “x”>

and hands the parser parse() one token at a time as its lookahead,
with tokenval carrying the token attribute.
