
Construction of Syntax Trees:

The use of syntax trees as an intermediate representation allows
translation to be decoupled from parsing.
Translation routines that are invoked during parsing must live with two
kinds of restrictions.

1. A grammar that is suitable for parsing may not reflect the natural
hierarchical structure of the constructs in the language.

Example: a grammar for Fortran may view a subroutine as consisting
simply of a list of statements. However, analysis of the subroutine may
be easier if we use a tree representation that reflects the nesting of DO
loops.

2. The parsing method constrains the order in which nodes in a parse
tree are considered. This order may not match the order in which
information about a construct becomes available. For this reason,
compilers for C usually construct syntax trees for declarations.
Syntax trees:
The production S → if B then S1 else S2 might appear in a syntax tree as

          if-then-else
         /     |      \
        B      S1      S2

In a syntax tree, operators and keywords do not appear as leaves, but rather
are associated with the interior node that would be the parent of those leaves
in the parse tree.
Similarly, the syntax tree for the expression 3 * 5 + 4 is shown below:

          +
         / \
        *   4
       / \
      3   5
Constructing Syntax Trees for expressions:
Each node in a syntax tree can be implemented as a record
with several fields.

In the node for an operator, one field identifies the operator
and the remaining fields contain pointers to the nodes for the
operands.

When used for translation, the nodes in a syntax tree may
have additional fields to hold the values (or pointers to
values) of attributes attached to the node. We use the following
functions to create the nodes of syntax trees for expressions with
binary operators.
Each function returns a pointer to the newly created node.
1. mknode(op, left, right) creates an operator node with
label op and two fields containing pointers to left and
right.

2. mkleaf(id, entry) creates an identifier node with label id and a
field containing entry, a pointer to the symbol-table entry
for the identifier.

3. mkleaf(num, val) creates a number node with label
num and a field containing val, the value of the number.
Example: the following sequence of function calls creates the syntax tree for
the expression a - 4 + c.
p1, p2, p3, p4, p5 are pointers to nodes, and entrya, entryc are pointers to
the symbol-table entries for identifiers a and c.

p1 := mkleaf(id, entrya);
p2 := mkleaf(num, 4);
p3 := mknode('-', p1, p2);
p4 := mkleaf(id, entryc);
p5 := mknode('+', p3, p4);

              +
             / \
            -   id ----> to entry for c
           / \
         id   num 4
          |
          +----> to entry for a
The tree is constructed bottom-up.

The function calls mkleaf(id, entrya) and mkleaf(num, 4)
construct the leaves for a and 4;
the pointers to these nodes are saved using p1 and p2.

The call mknode('-', p1, p2) then constructs the interior
node with the leaves for a and 4 as children.

After two more steps, p5 is left pointing to the root.
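
The node records and the two constructor functions can be sketched in Python as follows. This is a minimal illustration, not the notes' own implementation: the Node class, the show helper, and the use of small dictionaries as stand-ins for symbol-table entries are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    """One syntax-tree record: a label plus either children or a leaf value."""
    label: str                            # operator symbol, "id", or "num"
    left: Optional["Node"] = None         # pointers to operand nodes (operators only)
    right: Optional["Node"] = None
    value: Union[dict, int, None] = None  # symbol-table entry or numeric value


def mknode(op, left, right):
    """Create an operator node labelled op with the two given children."""
    return Node(op, left, right)


def mkleaf(label, value):
    """Create a leaf: label is "id" (value = symbol-table entry) or "num"."""
    return Node(label, value=value)


def show(n):
    """Render a tree as a fully parenthesized string, for inspection only."""
    if n.label == "id":
        return n.value["name"]
    if n.label == "num":
        return str(n.value)
    return f"({show(n.left)} {n.label} {show(n.right)})"


# Hypothetical symbol-table entries for a and c.
entrya = {"name": "a"}
entryc = {"name": "c"}

p1 = mkleaf("id", entrya)
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)
p4 = mkleaf("id", entryc)
p5 = mknode("+", p3, p4)

print(show(p5))  # ((a - 4) + c)
```

The parenthesized output makes the shape of the tree visible: the subtraction is built first and becomes the left child of the root.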


Intermediate Code Generation:

Parser -> Static Checker -> Intermediate Code Generator -> (intermediate code) -> Code Generator

Intermediate Languages:

Syntax trees and postfix notation, which we discussed before, are two kinds of
intermediate representation; a third is called three-address code.
The semantic rules for generating three-address code from common
programming language constructs are similar to those for constructing syntax
trees or for generating postfix notation.
Graphical Representations:

A syntax tree depicts the natural hierarchical structure of a source program.
A dag gives the same information but in a more compact way because common
subexpressions are identified. A syntax tree and a dag for the assignment statement
a := b * -c + b * -c are shown below:

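(a) Syntax tree

            assign
           /      \
          a        +
                 /   \
               *       *
              / \     / \
             b uminus b uminus
                 |        |
                 c        c

(b) Dag

            assign
           /      \
          a        +
                  / \      (both operands of + are the same * node)
                  \ /
                   *
                  / \
                 b  uminus
                      |
                      c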
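
One common way to obtain a dag instead of a tree is to check, before creating a node, whether an identical node already exists and to return that one instead. The sketch below illustrates the idea under stated assumptions: the DagBuilder class and its method names are hypothetical, and nodes are represented simply as tuples stored in a table.

```python
class DagBuilder:
    """Creates nodes, reusing an existing node when an identical one exists."""

    def __init__(self):
        self.table = {}   # node signature -> node index
        self.nodes = []   # node index -> node signature

    def _intern(self, sig):
        # Only allocate a new node if this signature has not been seen before.
        if sig not in self.table:
            self.table[sig] = len(self.nodes)
            self.nodes.append(sig)
        return self.table[sig]

    def leaf(self, name):
        return self._intern(("id", name))

    def unary(self, op, child):
        return self._intern((op, child))

    def binary(self, op, left, right):
        return self._intern((op, left, right))


dag = DagBuilder()

# a := b * -c + b * -c, built bottom-up exactly as for a syntax tree.
left = dag.binary("*", dag.leaf("b"), dag.unary("uminus", dag.leaf("c")))
right = dag.binary("*", dag.leaf("b"), dag.unary("uminus", dag.leaf("c")))
plus = dag.binary("+", left, right)
root = dag.binary("assign", dag.leaf("a"), plus)

print(left == right)   # True: the second b * -c reuses the first node
print(len(dag.nodes))  # 7 nodes, versus 11 in the syntax tree
```

The calls are the same as they would be for a tree; only the lookup inside _intern turns the repeated subexpression into a shared node.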
Three-Address Code

Three-address code is a sequence of statements of the general form:

X := Y op Z

where X, Y, Z are names and op stands for any operator.

So a source language expression like X + Y * Z might be
translated into the sequence:

t1 := Y * Z
t2 := X + t1

where t1 and t2 are compiler-generated temporary names.

Three-address code is a linearized representation of a syntax tree or a dag in
which explicit names correspond to the interior nodes of the graph.
The syntax tree and dag above are represented by the following three-address
code sequences:

t1 := -c               t1 := -c
t2 := b * t1           t2 := b * t1
t3 := -c               t3 := t2 + t2
t4 := b * t3           a := t3
t5 := t2 + t4
a := t5

1. Code for the syntax tree (left)     2. Code for the dag (right)
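
The linearization can be sketched as a postorder walk that emits one statement, with a fresh temporary, for each interior node. The code below is an illustrative sketch only: the tuple encoding of trees and the function names gen and translate are assumptions for the example. Passing share=True remembers the temporary already computed for a repeated subexpression, which imitates walking the dag.

```python
def gen(node, code, memo=None):
    """Emit three-address statements for node; return the name of its value.

    Leaves are strings; interior nodes are tuples (op, child) or
    (op, left, right). A dict passed as memo reuses the temporary of a
    repeated subexpression.
    """
    if isinstance(node, str):
        return node
    if memo is not None and node in memo:
        return memo[node]
    op, *kids = node
    args = [gen(kid, code, memo) for kid in kids]  # postorder: operands first
    temp = f"t{len(code) + 1}"                     # one new temporary per statement
    if len(args) == 1:
        code.append(f"{temp} := {op}{args[0]}")
    else:
        code.append(f"{temp} := {args[0]} {op} {args[1]}")
    if memo is not None:
        memo[node] = temp
    return temp


def translate(target, expr, share=False):
    """Translate the assignment target := expr into a list of statements."""
    code = []
    value = gen(expr, code, {} if share else None)
    code.append(f"{target} := {value}")
    return code


term = ("*", "b", ("-", "c"))   # b * -c; a one-child tuple is a unary operator
tree = ("+", term, term)

print("\n".join(translate("a", tree)))              # six statements, t1 to t5
print("\n".join(translate("a", tree, share=True)))  # four statements, t1 to t3
```

Without sharing, the walk computes -c and b * -c twice; with sharing, the second occurrence simply returns t2.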

The reason for the term “three-address code” is that each statement usually
contains three addresses, two for the operands and one for the result.

In implementations of three-address code, a programmer-defined name is
replaced by a pointer to a symbol-table entry for that name.
Types of three-address statements:
Assignment statements of the form X := Y op Z, where op is a binary arithmetic
or logical operation.

Assignment instructions of the form X := op Y, where op is a unary operation.

Copy statements of the form X := Y where the value of Y is assigned to X.

The unconditional jump goto L. The three-address statement with label L is the next
to be executed.

Conditional jumps such as if X relop Y goto L.

param X and call p, n for procedure calls, and return Y, where Y, representing a
returned value, is optional. Their typical use is as the sequence of three-address
statements:
param x1
param x2
param x3
...
param xn
call p, n
Indexed assignments of the form X := Y[i] and X[i] := Y
Address and pointer assignments of the form X := &Y, X := *Y, and *X := Y
Implementation of Three-Address Statements:
A three-address statement is an abstract form of
intermediate code.

In a compiler, these statements can be
implemented as records with fields for the operator
and the operands.

There are three such representations: quadruples,
triples, and indirect triples.
Quadruples:
A quadruple is a record structure with four fields: op, arg1, arg2, and result.
The op field contains an internal code for the operator.
A statement like X := Y op Z is represented by placing Y in arg1, Z in arg2, and X in result.
Statements like X := -Y or X := Y do not use arg2.
Operators like param use neither arg2 nor result.
Conditional and unconditional jumps put the target label in result.
Below are the quadruples for the assignment a := b * -c + b * -c (using the
three-address code for the syntax tree above):
       op        arg1    arg2    result

(0)    uminus    c               t1
(1)    *         b       t1      t2
(2)    uminus    c               t3
(3)    *         b       t3      t4
(4)    +         t2      t4      t5
(5)    :=        t5              a

The contents of the fields arg1, arg2, and result are normally pointers to the
symbol-table entries for the names represented by these fields.
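
A quadruple maps naturally onto a record with four named fields. The sketch below is illustrative only: the Quad type and the to_quad helper are hypothetical names, plain strings stand in for symbol-table pointers, and the helper handles just the three statement forms used in this example (binary operation, unary minus, and copy).

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: Optional[str]
    arg2: Optional[str]     # unused (None) for unary operations and copies
    result: Optional[str]


def to_quad(stmt: str) -> Quad:
    """Turn 'x := y op z', 'x := -y' or 'x := y' into a quadruple."""
    result, rhs = [s.strip() for s in stmt.split(":=")]
    parts = rhs.split()
    if len(parts) == 3:                      # x := y op z
        return Quad(parts[1], parts[0], parts[2], result)
    if rhs.startswith("-"):                  # x := -y
        return Quad("uminus", rhs[1:].strip(), None, result)
    return Quad(":=", rhs, None, result)     # x := y


code = [
    "t1 := -c",
    "t2 := b * t1",
    "t3 := -c",
    "t4 := b * t3",
    "t5 := t2 + t4",
    "a := t5",
]
quads = [to_quad(s) for s in code]

for i, q in enumerate(quads):
    print(f"({i})", q.op, q.arg1, q.arg2, q.result)
```

Each printed row corresponds to one row of the quadruple table, with None marking an unused arg2 field.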
Triples:

       op        arg1    arg2

(0)    uminus    c
(1)    *         b       (0)
(2)    uminus    c
(3)    *         b       (2)
(4)    +         (1)     (3)
(5)    assign    a       (4)

To avoid entering temporary names into the symbol table, we might refer to a
temporary value by the position of the statement that computes it.

Three-address statements can then be represented by records with only three fields:
op, arg1, and arg2.

The fields arg1 and arg2, for the arguments of op, are either pointers to the
symbol table or pointers into the triple structure.

Since three fields are used, this intermediate code format is known as triples.

Parenthesized numbers represent pointers into the triple structure, while the
names themselves represent pointers to the symbol table.
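
The step from quadruples to triples can be sketched as replacing every temporary by the position of the statement that computes it. This is a minimal illustration under stated assumptions: the function name quads_to_triples is hypothetical, integers play the role of the parenthesized positions, strings play the role of symbol-table pointers, and every result other than the target of := is assumed to be a compiler-generated temporary.

```python
def quads_to_triples(quads):
    """Replace temporaries by the index of the triple that computes them."""
    position = {}   # temporary name -> index of the triple defining it
    triples = []
    for op, arg1, arg2, result in quads:
        a1 = position.get(arg1, arg1)   # int if arg1 is a temporary, else the name
        a2 = position.get(arg2, arg2)
        if op == ":=":
            # Copy into a user variable: written as 'assign target, value'.
            triples.append(("assign", result, a1))
        else:
            position[result] = len(triples)
            triples.append((op, a1, a2))
    return triples


# Quadruples for a := b * -c + b * -c as (op, arg1, arg2, result).
quads = [
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "a"),
]

for i, t in enumerate(quads_to_triples(quads)):
    print(f"({i})", *t)
```

No temporary name survives in the output: t1 to t5 have all become positions within the triple list.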
Indirect Triples:
Indirect triples use a listing of pointers to triples, rather than a listing of the triples themselves.

       statement
(0)    (14)
(1)    (15)
(2)    (16)
(3)    (17)
(4)    (18)
(5)    (19)

        op        arg1    arg2
(14)    uminus    c
(15)    *         b       (14)
(16)    uminus    c
(17)    *         b       (16)
(18)    +         (15)    (17)
(19)    assign    a       (18)
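
The two tables above can be sketched as a dictionary of triples plus a separate statement list. The example is illustrative only, and the names triples, statements, and listing are assumptions. It also shows one consequence of the extra level of indirection: statements can be reordered by editing the statement list alone, while the triples and the references between them stay untouched.

```python
# Triples stored at positions 14 to 19; integer arguments refer to other triples.
triples = {
    14: ("uminus", "c", None),
    15: ("*", "b", 14),
    16: ("uminus", "c", None),
    17: ("*", "b", 16),
    18: ("+", 15, 17),
    19: ("assign", "a", 18),
}

# The statement list gives the execution order as pointers into the triples.
statements = [14, 15, 16, 17, 18, 19]


def listing(statements, triples):
    """Return the triples in execution order."""
    return [triples[i] for i in statements]


# Statements (1) and (2) do not depend on each other, so they may be swapped.
# Only the statement list changes; no triple has to be renumbered.
statements[1], statements[2] = statements[2], statements[1]

print(statements)        # [14, 16, 15, 17, 18, 19]
print(triples[17])       # ('*', 'b', 16), still valid after the swap
print(listing(statements, triples)[1])  # ('uminus', 'c', None)
```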
