Unit-III (Part-A) Intermediate Code - 1

Compiler Design (KCS-502)
3rd year (Semester – V)

Session – 2023 - 24
Unit – III
Part - A
Ratish Srivastava
Asst. Prof.
CSE Dept.
UCER, Prayagraj
Compiler Design, KCS-502 1
Intermediate Code
• In many compilers the source code is translated
into a language which is intermediate in
complexity between a high-level programming
language and machine code. Such a language is
therefore called ‘intermediate code’ or
‘intermediate text’.
• It is possible to translate directly from source to
machine or assembly language in a syntax-directed
way, but doing so makes generation of optimal code
a difficult task.
Intermediate Code
• Efficient machine or assembly language is hard to
generate directly because one is immediately forced to
choose a particular register to hold the result of each
computation.
• Therefore one usually chooses for intermediate text a
notation in which each statement involves at most one
arithmetic operation or one test.
• The usual intermediate text introduces symbols to
stand for various temporary quantities such as the
value of B*C in the source language expression A+B*C.

Intermediate Code
• Types of intermediate code often used in
compilers are:
– Postfix notation
– Syntax trees
– Three address code
(The representations used to implement three-address code are:)
o Quadruples
o Triples
o Indirect triples
Postfix Notation
• In general, if e1 and e2 are any postfix expressions and
θ is any binary operator, the result of applying θ to the
values denoted by e1 and e2 is indicated in postfix
notation by e1e2θ.
• No parentheses are needed in postfix notation because
the position and arity (number of arguments) of the
operators permits only one way to decode a postfix
expression.
• If a k-ary operator θ is applied to postfix expressions
e1, e2, ..., ek, the result is denoted by e1e2...ekθ.

Postfix Notation
• Let us introduce a useful 3-ary (ternary) operator,
the conditional expression.
• Let ‘if e then x else y’ denote the expression whose
value is x if e ≠ 0 and y if e = 0.
• Using ? as a ternary postfix operator, we can
represent this expression as exy?
• The postfix form of the expression
if a then if c-d then a+c else a*c else a+b
is
acd-ac+ac*?ab+?

Postfix Notation
• Note:
– One language that normally uses a postfix
intermediate language is SNOBOL. In fact, SNOBOL
is often interpreted rather than compiled. The
output of the SNOBOL compiler is the intermediate
code itself which is passed to an interpreter, which
reads the intermediate code and executes it.

Postfix Notation
• Evaluation of Postfix Expressions:
– We can evaluate the postfix expression easily using a stack,
either a hardware stack or one implemented in software.
– The general strategy is to scan the postfix code left to right.
– We push each operand onto the stack.
– If we encounter a k-ary operator, its first (leftmost)
argument will be k-1 positions below the top of the stack.
– It is then easy to apply the operator to the top k values on
the stack.
– These values are popped and the result of applying the
k-ary operator is pushed onto the stack.
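
The stack strategy above can be sketched in a few lines of Python. This is only an illustration: the arity table, the whitespace-separated token format and the `env` dictionary of operand values are assumptions made for the sketch, not part of the slides. The ternary `?` operator follows the "x if e ≠ 0, else y" definition given earlier.

```python
# Arity of each operator used in this sketch (an assumed, illustrative set).
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "?": 3}


def apply_op(op, args):
    """Apply operator `op` to its evaluated arguments, leftmost first."""
    if op == "+":
        return args[0] + args[1]
    if op == "-":
        return args[0] - args[1]
    if op == "*":
        return args[0] * args[1]
    if op == "/":
        return args[0] / args[1]
    if op == "?":
        # e x y ?  yields x if e != 0, otherwise y
        return args[1] if args[0] != 0 else args[2]
    raise ValueError(f"unknown operator {op!r}")


def eval_postfix(tokens, env):
    """Scan the tokens left to right, pushing operands and reducing on operators."""
    stack = []
    for tok in tokens:
        if tok in ARITY:
            k = ARITY[tok]
            if len(stack) < k:
                raise ValueError("too few operands on the stack")
            args = stack[-k:]   # the leftmost argument sits k-1 below the top
            del stack[-k:]      # pop the k values
            stack.append(apply_op(tok, args))
        else:
            stack.append(env[tok])  # operand: push its value
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


# The nested conditional from the earlier slide, written with spaces between symbols.
tokens = "a c d - a c + a c * ? a b + ?".split()
result = eval_postfix(tokens, {"a": 1, "b": 4, "c": 5, "d": 2})
```

Note that every operand of `?` is evaluated before the operator is applied, since all three values must already be on the stack.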
Postfix Notation
• Control Flow in Postfix Code:
– Postfix notation is useful for intermediate code if the
language is mostly expressions. However, an operator such as
? has all of its operands evaluated before it is applied, so if
operands are undefined or have side effects the postfix
implementation would be not only inefficient but possibly incorrect.
– One solution is to introduce labels and conditional and
unconditional jumps into the postfix code.
• The postfix code can then be stored in a one-dimensional array, with
each word of the array being either an operator or an operand.
• Operands are represented by pointers to the symbol table and
operators by integer codes.
• To distinguish operators from operands we use negative integers
for operator codes.
• In this implementation, a label is just an index into the array
holding the code.
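
A minimal sketch of this array layout follows, under several assumptions that the slides do not fix: the operator codes (`ADD`, `MUL`, `JUMP`, `JUMPF`) and their numeric values are invented for illustration, the symbol table is a plain list of (name, value) pairs, and label constants are stored in the symbol table like any other constant. `JUMPF` is assumed to mean "jump if the condition is 0".

```python
# Operator codes are negative; operand words are non-negative indices
# into the symbol table. The specific codes are arbitrary choices.
ADD, MUL, JUMP, JUMPF = -1, -2, -3, -4


def run_postfix(code, symtab):
    """Interpret an array of postfix words; symtab is a list of (name, value)."""
    stack = []
    pc = 0
    while pc < len(code):
        word = code[pc]
        pc += 1
        if word >= 0:
            stack.append(symtab[word][1])   # operand: push its value
        elif word == JUMP:
            pc = stack.pop()                # label = index into `code`
        elif word == JUMPF:
            label = stack.pop()
            condition = stack.pop()
            if condition == 0:              # jump only when the test is false
                pc = label
        elif word == ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif word == MUL:
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown operator code {word}")
    return stack


def if_then_else_example(a, c):
    """Run 'if a then a+c else a*c' for the given values of a and c."""
    # Symbol table slots: 0 = a, 1 = c, 2 = label L1, 3 = label L2.
    symtab = [("a", a), ("c", c), ("L1", 8), ("L2", 11)]
    code = [0, 2, JUMPF,   # a L1 jumpf
            0, 1, ADD,     # a c +
            3, JUMP,       # L2 jump
            0, 1, MUL]     # L1: a c *   (L2 is the end of the code)
    return run_postfix(code, symtab)
```

With jumps in place, only the selected branch is executed, so an undefined or side-effecting operand in the other branch is never touched.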
Postfix Notation
• Syntax-Directed Translation to Postfix Code:
– The production of postfix intermediate code for expressions is simple.
– It is described by the syntax-directed translation scheme as follows:
Production            Semantic Action
E → E(1) op E(2)      E.CODE := E(1).CODE || E(2).CODE || ‘op’
E → (E(1))            E.CODE := E(1).CODE
E → id                E.CODE := id
• Here E.CODE is a string valued translation.
• The value of the translation E.CODE for the first production is the
concatenation of the two translations E(1).CODE and E(2).CODE and the
symbol ‘op’ which stands for any operator symbol.
• In the second rule, the translation of a parenthesized expression is the
same as that for the unparenthesized expression.
• The third rule tells us that the translation of any identifier is the identifier
itself.

Postfix Notation
– The semantic actions in this translation scheme have a
particularly simple form.
– The translation of the non-terminal on the left of each
production is the concatenation of the translations of
the non-terminals on the right in the same order as in
the production, followed by some additional string.
– Such a translation scheme is called ‘Simple Postfix’
and it can be implemented without a translation stack
just by emitting the output string after each
reduction.
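
The idea of emitting output after each reduction can be sketched as follows. Two assumptions are made here that go beyond the slides: the grammar E → E op E is ambiguous, so the usual precedences and associativities are assumed, and a small recursive-descent (precedence-climbing) parser stands in for the bottom-up parser that the scheme implies. The names `PREC`, `RIGHT_ASSOC` and `to_postfix` are illustrative.

```python
import re

# Assumed precedences and associativity; the grammar itself leaves them open.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2, "↑": 3}
RIGHT_ASSOC = {"↑"}


def tokenize(text):
    """Split the input into identifiers, operators and parentheses."""
    return re.findall(r"[A-Za-z_]\w*|[-+*/↑()]", text)


def to_postfix(text):
    """Translate an infix expression to postfix by emitting symbols as rules complete."""
    tokens = tokenize(text)
    pos = 0
    out = []  # the emitted output, one symbol per element

    def primary():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            expr(1)                      # E -> (E1): nothing extra is emitted
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
        elif tok in PREC or tok == ")":
            raise SyntaxError(f"unexpected token {tok!r}")
        else:
            out.append(tok)              # E -> id: emit the identifier

    def expr(min_prec):
        nonlocal pos
        primary()
        while (pos < len(tokens) and tokens[pos] in PREC
               and PREC[tokens[pos]] >= min_prec):
            op = tokens[pos]
            pos += 1
            expr(PREC[op] if op in RIGHT_ASSOC else PREC[op] + 1)
            out.append(op)               # E -> E1 op E2: emit op after both operands

    expr(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return " ".join(out)
```

No translation stack is kept: each symbol is appended to the output the moment its rule is recognized, which is what makes the scheme "simple postfix".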

Parse Tree and Syntax Trees
• The parse tree is a useful intermediate language
representation for a source program especially in
optimizing compilers where the intermediate code
needs to be extensively restructured.
• A parse tree, however, often contains redundant
information which can be eliminated, thus producing a
more economical representation of the source
program.
• One such variant of a parse tree is what is called an
(abstract) ‘syntax tree’, a tree in which each leaf
represents an operand and each interior node an
operator.
• For example, the syntax tree for the expression
a*(b+c)/d is
[Figure: syntax tree for a*(b+c)/d]
and the syntax tree for the statement
if a=b then
a:=c+d
else
b:=c-d
is
[Figure: syntax tree for the if-then-else statement]
• So, the syntax tree is nothing more than a condensed form of the
parse tree.
• The operator and keyword nodes of the parse tree are moved to
their parent, and a chain of single productions is replaced by a single
link.
[Figure: parse tree for the string id+id*id (left) and syntax tree for id+id*id (right)]
Example:
Construct syntax tree and postfix notation for
the following expression:
(a+(b*c))↑d

Solution:
Postfix notation
(a+(b*c))↑d
(a+X)↑d       Put X = b*c, so X is bc*
Y↑d           Put Y = a+X, so Y is aX+
Z             Put Z = Y↑d, so Z is Yd↑
Now, substituting back the values of the temporary
variables, we get
Z = Yd↑
  = aX+d↑
  = abc*+d↑
So, the postfix expression is abc*+d↑
• Syntax-Directed Construction of Syntax trees:
– Like postfix code, it is easy to define either a parse tree or a
syntax tree in terms of a syntax-directed translation scheme.
– A syntax-directed translation scheme to construct syntax trees is
as follows:
Production            Semantic Action
E → E(1) op E(2)      {E.VAL := NODE(op, E(1).VAL, E(2).VAL)}
E → (E(1))            {E.VAL := E(1).VAL}
E → -E(1)             {E.VAL := UNARY(-, E(1).VAL)}
E → id                {E.VAL := LEAF(id)}
• E.VAL is a translation whose value is a pointer to a node in the syntax
tree.

• The function NODE(OP, LEFT, RIGHT) takes 3 arguments; the
first is the name of the operator, the second and third are
pointers to the roots of subtrees.
– The function creates a new node labelled by the first argument
and makes the second and third arguments the left and right
children of the new node, returning a pointer to the created
node.
• The function UNARY(OP, CHILD) creates a new node labelled
OP, makes CHILD its child, and returns a pointer to the created
node.
• The function LEAF(ID) creates a new node labelled by ID and
returns a pointer to the node. This node receives no
children.
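
The three constructor functions can be sketched in Python as below. This is an illustrative rendering, not the slides' implementation: the `Node` class and the `show` helper are assumptions, and Python object references play the role of the pointers mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A syntax-tree node: a label plus zero, one or two children."""
    label: str
    children: list = field(default_factory=list)


def NODE(op, left, right):
    """Create an interior node labelled op with left and right children."""
    return Node(op, [left, right])


def UNARY(op, child):
    """Create an interior node labelled op with a single child."""
    return Node(op, [child])


def LEAF(ident):
    """Create a leaf labelled by the identifier; it receives no children."""
    return Node(ident)


def show(node):
    """Render a tree in prefix form, e.g. (+ a b), for inspection."""
    if not node.children:
        return node.label
    parts = [node.label] + [show(child) for child in node.children]
    return "(" + " ".join(parts) + ")"


# Tree for a*(b+c)/d, built in the order a bottom-up parser would reduce.
tree = NODE("/", NODE("*", LEAF("a"), NODE("+", LEAF("b"), LEAF("c"))), LEAF("d"))
```

Here `show(tree)` gives the prefix form `(/ (* a (+ b c)) d)`, with the parentheses of the source expression reflected only in the tree shape.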

Three-Address Code,
Quadruples and Triples
• The final category of intermediate code is known as ‘three-
address code’.
• This intermediate code is preferred in many compilers,
especially those doing extensive code optimization because
it allows the intermediate code to be rearranged in a
convenient manner.
• Three-Address Code:
– Three-address code is a sequence of statements of the general
form
x:=y op z
where x, y and z are names, constants or compiler-generated
temporary names, and op stands for any operator, such as a fixed- or
floating-point arithmetic operator or a logical operator on
boolean-valued data.
Three-Address Code,
– There is only one operator on the right side of a
statement.
– Thus a source language expression like x+y*z might be
translated into a sequence
t1:=y*z
t2:=x+t1
where t1 and t2 are compiler-generated temporary
names.
– The reason for the term “three-address code” is that
each statement usually contains 3 addresses, two for
the operands and one for the result.
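
As a rough sketch of how such a sequence could be produced, the function below walks an expression tree and creates one temporary per operator. The tuple encoding `(op, left, right)` and the function name `gen_tac` are assumptions made for illustration.

```python
import itertools


def gen_tac(expr):
    """Return (code, place): the three-address statements and the name holding the value.

    expr is either a name (str) or a tuple (op, left, right).
    """
    counter = itertools.count(1)
    code = []

    def walk(e):
        if isinstance(e, str):
            return e                      # a name is its own place
        op, left, right = e
        left_place = walk(left)
        right_place = walk(right)
        temp = f"t{next(counter)}"        # fresh compiler-generated temporary
        code.append(f"{temp} := {left_place} {op} {right_place}")
        return temp

    place = walk(expr)
    return code, place


# x + y*z, with * binding tighter than +
code, place = gen_tac(("+", "x", ("*", "y", "z")))
```

For this input the generated statements are `t1 := y * z` followed by `t2 := x + t1`, matching the sequence above.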

Three-Address Code,
• Types of Three-Address Statements:
1) Assignment statements of the form x := y op z where ‘op’
is a binary arithmetic or logical operation.
2) Assignment instructions of the form x := op y, where op
is a unary operation. Essential unary operations include
unary minus, logical negation, shift operators and
conversion operators that convert a fixed-point number
to a floating-point number.
3) Copy statements of the form x := y where the value of y
is assigned to x.
4) The unconditional jump goto L. This means the three-
address statement with label L is the next to be
executed.

Three-Address Code,
5) Conditional jumps such as if x relop y goto L, where relop
is a relational operator (<, =, >=, etc.). This instruction
applies relop to x and y, and executes the statement with
label L next if x stands in relation relop to y. If not,
the three-address statement following if x relop y
goto L is executed next, as in the usual sequence.
6) Indexed assignments of the form x := y[i] and
x[i] := y.
7) Address and pointer assignments of the form
x := &y, x := *y and *x := y.

Three-Address Code,
8) param x and call p, n for procedure calls, and return y,
where y, representing a returned value, is optional. Their
typical use is as the sequence of three-address
statements:
param x1
param x2
…………….
param xn
call p, n
generated as part of call of the procedure p(x1, x2, ….. , xn).
The integer ‘n’ indicating the number of actual parameters
in “call p, n” is not redundant because calls can be nested.

Three-Address Code,
• Implementations of Three-Address
Statements:
– A three-address statement is an abstract form of
intermediate code.
– In a compiler, these statements can be
implemented as records with fields for the
operator and the operands.
– Three such representations are quadruples,
triples and indirect triples.

Three-Address Code,
• Quadruples:
– A quadruple is a record structure with 4 fields, which
we call op, arg1, arg2 and result.
– The op field contains an internal code for the
operator.
– The three-address statement x := y op z is represented
by placing y in arg1, z in arg2 and x in result.
– Statements with unary operators, such as x := -y, and copy
statements, such as x := y, do not use arg2.
– Operators like param use neither arg2 nor result.

Three-Address Code,
– Conditional and unconditional jumps put the
target label in result.
– For example, an assignment statement like
A := -B * (C + D) would be translated to the
three-address statements
T1 := -B
T2 := C + D
T3 := T1 * T2
A := T3

Three-Address Code,
– These statements are represented by quadruples as:
Location  op      arg1  arg2  result
(0)       uminus  B     -     T1
(1)       +       C     D     T2
(2)       *       T1    T2    T3
(3)       :=      T3    -     A
– The contents of the fields arg1, arg2 and result are
normally pointers to the symbol-table entries for the
names represented by these fields.
– If so, temporary names must be entered into the
symbol table as they are created.
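
A quadruple maps naturally onto a four-field record. The sketch below is illustrative only: plain strings stand in for the symbol-table pointers, `None` marks an unused field, and the tiny interpreter `run_quads` exists just to check that the records mean what the table says.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One three-address statement as a record with four fields."""
    op: str
    arg1: Optional[str]
    arg2: Optional[str]
    result: Optional[str]


# A := -B * (C + D)
quads = [
    Quad("uminus", "B", None, "T1"),
    Quad("+", "C", "D", "T2"),
    Quad("*", "T1", "T2", "T3"),
    Quad(":=", "T3", None, "A"),
]


def run_quads(quads, env):
    """Execute the quadruples over a copy of env (name -> value)."""
    env = dict(env)
    for q in quads:
        if q.op == "uminus":
            env[q.result] = -env[q.arg1]
        elif q.op == ":=":
            env[q.result] = env[q.arg1]
        elif q.op == "+":
            env[q.result] = env[q.arg1] + env[q.arg2]
        elif q.op == "*":
            env[q.result] = env[q.arg1] * env[q.arg2]
        else:
            raise ValueError(f"unknown operator {q.op!r}")
    return env


final = run_quads(quads, {"B": 2, "C": 3, "D": 4})
```

After the run, `final` contains entries for T1, T2 and T3 alongside A, which mirrors the point that temporaries live in the symbol table under this representation.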

Three-Address Code,
• Triples:
– To avoid entering temporary names into the symbol table,
we might refer to a temporary value by the position of the
statement that computes it.
– If we do so, three-address statements can be represented
by records with only three fields: op, arg1 and arg2.
– The fields arg1 and arg2, for the arguments of op, are
either pointers to the symbol table (for programmer
defined names or constants) or pointers into the triple
structure (for temporary values).
– Since three fields are used, this intermediate code format
is known as ‘triples’.

Three-Address Code,
Triple representation of the three-address statements:

Location  op      arg1  arg2
(0)       uminus  B     -
(1)       +       C     D
(2)       *       (0)   (1)
(3)       :=      A     (2)
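
The table can be mirrored in a short Python sketch. The encoding is an assumption made for illustration: a string argument stands for a symbol-table name, an integer argument is a reference to the triple at that position, and `run_triples` is a hypothetical helper used only to check the meaning.

```python
# A := -B * (C + D) as triples; integers refer to earlier triples by position.
triples = [
    ("uminus", "B", None),
    ("+", "C", "D"),
    ("*", 0, 1),
    (":=", "A", 2),
]


def run_triples(triples, env):
    """Execute the triples; temporaries live in `values`, not in env."""
    env = dict(env)
    values = [None] * len(triples)   # value computed by each triple

    def val(arg):
        # int -> result of that triple; str -> value of a named variable
        return values[arg] if isinstance(arg, int) else env[arg]

    for i, (op, arg1, arg2) in enumerate(triples):
        if op == "uminus":
            values[i] = -val(arg1)
        elif op == "+":
            values[i] = val(arg1) + val(arg2)
        elif op == "*":
            values[i] = val(arg1) * val(arg2)
        elif op == ":=":
            env[arg1] = val(arg2)    # arg1 names the target, arg2 is the value
        else:
            raise ValueError(f"unknown operator {op!r}")
    return env


final = run_triples(triples, {"B": 2, "C": 3, "D": 4})
```

Unlike the quadruple form, no temporary names appear in `final`; intermediate results are identified purely by triple position.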

Three-Address Code,
– A ternary operation like A[I] := B actually requires two
entries in the triple structure:
(0)       [ ]=    A     I
(1)       -       B     -
while A := B[I] is naturally represented as

(0)       =[ ]    B     I
(1)       :=      A     (0)

Three-Address Code,
• Indirect Triples:
– Another implementation of three-address code
which has been considered is that of listing
pointers to triples, rather than listing the triples
themselves.
– This implementation is naturally called “Indirect
Triples”.

Three-Address Code,
Example 1:
Let us use an array STATEMENT to list pointers to
triples in desired order.
Then the three-address statements of
T1 := -B
T2 := C + D
T3 := T1 * T2
A := T3
might be represented as
Location  Statement      Location  op      arg1  arg2
(0)       (14)           (14)      uminus  B     -
(1)       (15)           (15)      +       C     D
(2)       (16)           (16)      *       (14)  (15)
(3)       (17)           (17)      :=      A     (16)
So, in the indirect triple representation, the triples are
listed separately, and the statement list holds pointers
to the triples rather than the statements themselves.
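
A small sketch of the same example, under assumed encodings: the triples are kept in a dictionary keyed by location (14 to 17), integer arguments refer to triple locations, strings are names, and `run_indirect` is a hypothetical helper that executes the triples in the order given by the statement list.

```python
# Triples stored by location; STATEMENT lists the locations in execution order.
triple_table = {
    14: ("uminus", "B", None),
    15: ("+", "C", "D"),
    16: ("*", 14, 15),
    17: (":=", "A", 16),
}
statement = [14, 15, 16, 17]


def run_indirect(statement, triple_table, env):
    """Execute triples in the order given by the statement list."""
    env = dict(env)
    values = {}                      # location -> computed value

    def val(arg):
        return values[arg] if isinstance(arg, int) else env[arg]

    for loc in statement:
        op, arg1, arg2 = triple_table[loc]
        if op == "uminus":
            values[loc] = -val(arg1)
        elif op == "+":
            values[loc] = val(arg1) + val(arg2)
        elif op == "*":
            values[loc] = val(arg1) * val(arg2)
        elif op == ":=":
            env[arg1] = val(arg2)
        else:
            raise ValueError(f"unknown operator {op!r}")
    return env


env = {"B": 2, "C": 3, "D": 4}
original = run_indirect(statement, triple_table, env)
# Swapping the first two (independent) statements touches only the list.
reordered = run_indirect([15, 14, 16, 17], triple_table, env)
```

Both runs give the same value for A, and `triple_table` is never modified when the order changes.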

Three-Address Code,
Example 2:
Translate the following expression to
quadruple, triple and indirect triple:
(x + y) * (y + z) + (x + y + z)

Three-Address Code,
Solution:
Given expression is (x + y) * (y + z) + (x + y + z)
The three address code is
t1 := x + y
t2 := y + z
t3 := t1 * t2
t4 := t1 + z
t5 := t3 + t4

Three-Address Code,
Quadruple:
Location  op  arg1  arg2  result
(0)       +   x     y     t1
(1)       +   y     z     t2
(2)       *   t1    t2    t3
(3)       +   t1    z     t4
(4)       +   t3    t4    t5

Three-Address Code,
Triple:
Location  op  arg1  arg2
(0)       +   x     y
(1)       +   y     z
(2)       *   (0)   (1)
(3)       +   (0)   z
(4)       +   (2)   (3)

Three-Address Code,
Indirect Triple:
Location Statement Location op arg1 arg2
(0) (11) (11) + x y
(1) (12) (12) + y z
(2) (13) (13) * (11) (12)
(3) (14) (14) + (11) z
(4) (15) (15) + (13) (14)

Three-Address Code,
Example 3:
Translate the following expression into
quadruple, triple and indirect triple:
-(a + b) * (c + d) – (a + b + c)

Three-Address Code,
Solution:
Given expression is -(a + b) * (c + d) – (a + b + c)
The three-address code is
t1 := a + b
t2 := -t1
t3 := c + d
t4 := t2 * t3
t5 := t1 + c
t6 := t4 - t5

Three-Address Code,
Quadruple:
Location  op      arg1  arg2  result
(0)       +       a     b     t1
(1)       uminus  t1    -     t2
(2)       +       c     d     t3
(3)       *       t2    t3    t4
(4)       +       t1    c     t5
(5)       -       t4    t5    t6

Three-Address Code,
Triple:
Location  op      arg1  arg2
(0)       +       a     b
(1)       uminus  (0)   -
(2)       +       c     d
(3)       *       (1)   (2)
(4)       +       (0)   c
(5)       -       (3)   (4)

Three-Address Code,
Indirect Triple:
Location  Statement      Location  op      arg1  arg2
(0)       (10)           (10)      +       a     b
(1)       (11)           (11)      uminus  (10)  -
(2)       (12)           (12)      +       c     d
(3)       (13)           (13)      *       (11)  (12)
(4)       (14)           (14)      +       (10)  c
(5)       (15)           (15)      -       (13)  (14)

Three-Address Code,
Example 4:
Give the sequence of three-address code
instructions corresponding to the
arithmetic expression: x = 2 + 3 + 4 + 5

Three-Address Code,
Solution:
The three-address code for the above
expression is as follows:
t1 := 2
t2 := t1 + 3
t3 := t2 + 4
t4 := t3 + 5
x := t4

Comparison of Representation:
The Use of Indirection
• The difference between triples and quadruples
may be regarded as a matter of how much
indirection is present in the representation.
– When we ultimately produce target code, each name,
temporary or programmer defined, will be assigned
some run-time memory location.
– This location will be placed in the symbol-table entry
for the datum. Using the quadruple notation, a three-address
statement defining or using a temporary can
immediately access the location for that temporary
via the symbol table.
• A more important benefit of quadruples appears
in an optimizing compiler, where statements are
often moved around.
– Using quadruples, if we move a statement
computing x, the statements using x require no
change.
– However, in the triples notation, moving a statement
that defines a temporary value requires us to change
all references to that statement in the arg1 and arg2
arrays. This problem makes triples difficult to use in an
optimizing compiler.
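
The cost of moving a statement in the plain triples notation can be made concrete with a sketch. The encoding is assumed for illustration (strings are names, integers are references to triple positions), and `move_triple` is a hypothetical helper; it only renumbers references and does not check that the new order still computes values before they are used.

```python
def move_triple(triples, src, dst):
    """Move the triple at index src to index dst and patch every reference."""
    order = list(range(len(triples)))
    order.insert(dst, order.pop(src))            # order[new_position] = old_position
    new_index = {old: new for new, old in enumerate(order)}

    def patch(arg):
        # Integer arguments are positions of other triples and must be renumbered.
        return new_index[arg] if isinstance(arg, int) else arg

    return [(op, patch(arg1), patch(arg2))
            for op, arg1, arg2 in (triples[old] for old in order)]


# A := -B * (C + D) as triples
triples = [
    ("uminus", "B", None),
    ("+", "C", "D"),
    ("*", 0, 1),
    (":=", "A", 2),
]

# Move the addition ahead of the unary minus.
moved = move_triple(triples, 1, 0)
```

Moving one statement forced the arguments of the multiplication to change from (0), (1) to (1), (0); with quadruples the same move would leave every other record untouched.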

• With indirect triples there is no such problem: a statement
can be moved by reordering the statement list. Since pointers
to temporary values refer to the op-arg1-arg2 arrays, which are
not changed, none of those pointers need to be
changed.
– Thus, indirect triples look very much like quadruples
as far as their utility is concerned.
– Indirect triples can save some space compared with
quadruples, if some temporary value is used more
than once. The reason is that two or more entries in
the statement array can point to the same line of the
op-arg1-arg2 structure.

• Both triples and quadruples waste some
space, since fields will occasionally be empty.
If space is important, one can use a single
array and store either triples or quadruples
consecutively.
– The disadvantage of this representation is seen if we
try to examine the statements in reverse order.
