1 Chapter - 5: Intermediate Code Generation Bahir Dar Institute of Technology

Chapter 5
Intermediate code generation

Introduction
Phases of compiler
Introduction to intermediate code generation
• Intermediate code is the interface between the front end and the back end of a
compiler.
• The intermediate code generator receives its input from the preceding phase, the
semantic analyzer, in the form of an annotated syntax tree.
• It translates the annotated abstract syntax tree into intermediate code.
• Ideally, the details of the source language are confined to the front end and
the details of the target machine to the back end.
▪ This means that m * n compilers can be built by writing only m front ends and n
back ends, which saves a considerable amount of effort.
▪ In a compiler,
• the front end translates source program into an
intermediate representation,
• and the back end generates the target code from this
intermediate representation.

Introduction to intermediate code generation
• Although a compiler can directly produce a target language
(i.e. machine code or assembly of the target machine),
producing a machine independent intermediate representation
has the following benefits.
• Retargeting to another machine is facilitated.
▪ Intermediate code representation is neutral in relation to
target machine, so the same intermediate code generator can
be shared for all target languages (machines).
▪ Build a compiler for a new machine by attaching a new
code generator to an existing front-end
• Machine independent code optimization can be applied to
intermediate code.
• See the next two slides for more elaboration about benefits of IR
Why IR?
Portability - Suppose we have n source languages and m target
languages. Without intermediate code we translate each source
language directly into each target language, so every source-target
pair needs its own compiler. Hence we require n*m
compilers, one for each pair. If we use intermediate code, we
require n compilers to convert each source language into
intermediate code and m compilers to convert intermediate code
into the m target languages. Thus we require only n+m compilers.
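The count can be checked with a small calculation. This is only an illustrative sketch; the language and machine names are the ones shown in the accompanying figure, and the variable names are made up for the example.

```python
sources = ["C", "Pascal", "FORTRAN", "C++"]
targets = ["SPARC", "HP PA", "x86", "IBM PPC"]

# Without an IR: one compiler for every (source, target) pair.
direct = len(sources) * len(targets)

# With an IR: one front end per source language plus one back end per target.
with_ir = len(sources) + len(targets)

print(direct, with_ir)  # 16 8
```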
[Figure: without an IR, each source language (C, Pascal, FORTRAN, C++) is connected directly to each target (SPARC, HP PA, x86, IBM PPC).]

Why IR?
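[Figure: with an IR, the source languages (C, Pascal, FORTRAN, C++) all translate to one IR, which is translated to each target (SPARC, HP PA, x86, IBM PPC).]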
▪ Retargeting - build a compiler for a new machine by attaching a new
code generator to an existing front end.
▪ Optimization - reuse intermediate code optimizers in compilers for
different languages and different machines.
▪ Program understanding - Intermediate code is simple enough to be
easily converted to any target code but complex enough to represent all
the complex structure of high level language.
Intermediate Languages/code Types
• An intermediate language is an abstract programming
language used by a compiler as an in-between step
when translating a computer program into machine
code.
• Before compiling the program into code for an actual,
physical machine, the compiler first translates it into
intermediate code suitable for a theoretical, abstract
machine.
• This code is analyzed by the compiler, and if any
opportunities for optimization are identified the
compiler can perform the optimizations when making
the translation into assembly language.

Intermediate Languages/code Types
• Many different languages can serve as the intermediate language; the
designer of the compiler chooses it.
• Graphical IRs:
– Abstract syntax trees
– Directed acyclic graphs (DAGs)
– Control flow graphs
• Linear IRs:
– Postfix (suffix or reverse Polish) notation
– Three-address code (quadruples)
– Quadruples are close to machine instructions, but they are not actual machine instructions.
• Some programming languages have well-defined intermediate languages:
• Java: Java Virtual Machine
• Prolog: Warren Abstract Machine
• In fact, there are byte-code emulators that execute instructions in
these intermediate languages.
Graphical IRs
• Abstract syntax trees (AST): retain the essential structure of
the parse tree, eliminating unneeded nodes.
• Directed acyclic graphs (DAG): give the same information as an AST but
in a more compact form, because common subexpressions are identified
and represented only once. This avoids duplication and gives a smaller footprint.
• Control flow graphs (CFG): explicitly model control flow.
• They are used in the translation of statements such as if-else and while statements.
• In programming languages, Boolean expressions are used to:
• Alter the flow of control (as conditional expressions in
statements that alter the flow of control).
• Compute logical values (true or false). These can be evaluated
in analogy to arithmetic expressions, using three-address instructions
with logical operators.

Graphical IRs: Generating DAG
• Check whether an operand is already present
▫ if not, create a leaf for it
• Check whether there is a parent of the operand that represents
the same operation
▫ if not create one, then label the node representing the result
with the name of the destination variable, and remove that
label from all other nodes in the DAG
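Example: a := b * -c + b * -c
[Figure: AST and DAG for this statement. In the AST the subtree b * (unary minus c) appears twice under +; in the DAG it appears once and is shared by both operands of +.]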
Constructing DAG/AST using Value Number Method
• Nodes of a syntax tree or DAG are stored in an array of records.
• Each row of the array represents one record, and therefore one node.
• In each record, the first field is an operation code, indicating the label
of the node.
• Leaves have one additional field, which holds the lexical value (either a
symbol-table pointer or a constant, in this case), and
• interior nodes have two additional fields indicating the left and right
children
• E.g. representation of the statement i = i + 10.
• A node is referred to by the index of the record for that
node within the array; this index is
called the value number.
• E.g. the node + has value number
3, and its left and right
children have value numbers
1 and 2, respectively.
[Figure: nodes of a DAG for i = i + 10
allocated in an array]
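The array-of-records idea can be sketched as follows. This is a minimal illustration, not the slides' implementation: the class name `DagBuilder`, the dictionary used to find existing records, and the use of plain strings in place of symbol-table pointers are all assumptions made for the example. Value numbers are 1-based to match the example above.

```python
class DagBuilder:
    """Array-of-records DAG; a node's value number is its 1-based index."""

    def __init__(self):
        self.records = []  # each record: (op, field1, field2)
        self.index = {}    # record -> value number, used to find existing nodes

    def node(self, op, left=None, right=None):
        key = (op, left, right)
        if key in self.index:        # same label and same children: reuse it
            return self.index[key]
        self.records.append(key)
        self.index[key] = len(self.records)
        return self.index[key]


# i = i + 10
dag = DagBuilder()
i_leaf = dag.node("id", "i")                    # leaf holding the name i
ten = dag.node("num", 10)                       # leaf holding the constant 10
plus = dag.node("+", dag.node("id", "i"), ten)  # second lookup of i reuses node 1
assign = dag.node("=", i_leaf, plus)
print(i_leaf, ten, plus, assign)  # 1 2 3 4

# a = b * -c + b * -c: the repeated subexpression b * -c is stored only once.
dag2 = DagBuilder()

def product():
    minus_c = dag2.node("minus", dag2.node("id", "c"))
    return dag2.node("*", dag2.node("id", "b"), minus_c)

root = dag2.node("=", dag2.node("id", "a"), dag2.node("+", product(), product()))
print(len(dag2.records))  # 7
```

A syntax tree for the second statement would need 11 nodes; the lookup before each insertion is what turns the tree into a DAG.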

Constructing DAG/AST using Value Number Method
• E.g. 2: a = b * -c + b * -c

Graphical IRs: control flow graphs
▪ Nodes in the control flow graph are basic blocks
• A basic block is a sequence of statements always entered
at the beginning of the block and exited at the end
▪ Edges in the control flow graph represent the control flow
E.g. source code:
  if (x < y)
    x = 5*y + 5*y/3;
  else
    y = 5;
  x = x+y;
Control flow graph:
  B0: if (x < y) goto B1 else goto B2
  B1: x = 5*y + 5*y/3
  B2: y = 5
  B3: x = x+y
  Edges: B0 -> B1, B0 -> B2, B1 -> B3, B2 -> B3
• Each block has a sequence of statements

• No jump from or to the middle of the block
• Once a block starts executing, it will execute till the end
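One common way to find basic blocks in a linear instruction list is to mark "leaders": the first instruction, every jump target, and every instruction that follows a jump. The slides do not describe this procedure, so the sketch below is an assumption-laden illustration; the function name, the string-based instruction format, and the `labels` map from label names to instruction indices are all invented for the example.

```python
def split_into_blocks(code, labels):
    """Partition a list of instruction strings into basic blocks."""
    leaders = {0}                                  # the first instruction
    for idx, instr in enumerate(code):
        if "goto" in instr:
            target = instr.split("goto")[1].strip()
            leaders.add(labels[target])            # the target of a jump
            if idx + 1 < len(code):
                leaders.add(idx + 1)               # the instruction after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return [code[s:e] for s, e in zip(starts, ends)]


# A linearized version of the if-else example above.
code = [
    "if x < y goto L1",   # 0
    "y = 5",              # 1  (else branch)
    "goto L2",            # 2
    "x = 5*y + 5*y/3",    # 3  L1
    "x = x + y",          # 4  L2
]
labels = {"L1": 3, "L2": 4}

blocks = split_into_blocks(code, labels)
for number, block in enumerate(blocks):
    print(number, block)
```

The four resulting blocks correspond to B0, B2, B1 and B3 of the example; control enters each block only at its first instruction and leaves only at its last.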
Linear IRs: Postfix notation (PN)
• Postfix notation is a linearized representation of a syntax
tree;
• it is a list of the nodes of the tree in which a node appears
immediately after its children
• In postfix notation the operands come first, followed by
the operator that applies to them.
◼ Form Rules:
◼ If E is a variable/constant, the PN of E is E itself.
◼ If E is an expression of the form E1 op E2, the PN of E is E’1
E’2 op (E’1 and E’2 are the PN of E1 and E2, respectively.)
◼ If E is a parenthesized expression of form (E1), the PN of E
is the same as the PN of E1.
Ex: (A + B) * (C + D)
PN: A B + C D + *
Ex: a * (b + c)
PN: a b c + *
Exercise: what is the PN of (a + b) / (c - d)?
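The form rules amount to a post-order walk of the expression tree. The sketch below assumes a hypothetical tree encoding: a string for a variable or constant, and a tuple (op, left, right) for a binary expression. Parentheses do not appear in the tree, which matches the rule that (E1) has the same PN as E1.

```python
def postfix(node):
    """Return the postfix form of an expression tree as a list of symbols."""
    if isinstance(node, str):            # a variable or constant is its own PN
        return [node]
    op, left, right = node
    return postfix(left) + postfix(right) + [op]   # children first, then op


first = " ".join(postfix(("*", ("+", "A", "B"), ("+", "C", "D"))))
second = " ".join(postfix(("*", "a", ("+", "b", "c"))))
third = " ".join(postfix(("/", ("+", "a", "b"), ("-", "c", "d"))))

print(first)   # A B + C D + *
print(second)  # a b c + *
print(third)   # a b + c d - /
```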
Linear IRs: Three-Address Code
• A three-address code is a linearized representation of a syntax
tree or a DAG in which explicit names correspond to the interior
nodes of the graph.
• Has the form x := y op z, where x, y and z are names,
constants or compiler-generated temporaries, and op is any operator.
• For example, the expression x + y * z can be translated into the
sequence of three-address instructions:
t1 = y * z
t2 = x + t1
• We may also use the following notation for three-address
code (it looks like a machine code instruction):
op y,z,x
which applies operator op to y and z and stores the result in x.
• We use the term “three-address code” because each
statement usually contains three addresses (two for
operands, one for the result).
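A translation of this kind can be sketched as a recursive walk that gives each interior node a fresh temporary. The tree encoding (strings for names, tuples of the form (op, left, right) for binary expressions) and the function name `gen_tac` are assumptions made for illustration only.

```python
import itertools


def gen_tac(expr):
    """Return (instructions, name holding the value) for an expression tree."""
    code = []
    counter = itertools.count(1)

    def walk(node):
        if isinstance(node, str):          # names and constants need no code
            return node
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        temp = f"t{next(counter)}"         # fresh compiler-generated temporary
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    return code, walk(expr)


code, result = gen_tac(("+", "x", ("*", "y", "z")))
for line in code:
    print(line)
# t1 = y * z
# t2 = x + t1
```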
Three address Representation of DAG/AST
• Source code 1: a = b * -c + b * -c
• Three-address code:
t1 = minus c
t2 = b * t1
t3 = minus c
t4 = b * t3
t5 = t2 + t4
a = t5
• Note that the statements t1 = minus c and a = t5 have
only two addresses.
• minus c appears twice because this code is generated from the
abstract syntax tree.
• [Figure: tree and DAG representation]

Three address Representation of DAG/AST
• Source code 2: a + a * (b - c) + d * (b - c)
• b - c appears once because this code is generated from the DAG.
• [Figure: DAG representation and its three-address code representation]

Types of Three-Address Statements
1. Binary Operator: op y,z,result or
result := y op z
where op is a binary arithmetic or logical operator.
This binary operator is applied to y and z, and the
result of the operation is stored in result.
Ex: add a,b,c
gt a,b,c
addr a,b,c
addi a,b,c
2. Unary Operator: op y, result or
result := op y
where op is a unary arithmetic or logical operator.
This unary operator is applied to y, and the result of
the operation is stored in result.
Ex: uminus a,c
Types of Three-Address Instruction
3. Assignment Type 1: x := y op z
op is a binary arithmetic or logical operation
x, y and z are addresses
4. Assignment Type 2: x := op z
op is a unary arithmetic or logical operation
x and z are addresses
5. Copy Instruction: x:= y
x and y are addresses and x is assigned the value of y
6. Unconditional Jump: goto L

We will jump to the three-address code with the label L, and
the execution continues from that statement.
Ex: goto L1 // jump to L1
jmp 7 // jump to the statement 7
7. Conditional Jump: if x relop y goto L
If the relation x relop y holds, we jump to the three-address code with the label L;
otherwise execution continues with the next statement.
Types of Three-Address Statements (cont.)
8. Procedure Parameters: param x
Procedure Calls: call p,n
where x is an actual parameter, we invoke the procedure
p with n parameters.

Types of Three-Address Statements (cont.)
9. Indexed Assignments:
x := y[i]
sets x to the value in the location i memory units beyond location y
y[i] := x
sets contents of the location i memory units beyond location y to
the value of x
10. Address and Pointer Assignments:
x := &y
sets the r-value of x to l-value of y
x := *y where y is a pointer whose r-value is a location
sets the r-value of x equal to the contents of that location
*x := y
sets the r-value of the object pointed by x to the r-value of y

Representing three-Address Statements
• A three-address statement is an abstract form of intermediate code.
• It has three representations:
• quadruples,
• triples, and
• indirect triples

Quadruples
▪ A quadruple is a structure with at most four fields:
op, arg1, arg2 and result.
▪ The op field holds the internal code for the
operator.
▪ The arg1 and arg2 fields hold the two operands.
▪ The result field holds the result of the expression.
• Example-1: The assignment a := x + y * z is translated into
t0 = y * z
t1 = x + t0
a = t1

Quadruples
• Store each field directly.
• A benefit of quadruples over triples can be seen in an optimizing
compiler, where instructions are often moved around.
• t0 = y * z
• t1 = x + t0
• a = t1
Quadruples can be stored in an array (less space) or in a linked list (easy to reorder):
op   arg1  arg2  result
*    y     z     t0
+    x     t0    t1
=    t1          a

Quadruples
• Example-2: Three-address code for the assignment a = b * -c + b * -c;
• The special operator minus is used to distinguish the unary minus operator (-c) from
the binary minus operator (b - c).
NB: the unary-minus "three-address" statement has only two addresses, like the copy
statement a = t5.
• Why do we need a copy instruction such as a = t5, which copies t5 into a, rather than
assigning t2 + t4 to a directly?
• Each subexpression typically gets its own new temporary to hold its result, and
only when the assignment operator = is processed do we learn where to put the
value of the complete expression.
Three address code and its quadruple representation
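The quadruples for this assignment can be produced by a small sketch like the one below. It assumes a hypothetical tree encoding in which ("minus", x) is a unary node and (op, left, right) is a binary node; the `Quad` record and the function `to_quads` are names invented for the example, and an unused arg2 field is stored as None.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "op arg1 arg2 result")


def to_quads(target, expr):
    """Translate `target = expr` into a list of quadruples."""
    quads = []

    def walk(node):
        if isinstance(node, str):
            return node
        op, *children = node
        args = [walk(child) for child in children]
        arg2 = args[1] if len(args) > 1 else None    # unary operators leave arg2 empty
        temp = f"t{len(quads) + 1}"                  # each quad so far defined one temporary
        quads.append(Quad(op, args[0], arg2, temp))
        return temp

    value = walk(expr)
    quads.append(Quad("=", value, None, target))     # the final copy statement
    return quads


product = ("*", "b", ("minus", "c"))
quads = to_quads("a", ("+", product, product))
for q in quads:
    print(q.op, q.arg1, q.arg2, q.result)
```

The last quadruple is the copy a = t5: the walk over the expression does not know where the value will end up, so the destination is filled in only when the assignment itself is processed.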

Triples
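A triple has only three fields, which we call op, arg1, and arg2.
• Example-1:
• a := x + y * z
Solution: t0 := y * z
t1 := x + t0
a := t1
Triples (an entry such as (0) refers to the result of the triple at that position):
      op   arg1  arg2
(0)   *    y     z
(1)   +    x     (0)
(2)   :=   a     (1)
• Example-2: x[i] := y
      op   arg1  arg2
(0)   []=  x     i
(1)   :=   (0)   y
• This instruction is awkward for triples:
• it takes two triples.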
Triples
Triple representation of the statement a = b * -c + b * -c:
In the triple representation in Fig. (b), the copy statement a = t5 is
encoded by placing a in the arg1 field and
(4) in the arg2 field.
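A sketch of how such triples could be generated is shown below. The tree encoding and the function name `to_triples` are assumptions for illustration; positions are 0-based integers, names are strings, and an unused arg2 is stored as None.

```python
def to_triples(target, expr):
    """Translate `target = expr` into a list of (op, arg1, arg2) triples."""
    triples = []

    def walk(node):
        if isinstance(node, str):
            return node
        op, *children = node
        args = [walk(child) for child in children]
        arg2 = args[1] if len(args) > 1 else None
        triples.append((op, args[0], arg2))
        return len(triples) - 1          # the position stands in for a temporary

    value = walk(expr)
    triples.append(("=", target, value))  # copy: a in arg1, position in arg2
    return triples


product = ("*", "b", ("minus", "c"))
triples = to_triples("a", ("+", product, product))
for position, triple in enumerate(triples):
    print(position, triple)
```

The final entry is ("=", "a", 4), which agrees with the encoding of the copy statement described above.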

Indirect Triples
• Indirect triples consist of a list of pointers to triples, rather than a list
of the triples themselves; the order of execution is given by the pointer list.
• With indirect triples, an optimizing compiler can move an instruction by reordering
the instruction list, without affecting the triples themselves.

Indirect Triples
• Example-2:
• Triple representation of the statement a = b * -c + b * -c
• Let us use an array named instruction to list pointers to triples in the desired
order.
• To avoid entering temporary names into the symbol
table, we might refer to a temporary value by the
position of the statement that computes it.
[Figure: indirect triples representation of three-address code]
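The effect of the extra level of indirection can be illustrated as follows. This is a sketch under assumptions: the triples are written out by hand with 0-based positions, and the helper `is_valid_order` is a hypothetical check that every referenced result is computed before it is used.

```python
# Triples for a = b * -c + b * -c; integer arguments are positions of other triples.
triples = [
    ("minus", "c", None),   # 0
    ("*", "b", 0),          # 1
    ("minus", "c", None),   # 2
    ("*", "b", 2),          # 3
    ("+", 1, 3),            # 4
    ("=", "a", 4),          # 5
]
original = list(triples)

# The instruction array lists pointers (here: indices) in execution order.
instruction = [0, 1, 2, 3, 4, 5]


def is_valid_order(order):
    """Check that each triple's operands are computed before the triple runs."""
    done = set()
    for position in order:
        _, arg1, arg2 = triples[position]
        if any(isinstance(a, int) and a not in done for a in (arg1, arg2)):
            return False
        done.add(position)
    return True


# Move the second product ahead of the first by reordering the pointers only.
instruction = [2, 3, 0, 1, 4, 5]
print(is_valid_order(instruction))        # True
print(triples == original)                # True
print(is_valid_order([1, 0, 2, 3, 4, 5])) # False
```

With plain triples, the same move would change the positions of the entries, so every reference to them would have to be renumbered.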
Reading assignment
• Declarations
• Declarations in procedures
• Flow of control statements
• Backpatching and Procedure calls
