Chapter Five
Benefits
1. Retargeting is facilitated
2. Machine independent Code Optimization can be applied.
Intermediate Code
Intermediate codes are machine independent codes, but they are close to machine instructions.
The given program in a source language is converted to an equivalent program in an
intermediate language by the intermediate code generator.
The intermediate language can take many different forms; the designer of the compiler
decides which one to use.
Syntax trees can be used as an intermediate language.
Postfix notation can be used as an intermediate language.
Three-address code (quadruples) can be used as an intermediate language.
We will use quadruples to discuss intermediate code generation.
Quadruples are close to machine instructions, but they are not actual machine
instructions.
Postfix form
Example
Infix          Postfix
a+b            ab+
(a+b)*c        ab+c*
a+b*c          abc*+
a:=b*c+b*d     abc*bd*+:=
(+) simple and concise
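The conversions in the table above can be reproduced mechanically. The following is a minimal sketch, assuming single-character operand names and only the operators +, -, *, / and :=. The function names (tokenize, to_postfix) and the precedence table are illustrative choices, not part of the chapter; the method used is the standard operator-stack ("shunting-yard") technique.

```python
# Illustrative sketch: infix to postfix with an operator stack.
# Assumption: operands are single characters; ":=" binds loosest and is right-associative.
PRECEDENCE = {":=": 0, "+": 1, "-": 1, "*": 2, "/": 2}
RIGHT_ASSOC = {":="}


def tokenize(text):
    """Split the input into single-character tokens, keeping ':=' together."""
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
        elif text.startswith(":=", i):
            tokens.append(":=")
            i += 2
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def to_postfix(text):
    output, stack = [], []
    for tok in tokenize(text):
        if tok in PRECEDENCE:
            # Pop operators that must be applied before tok.
            while stack and stack[-1] != "(" and (
                PRECEDENCE[stack[-1]] > PRECEDENCE[tok]
                or (PRECEDENCE[stack[-1]] == PRECEDENCE[tok]
                    and tok not in RIGHT_ASSOC)
            ):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the "("
        else:
            output.append(tok)  # operand goes straight to the output
    while stack:
        output.append(stack.pop())
    return "".join(output)


print(to_postfix("a:=b*c+b*d"))  # abc*bd*+:=
```

Note that parentheses disappear in the output: the order of the operators in the postfix string already encodes the grouping.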
Observe that, given the syntax tree or the DAG as a graphical representation, we can easily
derive three-address code for assignments such as the one above.
Quadruples
A quadruple is:
x := y op z
But we may also use the following notation for quadruples (a much better notation, because it looks like a
machine code instruction)
op y,z,x
We use the term “three-address code” because each statement usually contains three addresses (two
for operands, one for the result).
Example:
t1:=- c
t2:=b * t1
t3:=- c
t4:=b * t3
t5:=t2 + t4
a:=t5
     op      arg1  arg2  result
(0)  uminus  c           t1
(1)  *       b     t1    t2
(2)  uminus  c           t3
(3)  *       b     t3    t4
(4)  +       t2    t4    t5
(5)  :=      t5          a
Temporary names must be entered into the symbol table as they are created.
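The quadruple listing above can be generated by a post-order walk over the expression tree. The sketch below is one possible way to do it, assuming the tree is given as nested Python tuples; the function name quads_for_assignment and the tuple encoding are hypothetical and chosen only for illustration. A real compiler would also enter each temporary into the symbol table, which this sketch omits.

```python
from itertools import count


def quads_for_assignment(target, expr):
    """Return quadruples (op, arg1, arg2, result) for `target := expr`.

    Assumed encoding: a leaf is a name string, a unary node is (op, operand),
    and a binary node is (op, left, right).
    """
    quads = []
    temps = count(1)  # t1, t2, ... created on demand

    def walk(node):
        if isinstance(node, str):
            return node  # a name is its own "place"
        op, *args = node
        places = [walk(arg) for arg in args]  # operands first (post-order)
        result = f"t{next(temps)}"
        arg2 = places[1] if len(places) > 1 else None
        quads.append((op, places[0], arg2, result))
        return result

    value = walk(expr)
    quads.append((":=", value, None, target))
    return quads


# a := b * -c + b * -c
expr = ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c")))
for i, quad in enumerate(quads_for_assignment("a", expr)):
    print(f"({i})", *quad)
```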
Three-Address Statements
Binary Operator:
op y,z,result or result := y op z
where op is a binary arithmetic or logical operator. This binary operator is applied to y and z, and the
result of the operation is stored in result.
Unary Operator:
op y,result or result := op y
where op is a unary arithmetic or logical operator. This unary operator is applied to y, and the result of
the operation is stored in result.
Triples
A triple has only three fields, which we call op, arg1, and arg2. Note that the result field in Fig. is used
primarily for temporary names. Using triples, we refer to the result of an operation x op y by its
position, rather than by an explicit temporary name. Thus, instead of the temporary t1 in Fig., a triple
representation would refer to position (0).
     op      arg1  arg2
(0)  uminus  c
(1)  *       b     (0)
(2)  uminus  c
(3)  *       b     (2)
(4)  +       (1)   (3)
(5)  :=      a     (4)
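To make the difference from quadruples concrete, the sketch below builds triples for the same assignment a := b * -c + b * -c. It assumes the expression is given as nested tuples, and it encodes a reference to an earlier triple as an integer position while names stay strings; the function name triples_for_assignment and this encoding are illustrative assumptions.

```python
def triples_for_assignment(target, expr):
    """Return triples (op, arg1, arg2) for `target := expr`.

    Assumed encoding: names are strings; an int argument k means
    "the result of triple (k)". No temporary names are created.
    """
    triples = []

    def walk(node):
        if isinstance(node, str):
            return node
        op, *args = node
        refs = [walk(arg) for arg in args]
        arg2 = refs[1] if len(refs) > 1 else None
        triples.append((op, refs[0], arg2))
        return len(triples) - 1  # the position stands in for a temporary

    value = walk(expr)
    triples.append((":=", target, value))
    return triples


expr = ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c")))
for i, triple in enumerate(triples_for_assignment("a", expr)):
    print(f"({i})", triple)
```

Because results are named by position, moving a triple changes the meaning of every reference to it. That is the problem indirect triples address.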
Indirect Triples
Indirect triples consist of a listing of pointers to triples, rather than a listing of triples themselves. For
example, let us use an array instruction to list pointers to triples in the desired order. Then, the
triples in Fig. might be represented as in Fig. With indirect triples, an optimizing compiler can move
an instruction by reordering the instruction list, without affecting the triples themselves.
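A small sketch of this idea follows, assuming the triples are stored in a Python list and the instruction array is a list of indices into it. The helper name move is hypothetical.

```python
# Triples for a := b * -c + b * -c; an int argument k refers to triple (k).
triples = [
    ("uminus", "c", None),  # (0)
    ("*", "b", 0),          # (1)
    ("uminus", "c", None),  # (2)
    ("*", "b", 2),          # (3)
    ("+", 1, 3),            # (4)
    (":=", "a", 4),         # (5)
]

# The instruction array lists the triples in execution order.
instruction = [0, 1, 2, 3, 4, 5]


def move(order, src, dst):
    """Return a new execution order with the entry at index src moved to dst."""
    new_order = list(order)
    new_order.insert(dst, new_order.pop(src))
    return new_order


# Execute the second unary minus first: only the instruction array changes.
reordered = move(instruction, 2, 0)
print(reordered)  # [2, 0, 1, 3, 4, 5]
```

The triples list is untouched, so references such as (0) and (2) inside other triples remain valid after the reordering.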
Tradeoffs:
1) Performance vs. Size
2) Compilation speed and memory
> There is no perfect optimizer
Register Allocation
— Temporary variables
Instruction Selection
> For every expression, there are many ways to realize them for a processor
Peephole Optimization
— table pre-computed
— Structure Simplifications
Constant Folding
Constant Propagation
— If no change of c between!
Copy Propagation
> Replace later uses of x with y, if x and y have not been changed.
Algebraic Simplifications
Strength Reduction
Dead Code
Loop Optimizations
Advanced Optimizations
— Auto parallelization
— Profile-guided optimization
> Vectorization
Iterative Process
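As an illustration of one optimization from the list above, the following sketch applies the copy propagation rule (replace later uses of x with y while neither has changed) to straight-line quadruples. It assumes quadruples of the form (op, arg1, arg2, result) with a copy written as (":=", y, None, x); the function name copy_propagate is hypothetical, and the pass handles a single basic block only.

```python
def copy_propagate(quads):
    """Replace uses of x by y after a copy x := y, within one basic block."""
    copies = {}  # x -> y, valid while neither x nor y has been reassigned
    out = []
    for op, arg1, arg2, result in quads:
        # Rewrite the operands using the copies known to be valid here.
        arg1 = copies.get(arg1, arg1)
        arg2 = copies.get(arg2, arg2)
        # `result` is being redefined: forget every copy that mentions it.
        copies = {x: y for x, y in copies.items()
                  if x != result and y != result}
        if op == ":=" and arg1 != result:
            copies[result] = arg1  # record the new copy result := arg1
        out.append((op, arg1, arg2, result))
    return out


block = [
    (":=", "y", None, "x"),   # x := y
    ("+", "x", "z", "t1"),    # use of x can become y
    ("+", "y", "1", "y"),     # y changes: the copy is no longer valid
    ("*", "x", "2", "t2"),    # this use of x must stay
]
for quad in copy_propagate(block):
    print(quad)
```

After propagation the copy x := y may become dead code if x has no remaining uses, which is why such passes are usually run repeatedly together with dead code elimination.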
Code generation
The final phase of a compiler is code generation.
It receives an intermediate representation (IR), with supplementary information in the symbol table.
It produces a semantically equivalent target program.
Code generator main tasks:
o Instruction selection
o Register allocation and assignment
o Instruction ordering
⚫ IR + Symbol table
⚫ We assume front end produces low-level IR, i.e. values of names in it can be directly
manipulated by the machine instructions.
⚫ Common target architectures are: RISC, CISC and Stack based machines
Instruction Selection
The code generator must map the IR program into a code sequence that can be executed by
the target machine. The complexity of performing this mapping is determined by factors
such as
the level of the IR
the nature of the instruction-set architecture
the desired quality of the generated code.
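For example, every three-address statement of the form x = y + z, where x, y, and z are statically
allocated, can be translated into the code sequence
LD R0, y // R0 = y (load y into register R0)
ADD R0, R0, z // R0 = R0 + z (add z to R0)
ST x, R0 // x = R0 (store R0 into x)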
This strategy often produces redundant loads and stores. For example, the sequence of
three-address statements
a = b + c
d = a + e
would be translated into
LD R0, b // R0 = b
ADD R0, R0, c // R0 = R0 + c
ST a, R0 // a = R0
LD R0, a // R0 = a
ADD R0, R0, e // R0 = R0 + e
ST d, R0 // d = R0
Here, the fourth statement is redundant since it loads a value that has just been stored, and so
is the third if a is not subsequently used.
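The statement-by-statement strategy and the redundant load it produces can be sketched as follows. This is a minimal illustration, assuming each statement is given as a tuple (x, y, op, z) meaning x = y op z and that only register R0 is used. The function names and the SUB and MUL mnemonics are assumptions made for the example; the text above shows only LD, ADD and ST.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}  # SUB and MUL are assumed mnemonics


def naive_codegen(statements):
    """Translate each x = y op z independently into load / operate / store."""
    code = []
    for x, y, op, z in statements:
        code.append(f"LD R0, {y}")
        code.append(f"{OPCODES[op]} R0, R0, {z}")
        code.append(f"ST {x}, R0")
    return code


def drop_redundant_loads(code):
    """Remove 'LD R0, v' when it directly follows 'ST v, R0'."""
    prefix = "LD R0, "
    out = []
    for instr in code:
        if out and instr.startswith(prefix):
            var = instr[len(prefix):]
            if out[-1] == f"ST {var}, R0":
                continue  # R0 already holds the value just stored
        out.append(instr)
    return out


code = naive_codegen([("a", "b", "+", "c"), ("d", "a", "+", "e")])
print("\n".join(drop_redundant_loads(code)))
```

The second function is a tiny peephole pass: it only removes the reload, since deciding whether the store to a is also unnecessary requires knowing whether a is used later.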
The quality of the generated code is usually determined by its speed and size. On most
machines, a given IR program can be implemented by many different code sequences, with
significant cost differences between the different implementations.
Register Allocation
A key problem in code generation is deciding what values to hold in what registers.
Registers are the fastest computational unit on the target machine, but we usually do not
have enough of them to hold all values. Values not held in registers need to reside in
memory. Instructions involving register operands are invariably shorter and faster than
those involving operands in memory, so efficient utilization of registers is particularly
important.
The use of registers involves the following concerns:
1. Register allocation, during which we select the set of variables that will reside in registers
at each point in the program.
2. Register assignment, during which we pick the specific register that a variable will reside
in.
3. Complications imposed by the hardware architecture
In this section, we shall consider an algorithm that generates code for a single basic block. It
considers each three-address instruction in turn, and keeps track of what values are in what
registers so it can avoid generating unnecessary loads and stores.
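The bookkeeping described here can be sketched in a few lines. The following is a simplified illustration, not the full algorithm. It assumes statements of the form (x, y, op, z) meaning x = y op z, at least three registers, operands that are variable names (no constants), and that every computed value is stored back to memory at the end of the block. The names gen_block, holds and dirty, and the SUB and MUL mnemonics, are assumptions made for the example.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}  # SUB and MUL are assumed mnemonics


def gen_block(statements, num_regs=3):
    """Generate code for one basic block, tracking what each register holds."""
    holds = {}     # register name -> variable whose current value it holds
    dirty = set()  # variables whose register copy is newer than memory
    code = []

    def reg_of(var):
        for reg, held in holds.items():
            if held == var:
                return reg
        return None

    def free_reg(keep):
        # Prefer an empty register.
        for i in range(num_regs):
            if f"R{i}" not in holds:
                return f"R{i}"
        # Otherwise evict one that is not needed by the current statement.
        for reg, held in list(holds.items()):
            if reg not in keep:
                if held in dirty:
                    code.append(f"ST {held}, {reg}")  # save before reuse
                    dirty.discard(held)
                del holds[reg]
                return reg
        raise RuntimeError("not enough registers for this sketch")

    def load(var, keep):
        reg = reg_of(var)
        if reg is None:  # load only if the value is not already in a register
            reg = free_reg(keep)
            code.append(f"LD {reg}, {var}")
            holds[reg] = var
        return reg

    for x, y, op, z in statements:
        ry = load(y, keep=set())
        rz = load(z, keep={ry})
        rx = reg_of(x) or free_reg(keep={ry, rz})
        code.append(f"{OPCODES[op]} {rx}, {ry}, {rz}")
        holds[rx] = x
        dirty.add(x)

    # End of block: write back every value that exists only in a register.
    for reg in sorted(holds):
        if holds[reg] in dirty:
            code.append(f"ST {holds[reg]}, {reg}")
    return code


print("\n".join(gen_block([("a", "b", "+", "c"), ("d", "a", "+", "e")])))
```

For a = b + c followed by d = a + e, this sketch never reloads a, because it remembers that a register still holds it. Its eviction choice ignores which variables are used later, so a production code generator would need liveness information to choose better.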
One of the primary issues during code generation is deciding how to use registers to best
advantage. There are four principal uses of registers: