Module VI

Code Generation

Code generation
• The final phase in the compiler model is the code
generator.
• It takes as input an intermediate
representation of the source program and
produces as output an equivalent target
program.
• The code generation techniques presented below
can be used whether or not an optimizing phase
occurs before code generation.

Position of a Code Generator in the
Compiler Model

ISSUES IN THE DESIGN OF A CODE
GENERATOR (15 marks, UQ CUSAT
April 2017)
• The following issues arise during the
code generation phase:
• 1.   Input to code generator
• 2.   Target program
• 3.   Memory management
• 4.   Instruction selection
• 5.   Register allocation
• 6.   Evaluation order
1. Input to code generator:
• The input to the code generator consists
of the intermediate representation of
the source program produced by the front
end, together with information in the
symbol table that is used to determine the
run-time addresses of the data objects
denoted by the names in the intermediate
representation.
• The intermediate representation can be:
• a.   Linear representation such as postfix
notation
• b.   Three address representation such as
quadruples
• c.   Virtual machine representation such
as stack machine code
• d.   Graphical representations such as
syntax trees and dags.
2. Target program:
• The output of the code generator is the target
program. The output may be:
• a). Absolute machine language
• -  It can be placed in a fixed memory location and can
be executed immediately.
• b). Relocatable machine language
• -  It allows subprograms to be compiled separately.
• c). Assembly language
• -  It makes code generation easier.

3. Memory management:
• Names in the source program are mapped to
addresses of data objects in run-time memory
by the front end and code generator.
• This mapping uses the symbol table: a name
in a three-address statement refers to the
symbol-table entry for that name.
• Labels in three-address statements have to
be converted to addresses of instructions.

• A statement j: goto i generates a jump instruction as follows:
• *  if i < j, the jump is backward. A jump instruction with
target address equal to the location of the code for
quadruple i is generated.
• *  if i > j, the jump is forward. We must store on
a list for quadruple i the location of the first
machine instruction generated for quadruple j.
When i is processed, the machine locations for all
instructions that are forward jumps to i are filled in.
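The forward-jump bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the slides: every quadruple produces exactly one machine instruction, addresses are instruction indices, and the names resolve_jumps, pending and the JMP mnemonic are hypothetical.

```python
from collections import defaultdict

def resolve_jumps(quads):
    """quads[j] is ("goto", i) or ("op", text). Assumes one instruction per quadruple."""
    code = []                     # generated instructions
    addr_of = {}                  # quadruple index -> address of its first instruction
    pending = defaultdict(list)   # target quadruple i -> addresses of jumps waiting for i
    for j, (kind, arg) in enumerate(quads):
        addr_of[j] = len(code)
        # Quadruple j is being processed: fill in every forward jump that targets it.
        for jump_addr in pending.pop(j, []):
            code[jump_addr] = f"JMP {addr_of[j]}"
        if kind == "goto":
            if arg <= j:          # backward jump: the target address is already known
                code.append(f"JMP {addr_of[arg]}")
            else:                 # forward jump: remember this location on i's list
                pending[arg].append(len(code))
                code.append("JMP ?")
        else:
            code.append(arg)
    return code

quads = [("op", "MOV a,R0"), ("goto", 3), ("op", "ADD b,R0"),
         ("op", "MOV R0,c"), ("goto", 0)]
print(resolve_jumps(quads))
```

The jump at quadruple 1 is emitted with an unknown target and patched when quadruple 3 is reached, while the jump at quadruple 4 is backward and resolved immediately.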
4. Instruction selection:
• The instruction set of the target machine
should be complete and uniform.
• Instruction speeds and machine idioms
are important factors when the efficiency of
the target program is considered.
• The quality of the generated code is
determined by its speed and size.

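Example
• Translating the three-address statements
• a:=b+c
• d:=a+e
• one statement at a time gives
• MOV b,R0
• ADD c,R0
• MOV R0,a
• MOV a,R0
• ADD e,R0
• MOV R0,d
• The fourth instruction, MOV a,R0, is redundant because
R0 already holds the value of a.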

5. Register allocation
• Instructions involving register operands are
shorter and faster than those involving
operands in memory. The use of registers is
subdivided into two subproblems:
• 1. Register allocation - the set of variables that
will reside in registers at a point in the
program is selected.
• 2. Register assignment - the specific register
that a variable will reside in is picked.
• Certain machines require even-odd register
pairs for some operands and results.
• For example, consider a division
instruction of the form: Div x, y
  where x is the even register of the even/odd
register pair holding the dividend and y is the divisor.
After the division:
• the even register holds the remainder
• the odd register holds the quotient

6. Evaluation order
• The order in which the computations
are performed can affect the
efficiency of the target code.
• Some computation orders require
fewer registers to hold intermediate
results than others.

Target Program Code
• The back-end code generator of a
compiler may generate different forms of
code, depending on the requirements:
– Absolute machine code (executable code)
– Relocatable machine code (object files for
linker)
– Assembly language (facilitates debugging)
– Byte code forms for interpreters (e.g. JVM)
The Target Machine
• Implementing code generation requires
thorough understanding of the target machine
architecture and its instruction set
• Our (hypothetical) machine:
– Byte-addressable (word = 4 bytes)
– Has n general purpose registers R0,
R1, …, Rn-1
– Two-address instructions of the form
op source, destination
The Target Machine: Op-codes and
Address Modes
• Op-codes (op), for example
MOV (move content of source to destination)
ADD (add content of source to destination)
SUB (subtract content of source from dest.)
• Address modes
Mode               Form    Address                    Added Cost
Absolute           M       M                          1
Register           R       R                          0
Indexed            c(R)    c+contents(R)              1
Indirect register  *R      contents(R)                0
Indirect indexed   *c(R)   contents(c+contents(R))    1
Instruction Costs
• Define the cost of an instruction
= 1 + cost(source-mode) + cost(destination-mode)
• E.g. the instruction MOV R0,R1 copies the contents
of register R0 into register R1.
• This instruction has cost 1, since it occupies only one
word of memory.
• The instruction MOV R5,M copies the contents of
register R5 into memory location M. This instruction
has cost 2, since the address of memory location M is
in the word following the instruction.
Examples

Instruction          Operation                                              Cost
MOV R0,R1            Store contents(R0) into register R1                    1
MOV R0,M             Store contents(R0) into memory location M              2
MOV M,R0             Store contents(M) into register R0                     2
MOV 4(R0),M          Store contents(4+contents(R0)) into M                  3
MOV *4(R0),M         Store contents(contents(4+contents(R0))) into M        3
MOV #1,R0            Store 1 into R0                                        2
ADD 4(R0),*12(R1)    Add contents(4+contents(R0)) to the value at
                     location contents(12+contents(R1))                     3
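The cost rule can be checked mechanically. The Python sketch below is an illustration, not part of the slides: the function names are hypothetical, registers are assumed to be written R followed by digits, and the added cost of 1 for a literal operand such as #1 is inferred from the MOV #1,R0 row, since literals do not appear in the address-mode table.

```python
import re

def mode_cost(operand):
    """Added cost of one operand, following the address-mode table."""
    op = operand.strip()
    if op.startswith("#"):             # literal such as #1 (assumed added cost 1)
        return 1
    if op.startswith("*"):             # indirect: *R costs 0, *c(R) costs 1
        return 1 if "(" in op else 0
    if "(" in op:                      # indexed c(R)
        return 1
    if re.fullmatch(r"R\d+", op):      # register
        return 0
    return 1                           # absolute memory address M

def instruction_cost(instr):
    """cost = 1 + cost(source-mode) + cost(destination-mode)."""
    _, operands = instr.split(None, 1)
    return 1 + sum(mode_cost(o) for o in operands.split(","))

def sequence_cost(code):
    return sum(instruction_cost(i) for i in code)

for instr in ["MOV R0,R1", "MOV R5,M", "MOV 4(R0),M", "ADD 4(R0),*12(R1)"]:
    print(instr, instruction_cost(instr))
```

Summing the costs over a sequence gives a simple way to compare alternative translations of the same statement.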

Instruction Selection
• Instruction selection is important to obtain efficient
code
• Suppose we translate three-address code
x:=y+z
to: MOV y,R0
    ADD z,R0
    MOV R0,x
• Applying the same template to a:=a+1 gives
    MOV a,R0
    ADD #1,R0
    MOV R0,a
    Cost = 6
• Better:
    ADD #1,a
    Cost = 3
• Better still:
    INC a
    Cost = 2
Instruction Selection: Utilizing
Addressing Modes
• Suppose we translate a:=b+c into
MOV b,R0
ADD c,R0
MOV R0,a
Cost = 6
• Assuming addresses of a, b, and c are stored in R0,
R1, and R2
MOV *R1,*R0
ADD *R2,*R0
Cost = 2
• Assuming R1 and R2 contain values of b and c, and the
value of b is not needed afterwards
ADD R2,R1
MOV R1,a
Cost = 3

Need for Global Machine-Specific
Code Optimizations
• Suppose we translate three-address code
x:=y+z
to: MOV y,R0
ADD z,R0
MOV R0,x
• Then, we translate
a:=b+c
d:=a+e
to: MOV b,R0
ADD c,R0
MOV R0,a
MOV a,R0   (redundant: R0 already holds a)
ADD e,R0
MOV R0,d
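One way to remove such a redundant load after the fact is a small peephole pass over the generated code. The sketch below is a minimal illustration, not a technique prescribed by the slides: the function name is hypothetical, and it assumes the code is a straight-line sequence (no instruction in it is a jump target).

```python
def drop_redundant_loads(code):
    """Remove 'MOV x,R' when it directly follows 'MOV R,x' in straight-line code."""
    out = []
    for instr in code:
        if out and out[-1].startswith("MOV ") and instr.startswith("MOV "):
            prev_src, prev_dst = [s.strip() for s in out[-1][4:].split(",")]
            src, dst = [s.strip() for s in instr[4:].split(",")]
            if src == prev_dst and dst == prev_src:
                continue          # dst already holds this value, so skip the load
        out.append(instr)
    return out

code = ["MOV b,R0", "ADD c,R0", "MOV R0,a",
        "MOV a,R0", "ADD e,R0", "MOV R0,d"]
print(drop_redundant_loads(code))
```

A pass like this only sees adjacent instructions, which is why the slide title argues for global, machine-specific optimization as the more general remedy.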

Register Allocation and Assignment
• Efficient utilization of the limited set of registers is
important to generate good code
• Registers are assigned by
– Register allocation to select the set of variables that will
reside in registers at a point in the code
– Register assignment to pick the specific register that a
variable will reside in
• Finding an optimal register assignment in general is
NP-complete

Example

t:=a+b            t:=a*b
t:=t*c            t:=t+a
t:=t/d            t:=t/d

{ R1=t }          { R0=a, R1=t }

MOV a,R1          MOV a,R0
ADD b,R1          MOV R0,R1
MUL c,R1          MUL b,R1
DIV d,R1          ADD R0,R1
MOV R1,t          DIV d,R1
                  MOV R1,t
Choice of Evaluation Order
• When instructions are independent, their evaluation
order can be changed

Expression: a+b-(c+d)*e

Original order:
t1:=a+b           MOV a,R0
t2:=c+d           ADD b,R0
t3:=e*t2          MOV R0,t1
t4:=t1-t3         MOV c,R1
                  ADD d,R1
                  MOV e,R0
                  MUL R1,R0
                  MOV t1,R1
                  SUB R0,R1
                  MOV R1,t4

After reordering:
t2:=c+d           MOV c,R0
t3:=e*t2          ADD d,R0
t1:=a+b           MOV e,R1
t4:=t1-t3         MUL R0,R1
                  MOV a,R0
                  ADD b,R0
                  SUB R1,R0
                  MOV R0,t4
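To confirm that the two instruction sequences for a+b-(c+d)*e compute the same result, they can be run on a toy simulator of the two-address machine. This is a sketch for illustration only: the run function is hypothetical, it handles just MOV, ADD, SUB and MUL with register and absolute operands, and it keeps registers and memory names in one dictionary.

```python
import operator

OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}

def run(code, memory):
    """Execute 'op source,destination' instructions and return the final state."""
    store = dict(memory)                  # registers and memory names share one dict
    for line in code:
        op, operands = line.split(None, 1)
        src, dst = [s.strip() for s in operands.split(",")]
        if op == "MOV":
            store[dst] = store[src]
        else:                             # destination := destination op source
            store[dst] = OPS[op](store[dst], store[src])
    return store

original = ["MOV a,R0", "ADD b,R0", "MOV R0,t1", "MOV c,R1", "ADD d,R1",
            "MOV e,R0", "MUL R1,R0", "MOV t1,R1", "SUB R0,R1", "MOV R1,t4"]
reordered = ["MOV c,R0", "ADD d,R0", "MOV e,R1", "MUL R0,R1",
             "MOV a,R0", "ADD b,R0", "SUB R1,R0", "MOV R0,t4"]

values = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
print(run(original, values)["t4"], run(reordered, values)["t4"])
print(len(original), len(reordered))
```

The reordered sequence needs two fewer instructions because t1 no longer has to be stored to memory and reloaded.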
A Simple Code Generator
• A code generator generates target code for a
sequence of three- address statements and
effectively uses registers to store operands of
the statements.
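• For example, consider the three-address
statement a := b+c. It can have the following
sequences of code:
• ADD Rj, Ri Cost = 1
(if Ri contains b, Rj contains c, and b is not live
after the statement)
(or)
• ADD c, Ri Cost = 2
(if Ri contains b and c is in a memory location)
(or)
• MOV c, Rj Cost = 3
• ADD Rj, Ri
(if Ri contains b and c is first loaded from memory into Rj)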
Register and Address Descriptors:
• A register descriptor keeps track
of what is currently in each register. The
register descriptors show that initially all the
registers are empty.
• An address descriptor stores the location
where the current value of a name can
be found at run time.
A code-generation algorithm:
• The algorithm takes as input a sequence of
three-address statements constituting a basic
block. For each three-address statement of
the form x := y op z, perform the following
actions:
• 1. Invoke a function getreg to determine the
location L where the result of the
computation y op z should be stored.
• 2. Consult the address descriptor for y to
determine y’, the current location of y.
Prefer the register for y’ if the value of y is
currently both in memory and a register.
If the value of y is not already in L,
generate the instruction MOV y’, L to
place a copy of y in L.
• 3. Generate the instruction OP z’, L
where z’ is a current location of z.
Prefer a register to a memory location if
z is in both.
Update the address descriptor of x to
indicate that x is in location L.
If L is a register, update its descriptor to
indicate that it contains the value of x, and
remove x from all other register descriptors.
• 4. If the current values of y and/or z have
no next uses, are not live on exit
from the block, and are in registers,
alter the register descriptor to
indicate that, after execution of x :=
y op z, those registers will no longer
contain y or z.
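The four steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the full algorithm: statements are (x, y, op, z) tuples, every name starts out in memory, getreg never spills (it raises an error when no register is free), liveness is computed by a backward scan from a given live_on_exit set, and values that are live on exit are stored back to memory at the end of the block. The names generate, getreg, reg_desc and addr_desc are illustrative.

```python
def generate(block, live_on_exit, num_regs=2):
    ops = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
    # Backward scan: the names still needed after each statement.
    live_after, live = [], set(live_on_exit)
    for x, y, op, z in reversed(block):
        live_after.append(set(live))
        live.discard(x)
        live.update((y, z))
    live_after.reverse()

    reg_desc = {f"R{i}": set() for i in range(num_regs)}  # register -> names it holds
    addr_desc = {}                                         # name -> current locations

    def locations(name):
        return addr_desc.setdefault(name, {name})  # assumed to start in memory only

    def best_location(name):                       # prefer a register over memory
        regs = sorted(loc for loc in locations(name) if loc in reg_desc)
        return regs[0] if regs else name

    def getreg(y, needed):
        loc = best_location(y)
        if loc in reg_desc and reg_desc[loc] == {y} and y not in needed:
            return loc                             # reuse y's register: y is dead
        for reg, names in reg_desc.items():
            if not names:
                return reg                         # otherwise take an empty register
        raise RuntimeError("no free register (spilling is omitted in this sketch)")

    code = []
    for (x, y, op, z), needed in zip(block, live_after):
        L = getreg(y, needed)                                  # step 1
        y_loc, z_loc = best_location(y), best_location(z)
        if y_loc != L:                                         # step 2
            code.append(f"MOV {y_loc},{L}")
        code.append(f"{ops[op]} {z_loc},{L}")                  # step 3
        for names in reg_desc.values():
            names.discard(x)
        for name in reg_desc[L]:
            addr_desc[name].discard(L)
        reg_desc[L], addr_desc[x] = {x}, {L}
        for name in {y, z} - {x}:                              # step 4
            if name not in needed:
                for reg in [r for r in locations(name) if r in reg_desc]:
                    reg_desc[reg].discard(name)
                    addr_desc[name].discard(reg)
    for name in sorted(live_on_exit):              # store live values held only in registers
        if name in addr_desc and name not in addr_desc[name]:
            code.append(f"MOV {best_location(name)},{name}")
    return code

block = [("t", "a", "-", "b"), ("u", "a", "-", "c"),
         ("v", "t", "+", "u"), ("d", "v", "+", "u")]
print("\n".join(generate(block, live_on_exit={"d"})))
```

For this block, which computes d := (a-b) + (a-c) + (a-c), the sketch keeps t, u and v in registers only and emits a single store for d at the end.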