Code Generation Techniques and Issues

Module 5
Syntax Directed Translation, Intermediate Code Generation, Code Generation
5.1, 5.2, 5.3, 6.1, 6.2, 8.1, 8.2
Outline
Code Generation Issues
Target language Issues
Addresses in Target Code
Basic Blocks and Flow Graphs
Introduction
The final phase of a compiler is the code generator.
It receives an intermediate representation (IR), together with
supplementary information in the symbol table.
It produces a semantically equivalent target program.
Code generator main tasks:
- Instruction selection
- Register allocation and assignment
- Instruction ordering
Front end -> Code optimizer -> Code generator
In the code generation phase,
various issues can arise:
Input to the code generator
Target program
Memory management
Instruction selection
Register allocation
Evaluation order
Issues in the Design of a Code Generator
The most important criterion is that the code generator produces correct code.
1. Input to the code generator
IR + symbol table
We assume the front end produces a low-level IR, i.e. the values of names in it can be
directly manipulated by machine instructions.
Syntactic and semantic errors have already been detected.
The code generation phase therefore requires complete, error-free intermediate code as
its input.
2. The target program
Common target architectures are RISC, CISC, and stack-based machines.
The target machine used here is a simple RISC-like computer with the addition of some CISC-like addressing modes.
The output may be absolute machine language, relocatable machine
language, or assembly language.
Absolute machine language as output has the
advantage that it can be placed in a fixed memory
location and executed immediately.
Relocatable machine language as output allows
subprograms and subroutines to be compiled
separately. Relocatable object modules can be linked
together and loaded by a linking loader, at the
added expense of linking and loading.
Assembly language as output makes code
generation easier, at the cost of an extra assembly step.
3. Memory management
1. Mapping the names in the source program to
the addresses of data objects is done cooperatively by the front
end and the code generator.
2. A name in a three-address statement refers to the
symbol table entry for that name.
3. From the symbol table entry, a relative
address can be determined for the name.
4. Local variables are allocated on the stack, in the
activation record, while global variables are in the
static area.
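Step 3 above can be illustrated with a minimal sketch that lays names out one after another and records each name's relative address. The function name, the (name, width) input format, and the example widths are assumptions made for illustration; alignment and padding are ignored.

```python
def assign_relative_addresses(declarations):
    """Return ({name: offset}, total_size) for a list of (name, width) pairs.

    Names are placed consecutively in declaration order; alignment is
    ignored (an assumption of this sketch).
    """
    offsets = {}
    next_offset = 0
    for name, width in declarations:
        offsets[name] = next_offset  # relative address of this name
        next_offset += width         # the next name starts right after it
    return offsets, next_offset


# Hypothetical local variables with widths in bytes.
local_names = [("i", 4), ("x", 8), ("flag", 1)]
offsets, size = assign_relative_addresses(local_names)
for name, offset in offsets.items():
    print(f"{name}: offset {offset}")
print(f"total size: {size}")
```

For locals, such offsets would be relative to the start of the activation record; for globals, relative to the start of the static area.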
4. Instruction selection
Selecting the best instructions improves the
efficiency of the program.
The quality of the generated code is determined by
its speed and size.
Example of an inefficient code sequence, obtained by translating each statement independently:
P := Q + R
S := P + T
MOV R0, Q
ADD R0, R
MOV P, R0
MOV R0, P
ADD R0, T
MOV S, R0
The fourth instruction is redundant: R0 already holds the value of P.
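The inefficiency in this example comes from translating each statement in isolation. The following sketch, under stated assumptions, reproduces the six-instruction sequence with a fixed MOV/ADD/MOV template and then removes a load that immediately follows a store of the same register to the same location. The function names and the tuple encoding of instructions are hypothetical, and the cleanup is only safe when no jump targets the removed instruction.

```python
def naive_codegen(statements):
    """Translate each 'dst := left + right' with the same fixed template."""
    code = []
    for dst, left, right in statements:
        code.append(("MOV", "R0", left))   # load the left operand into R0
        code.append(("ADD", "R0", right))  # R0 = R0 + right
        code.append(("MOV", dst, "R0"))    # store the result
    return code


def drop_redundant_loads(code):
    """Drop 'MOV R, X' when the previous instruction was 'MOV X, R'."""
    result = []
    for instr in code:
        if result:
            prev = result[-1]
            if (instr[0] == "MOV" and prev[0] == "MOV"
                    and instr[1] == prev[2] and instr[2] == prev[1]):
                continue  # the register already holds this value
        result.append(instr)
    return result


statements = [("P", "Q", "R"), ("S", "P", "T")]
naive = naive_codegen(statements)
improved = drop_redundant_loads(naive)
for op, dst, src in improved:
    print(f"{op} {dst}, {src}")
```

On the two statements above this removes MOV R0, P and leaves five instructions.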
5. Register allocation
Computations on registers are faster than
computations on memory operands, so efficient utilization
of registers is important.
The use of registers is subdivided into two subproblems:
Register allocation: we select the set
of variables that will reside in registers at each
point in the program.
Register assignment: in a subsequent phase, we pick the
specific register in which each variable will reside.
6. Evaluation order
The code generator decides the order in which
instructions will be executed.
1. The order of computations affects the efficiency of
the target code.
2. Among the many possible computation orders, some
require fewer registers than others to hold
intermediate results.
A simple target machine model
Load operations: LD r,x and LD r1, r2
Store operations: ST x,r
Computation operations: OP dst, src1, src2
Unconditional jumps: BR L
Conditional jumps: Bcond r, L like BLTZ r, L
Addressing Modes
variable name: x
indexed address: a(r) like LD R1, a(R2) means
R1=contents(a+contents(R2))
integer indexed by a register : like LD R1, 100(R2)
Indirect addressing mode: *r and *100(r)
immediate constant addressing mode: like LD R1, #100
b = a[i] (a is an array of 8-byte elements)
LD R1, i        // R1 = i
MUL R1, R1, 8   // R1 = R1 * 8
LD R2, a(R1)    // R2 = contents(a + contents(R1))
ST b, R2        // b = R2
a[j] = c
LD R1, c        // R1 = c
LD R2, j        // R2 = j
MUL R2, R2, 8   // R2 = R2 * 8
ST a(R2), R1    // contents(a + contents(R2)) = R1
x = *p
LD R1, p        // R1 = p
LD R2, 0(R1)    // R2 = contents(0 + contents(R1))
ST x, R2        // x = R2
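Conditional-jump three-address instruction
if x < y goto L
LD R1, x        // R1 = x
LD R2, y        // R2 = y
SUB R1, R1, R2  // R1 = R1 - R2
BLTZ R1, M      // if R1 < 0 jump to M
Here M is the label of the first machine instruction generated for the three-address instruction labeled L.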
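The machine model and addressing modes above can be exercised with a small simulator. This is a sketch under stated assumptions, not a full implementation: it supports only LD, ST, ADD, SUB, MUL, BR and BLTZ, with variable-name, indexed (a(r), 100(r)), register and #constant operands; the indirect modes are omitted. The tuple encoding, the function names, the label table and the variable addresses are all hypothetical. The sample program sets z to 1 if x < y and to 0 otherwise.

```python
import re

# Matches indexed operands such as a(R1) or 100(R2).
INDEXED = re.compile(r"^(\w+)\((R\d+)\)$")

ARITH = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "MUL": lambda a, b: a * b,
}


def address_of(operand, regs, symbols):
    """Address named by a variable (x) or an indexed operand (a(r), 100(r))."""
    match = INDEXED.match(operand)
    if match:
        base, reg = match.groups()
        base_addr = int(base) if base.isdigit() else symbols[base]
        return base_addr + regs[reg]
    return symbols[operand]


def value_of(operand, regs, symbols, memory):
    """Value of a source operand: #constant, register, or memory location."""
    if operand.startswith("#"):
        return int(operand[1:])
    if operand in regs:
        return regs[operand]
    return memory[address_of(operand, regs, symbols)]


def run(program, labels, symbols, memory):
    """Execute the program, updating `memory` in place; return the registers."""
    regs = {f"R{n}": 0 for n in range(8)}
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "LD":
            regs[args[0]] = value_of(args[1], regs, symbols, memory)
        elif op == "ST":
            memory[address_of(args[0], regs, symbols)] = regs[args[1]]
        elif op in ARITH:
            left = value_of(args[1], regs, symbols, memory)
            right = value_of(args[2], regs, symbols, memory)
            regs[args[0]] = ARITH[op](left, right)
        elif op == "BR":
            pc = labels[args[0]]
        elif op == "BLTZ":
            if regs[args[0]] < 0:
                pc = labels[args[1]]
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return regs


PROGRAM = [
    ("LD", "R1", "x"),
    ("LD", "R2", "y"),
    ("SUB", "R1", "R1", "R2"),
    ("BLTZ", "R1", "L1"),      # jump if x - y < 0
    ("LD", "R1", "#0"),
    ("ST", "z", "R1"),         # z = 0
    ("BR", "L2"),
    ("LD", "R1", "#1"),        # L1
    ("ST", "z", "R1"),         # z = 1
]
LABELS = {"L1": 7, "L2": 9}         # L2 marks the end of the program
SYMBOLS = {"x": 0, "y": 4, "z": 8}  # hypothetical addresses

memory = {0: 3, 4: 5, 8: -1}        # x = 3, y = 5, z not yet set
run(PROGRAM, LABELS, SYMBOLS, memory)
print("z =", memory[SYMBOLS["z"]])
```

Keeping labels in a separate table, rather than inside the instruction list, keeps instruction indices stable, so a jump is just an assignment to the program counter.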
Generate code for the following three-address statements,
assuming all variables are stored in memory locations.
1. x = 1
2. x = a
3. x = a + 1
4. x = a + b
5. The two statements
   x = b * c
   y = a + x
Answer:
1. LD R1, #1
   ST x, R1
2. LD R1, a
   ST x, R1
3. LD R1, a
   ADD R1, R1, #1
   ST x, R1
4. LD R1, a
   LD R2, b
   ADD R1, R1, R2
   ST x, R1
5. LD R1, b
   LD R2, c
   MUL R1, R1, R2
   ST x, R1
   LD R3, a
   ADD R3, R3, R1
   ST y, R3
Generate code for the following three-address
statements, assuming a and b are arrays whose elements
are 4-byte values.
1. The four-statement sequence
   x = a[i]
   y = b[j]
   a[i] = y
   b[j] = x
Answer:
   LD R1, i
   MUL R1, R1, #4
   LD R2, a(R1)
   LD R3, j
   MUL R3, R3, #4
   LD R4, b(R3)
   ST a(R1), R4
   ST b(R3), R2
2. The three-statement sequence
   x = a[i]
   y = b[i]
   z = x * y
Answer:
   LD R1, i
   MUL R1, R1, #4
   LD R2, a(R1)
   LD R1, b(R1)
   MUL R1, R2, R1
   ST z, R1
3. The three-statement sequence
   x = a[i]
   y = b[x]
   a[i] = y
Answer:
   LD R1, i
   MUL R1, R1, #4
   LD R2, a(R1)
   MUL R2, R2, #4
   LD R2, b(R2)
   ST a(R1), R2
Generate code for the following sequence, assuming that
x, y, and z are in memory locations:
    if x < y goto L1
    z = 0
    goto L2
L1: z = 1
Answer:
    LD R1, x
    LD R2, y
    SUB R1, R1, R2
    BLTZ R1, L1
    LD R1, #0
    ST z, R1
    BR L2
L1: LD R1, #1
    ST z, R1
Generate code for the following sequence, assuming that
n is in a memory location:
    s = 0
    i = 0
L1: if i > n goto L2
    s = s + i
    i = i + 1
    goto L1
L2:
Answer (s is kept in R2 and i in R1):
    LD R2, #0
    LD R1, R2
    LD R3, n
L1: SUB R4, R1, R3
    BGTZ R4, L2
    ADD R2, R2, R1
    ADD R1, R1, #1
    BR L1
L2:
Generate code for the following three-address sequence,
assuming that p and q are in memory locations:
   y = *q
   q = q + 4
   *p = y
   p = p + 4
Answer:
   LD R1, q
   LD R2, 0(R1)
   ADD R1, R1, #4
   ST q, R1
   LD R1, p
   ST 0(R1), R2
   ADD R1, R1, #4
   ST p, R1
Costs associated with the addressing modes
LD R0, R1          cost = 1
LD R0, M           cost = 2
LD R1, *100(R2)    cost = 3
Determine the costs of the following instruction
sequences:
1. LD R0, y
   LD R1, z
   ADD R0, R0, R1
   ST x, R0
2. LD R0, i
   MUL R0, R0, 8
   LD R1, a(R0)
   ST b, R1
3. LD R0, c
   LD R1, i
   MUL R1, R1, 8
   ST a(R1), R0
Answer:
1. 2 + 2 + 1 + 2 = 7
2. 2 + 1 + 2 + 2 = 7
3. 2 + 2 + 1 + 2 = 7
Determine the costs of the following instruction
sequences:
4. LD R0, p
   LD R1, 0(R0)
   ST x, R1
5. LD R0, p
   LD R1, x
   ST 0(R0), R1
6. LD R0, x
   LD R1, y
   SUB R0, R0, R1
   BLTZ *R3, R0
Answer:
4. 2 + 2 + 2 = 6
5. 2 + 2 + 2 = 6
6. 2 + 2 + 1 + 1 = 6
Addresses in the Target Code
Code: a statically determined area
Static: a statically determined data area
Heap: a dynamically managed area
Stack: a dynamically managed area

Basic blocks and flow graphs
Partition the intermediate code into basic blocks
The flow of control can only enter the basic block
through the first instruction in the block.
That is, there are no jumps into the middle of the
block.
Control will leave the block without halting or
branching, except possibly at the last instruction in
the block.
The basic blocks become the nodes of a flow graph
Rules for finding leaders
1. The first three-address instruction in the
intermediate code is a leader.
2. Any instruction that is the target of a conditional
or unconditional jump is a leader.
3. Any instruction that immediately follows a
conditional or unconditional jump is a leader.
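The three rules can be applied mechanically. The sketch below assumes instructions are numbered from 1 and that the jumps are supplied as a mapping from the number of each jump instruction to the number of its target; this encoding and the function names are assumptions made for illustration. Each basic block then runs from a leader up to, but not including, the next leader.

```python
def find_leaders(num_instructions, jumps):
    """Apply the three leader rules; `jumps` maps jump number -> target number."""
    leaders = {1}                        # rule 1: the first instruction
    for source, target in jumps.items():
        leaders.add(target)              # rule 2: the target of a jump
        if source + 1 <= num_instructions:
            leaders.add(source + 1)      # rule 3: the instruction after a jump
    return sorted(leaders)


def basic_blocks(num_instructions, leaders):
    """Split instructions 1..n into blocks, each starting at a leader."""
    blocks = []
    for index, start in enumerate(leaders):
        if index + 1 < len(leaders):
            end = leaders[index + 1] - 1  # stop just before the next leader
        else:
            end = num_instructions        # the last block runs to the end
        blocks.append(list(range(start, end + 1)))
    return blocks


# A 12-instruction program whose only jump is instruction 12, targeting 3.
leaders = find_leaders(12, {12: 3})
blocks = basic_blocks(12, leaders)
print(leaders)
print(blocks)
```

With a single backward jump from instruction 12 to instruction 3, the leaders are 1 and 3, giving one block for instructions 1 to 2 and one for instructions 3 to 12.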
Consider the following source code for dot product of
two vectors a and b of length 10:
begin
prod :=0;
i:=1;
do begin
prod :=prod+ a[i] * b[i];
i :=i+1;
end
while i <= 10
end
The three-address code for the above source program is
given below:
(1) prod := 0
(2) i := 1
(3) t1 := 4 * i
(4) t2 := a[t1]
(5) t3 := 4 * i
(6) t4 := b[t3]
(7) t5 := t2 * t4
(8) t6 := prod + t5
(9) prod := t6
(10) t7 := i + 1
(11) i := t7
(12) if i <= 10 goto (3)
Basic block B1 contains statements (1) to (2).
Basic block B2 contains statements (3) to (12).
B1:
(1) prod := 0
(2) i := 1
B2:
(3) t1 := 4 * i
(4) t2 := a[t1]
(5) t3 := 4 * i
(6) t4 := b[t3]
(7) t5 := t2 * t4
(8) t6 := prod + t5
(9) prod := t6
(10) t7 := i + 1
(11) i := t7
(12) if i <= 10 goto (3)
Flow Graph
- A flow graph is a directed graph. It contains the
flow-of-control information for the set of basic
blocks.
- A control flow graph depicts how
program control is passed among
the blocks. It is useful in loop optimization.
Block B1 is the initial node.
Block B2 immediately follows B1, so there is an edge from B1 to B2.
The target of the jump from the last statement of B2 is the first
statement of B2, so there is an edge from B2 to B2.
B2 is a successor of B1, and B1 is a predecessor of B2.
Intermediate code to set a 10*10 matrix to
an identity matrix
[Listing of 17 three-address instructions not preserved. Surviving annotations: element [i,j]; offset for a[i,j] (8-byte reals); the program array starts at [1,1], the assembler array at [0,0].]
Instruction 1 is a leader by definition. The jumps are 9,
11, and 17, so 10 and 12 are leaders, as are
the targets 3, 2, and 13.
The leaders are then 1, 2, 3, 10, 12, and 13.
The basic blocks are therefore {1}, {2},
{3,4,5,6,7,8,9}, {10,11}, {12}, and
{13,14,15,16,17}.
Flow graph based on Basic Blocks
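The edges of this flow graph can be derived from the basic blocks and jump targets listed above. The sketch below assumes that jumps 9, 11 and 17 are all conditional, so control may also fall through to the next block; the source does not state this explicitly. The block numbers 1 to 6 and the function name are hypothetical labels, and entry and exit nodes are omitted.

```python
def flow_graph(blocks, jumps, unconditional=frozenset()):
    """Return sorted (from_block, to_block) edges, with blocks numbered from 1.

    `jumps` maps the number of a jump instruction to its target instruction.
    Jumps not listed in `unconditional` are assumed to be conditional.
    """
    # Map each leader (first instruction of a block) to its block number.
    block_of_leader = {block[0]: k for k, block in enumerate(blocks, start=1)}
    edges = set()
    for k, block in enumerate(blocks, start=1):
        last = block[-1]
        if last in jumps:
            edges.add((k, block_of_leader[jumps[last]]))  # edge to jump target
        if last not in unconditional and k < len(blocks):
            edges.add((k, k + 1))                         # fall-through edge
    return sorted(edges)


# Basic blocks and jumps as listed in the text above.
blocks = [[1], [2], [3, 4, 5, 6, 7, 8, 9], [10, 11], [12], [13, 14, 15, 16, 17]]
jumps = {9: 3, 11: 2, 17: 13}

for source, target in flow_graph(blocks, jumps):
    print(f"block {source} -> block {target}")
```

Under these assumptions the graph has three back edges: the blocks starting at instructions 3 and 13 each loop to themselves, and the block {10, 11} jumps back to the block {2}.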
