
3/10/2015

CS 4110 Compiler Design


Code Generation

http://www.labouseur.com/courses/compilers/compilers/alan/

Instructor: Jiaofei (Fay) Zhong, Email: jiaofei.zhong@csueastbay.edu

Run Time Environment


Storage Allocation
Storage allocation roles
OS: allocates physical memory to back the virtual memory
Compiler: generates code for a target machine
Allocate static storage based on offset
E.g., the program code and global variables are all fully allocated in the virtual address space
Allocate stack storage at run time
E.g., for procedure calls, the code for computing the size of the storage requirement, laying out the parameters and variables, copying the parameters, etc. is all generated by the compiler
Allocate heap storage at run time
Dynamically allocated memory, for calls such as malloc and new
The compiler generates code to compute the size of the storage requirement and allocate the storage in virtual memory

Storage Allocation
[Figure: layout of the virtual address space between low address and high address, showing the code, static data, stack (with the stack pointer), and heap regions. Each activation record on the stack holds the parameters and return value, the control link and saved status, and the local temps.]

Procedure Calls
Storage for procedure calls
Storage is allocated when the procedure is called
When exiting the procedure, the storage is deallocated
The region dynamically created for a procedure call is called an activation record (AR)
Activation records are mostly stack based
Some use heap storage
So that the deallocation can be flexible
In case there is static information in the procedure that needs to be preserved longer than the life of the procedure call
Storage of the callee does not have to be deallocated before that of the caller

Procedure Calls
Activation record
Return value
Size is defined by the type of the function
Kept at the top area so that its address can be easily determined and the caller can access the region easily

Actual parameters
Caller copies its local values to this region
Callee accesses this region to get the input data
For output parameters, reverse the action

Control link
Pointer to the AR of the caller
Return address
Stored by the caller
Other control pointers

[Figure: activation record layout. Return values and actual parameters are accessed by both caller and callee; control links and local data are accessed only by the callee.]

Procedure Calls
Activation record

Local data
All data declared in the procedure
All temporary variables generated in the procedure
Dynamic data are in the heap
But the pointers are in the AR (stack)

Calling sequence
P calls Q, Q calls R
ARs are kept track of on the stack

[Figure: the stack holds P's AR, then Q's AR, then R's AR at the top of the stack.]

Procedure Calls
Nested procedures
Some languages support nested procedure definitions
Scope problem
P1 int x, y
P2 int a, b
int m, n
use m
P3 use a
use x

[Figure: stack of activation records, in call order: P1, P4, P5, P2, P5, P2, P3]

P3 uses a, which is defined in P2
There are two ARs for P2
Which one should be used?
P3 uses x, which is defined in P1
How to locate P1?
Search down the stack?

Procedure Calls
Access links
To keep track of where to access non-local data
Linked to the closest scope parent, not the caller
One of the control links

P1 int x, y
P2 int a, b
int m, n
use m
P3 use a
use x

[Figure: stack of activation records (P1, P4, P5, P2, P5, P2, P3) with the access links drawn between ARs]

To use a, P3 follows the access link to the closest P2 and uses P2's a.
To use x, P3 follows the access link to P2 and then to P1 and uses P1's x.
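
The access-link walk described above can be sketched as follows. This is a minimal illustration, not the course's implementation: the ActivationRecord class, the lookup function, the static depths, and the variable values are all assumptions made for the example.

```python
class ActivationRecord:
    """One stack frame; access_link points to the AR of the enclosing scope."""

    def __init__(self, proc, depth, local_vars, access_link=None):
        self.proc = proc                    # procedure name
        self.depth = depth                  # static nesting depth (P1 = 1)
        self.local_vars = dict(local_vars)
        self.access_link = access_link


def lookup(ar, decl_depth, name):
    """Follow (ar.depth - decl_depth) access links, then read the variable."""
    frame = ar
    for _ in range(ar.depth - decl_depth):
        frame = frame.access_link
    return frame.local_vars[name]


# Hypothetical values; two ARs for P2 are on the stack, as in the slide.
p1 = ActivationRecord("P1", 1, {"x": 10, "y": 20})
p2_older = ActivationRecord("P2", 2, {"a": 1, "b": 2}, access_link=p1)
p2_newer = ActivationRecord("P2", 2, {"a": 7, "b": 8}, access_link=p1)
p3 = ActivationRecord("P3", 3, {}, access_link=p2_newer)

a_value = lookup(p3, 2, "a")   # one hop: P3 -> P2
x_value = lookup(p3, 1, "x")   # two hops: P3 -> P2 -> P1
```

The number of hops is known at compile time (the difference in nesting depth), so no search down the stack is needed.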

Memory Management
The compiler deals with the virtual address space
Stack, heap, and static storage are all in the virtual address space
Mapping virtual space to physical memory
An OS issue
Some issues that the compiler can help with
Reduce stack space
The stack generally stays in main memory
Garbage collection
The virtual address space may also run out
More consumption of virtual space may result in a higher potential for page faults
Garbage collection is program dependent (needs data analysis) and needs the compiler to generate code to do the job

Garbage Collection
Principle
Should be safe and conservative, so that no useful data is damaged

Problem
How can we tell whether an object is garbage now?

Approach
Reachability analysis
A program can only use the objects it can reference
An object that can no longer be reached from the program is garbage
How to determine reachability?


Garbage Collection
How to determine whether an object is reachable
Can check through roots and their references recursively
The roots for referencing any object are in
Static memory: static/global pointers defined in the program
Registers: store the state of temporary pointers
Stack: temporary pointers defined in the procedures

Type safety
Some languages are type safe (e.g., Java)
Can safely determine the reachability of objects by checking through objects with reference (pointer) type
Some languages are type unsafe (e.g., C)
Any object could turn out to be a reference
Need to be more careful: need to check all objects that could potentially be a reference
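
Checking reachability "through roots and their references recursively" can be sketched as a worklist traversal. This is a minimal sketch assuming a type-safe heap modeled as a dictionary from object name to the names it references; the reachable function and the sample heap are made up for illustration.

```python
def reachable(roots, heap):
    """Return the set of objects reachable from the roots.

    heap maps each object to the list of objects it references.
    """
    marked = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in marked:
            continue                  # already visited
        marked.add(obj)
        worklist.extend(heap[obj])    # follow every reference held by obj
    return marked


# Hypothetical heap: D and E reference each other but no root reaches them.
heap = {"A": ["B"], "B": ["C"], "C": [], "D": ["E"], "E": ["D"]}
roots = ["A"]                         # e.g. pointers in static memory, registers, stack
live = reachable(roots, heap)
garbage = set(heap) - live
```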

Reference Counting
Try to do incremental garbage collection
Rather than waiting for the memory to be exhausted, try to reclaim an object when there are no more references to it
How to do that?
Associate a reference count with each object
It indicates the number of pointers that are pointing to this object
Update the reference count when references change

Reference Counting
Change of references
Object allocations
Via memory management calls: malloc, new, etc.
Set the reference count of the new object to 1
Reference assignment: p := q
Decrease the reference count of the object originally pointed to by p
Increase the reference count of the object referenced by q
Procedure calls
References to objects may be passed from actual to formal parameters
Increase the reference count of each referenced object passed to the procedure

Reference Counting
Change of references
Procedure returns
All objects referenced by local/temporary variables in the frame are now unreachable, unless they are referenced by multiple references
References to objects may be passed from the returned objects to the caller
In that case the reference count does not change; the reference is just transferred
For each object referenced in the AR, decrement its reference count
Transitive rule
When an object o's reference count becomes 0
For each object referenced by o, decrement its reference count
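
The update rules for p := q and the transitive rule can be sketched as below. This is an illustrative model only: Obj, inc, dec, assign, and the freed list are hypothetical names, and objects here start with a count of 0 that the first assignment raises to 1, which has the same effect as the slides' "set the count to 1 at allocation".

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.fields = {}          # field name -> referenced Obj


freed = []                        # names of reclaimed objects, in order


def inc(obj):
    if obj is not None:
        obj.refcount += 1


def dec(obj):
    if obj is None:
        return
    obj.refcount -= 1
    if obj.refcount == 0:
        # Transitive rule: o is garbage, so drop every reference o holds.
        for child in obj.fields.values():
            dec(child)
        obj.fields.clear()
        freed.append(obj.name)


def assign(slots, key, new_obj):
    """slots[key] := new_obj. Increment first so that p := p stays safe."""
    inc(new_obj)
    dec(slots.get(key))
    slots[key] = new_obj


env = {}                          # hypothetical table of program variables
a, b = Obj("a"), Obj("b")
assign(env, "p", a)               # p := a      (count of a becomes 1)
assign(a.fields, "next", b)       # a.next := b (count of b becomes 1)
assign(env, "p", None)            # p := nil    (a dies, which also frees b)
```

If a and b pointed at each other, neither count would ever reach 0, which is the circular reference problem.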

Reference Counting
Good points
Done incrementally, does not need to halt program execution
Easy to implement
Problems
Cannot handle circular references
Need to update the reference count for each reference assignment
Very expensive
Often avoided as the only collector in performance-critical systems, although some runtimes (e.g., CPython) do rely on it


Copying Collector
Principle
Use two memory heaps
One in use by the program
The other sits idle

GC
Assume that now A is in use and B is sitting idle
When A is running out of space
Copy all reachable objects from A to B
Unreachable objects are automatically discarded
Switch heaps after copying (A becomes idle and B is in use)
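
The copy step can be sketched as follows. This is a simplified model under stated assumptions: the from-space is a dictionary from address to the list of addresses the object references, the to-space is a Python list whose indices act as the new addresses, and copy_collect and the forwarding table are hypothetical names.

```python
def copy_collect(roots, from_space):
    """Copy everything reachable from roots into a fresh to-space.

    Returns (new_roots, to_space); references are rewritten to new addresses.
    """
    to_space = []          # objects are placed contiguously, so no fragmentation
    forward = {}           # old address -> new address

    def copy(addr):
        if addr not in forward:
            forward[addr] = len(to_space)
            to_space.append(list(from_space[addr]))   # still holds old addresses
        return forward[addr]

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(to_space):
        # Update each reference, copying the target on first visit.
        to_space[scan] = [copy(a) for a in to_space[scan]]
        scan += 1
    return new_roots, to_space


# Hypothetical heap A: 100 and 300 reference each other, 200 is unreachable.
heap_a = {100: [300], 200: [200], 300: [100]}
new_roots, heap_b = copy_collect([100], heap_a)
```

The unreachable object at 200 is never visited, so it is discarded without any explicit free.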


Copying Collector
Good points
Simple
Automatically eliminates fragmentation
Can have simpler malloc implementation
Since the memory is going to be compacted, just allocate the top of the heap to new objects instead of keeping track of all free slots

Problem
When copying, each reference needs to be updated since the object location has changed
Precise pointer information is required
Twice the memory usage

Generational GC
An important observation
In a long-running system
If an object has been reachable for a long time, it is likely to remain so
Most new objects become garbage shortly after they are created
Statistics: less than 10% stay alive

Principle of the approach
Assign heap objects to different generations: G0, G1, G2, ...
Scan for garbage in newer generations much more frequently than in older generations
Consider two generations
G0: new objects, G1: tenured objects

Generational GC
Remember set
Avoid scanning everything in the tenured set
In practice, tenured objects are unlikely to point to new objects and new objects are unlikely to point to tenured objects
The compiler inserts extra code to catch modifications to tenured objects
When a tenured object is modified to point to a new object, it is put into the remember set

Algorithm
When collecting garbage in G0
Roots for GC: registers, stacks, G0, and remember set of G1
While collecting garbage in G0, record references to G1
Periodically move objects from G0 to G1
Occasionally collect garbage in G1
Roots for GC: registers, stacks, G1, and the part of G0 that references G1
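
The remember set and a G0 collection can be sketched as below. This is only a rough model: the heap dictionary layout, write_ref (standing in for the extra code the compiler inserts), and minor_collect are hypothetical, and promotion from G0 to G1 is left out.

```python
def write_ref(heap, src, dst):
    """Store a reference src -> dst and catch tenured-to-new pointers."""
    heap["refs"].setdefault(src, []).append(dst)
    if src in heap["tenured"] and dst in heap["young"]:
        heap["remembered"].add(src)


def minor_collect(heap, stack_roots):
    """Return the G0 objects that survive; tenured objects are not scanned."""
    young, refs = heap["young"], heap["refs"]
    worklist = [r for r in stack_roots if r in young]
    for old in heap["remembered"]:
        # Only remembered tenured objects are consulted, not all of G1.
        worklist.extend(t for t in refs.get(old, []) if t in young)
    live = set()
    while worklist:
        obj = worklist.pop()
        if obj in live:
            continue
        live.add(obj)
        worklist.extend(t for t in refs.get(obj, []) if t in young)
    return live


# Hypothetical heap: n1..n3 are new (G0), t1 and t2 are tenured (G1).
heap = {"young": {"n1", "n2", "n3"}, "tenured": {"t1", "t2"},
        "remembered": set(), "refs": {}}
write_ref(heap, "t1", "n1")   # tenured object now points to a new object
write_ref(heap, "n1", "n2")
write_ref(heap, "t2", "t1")
survivors = minor_collect(heap, stack_roots=["t2"])
```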

Generational GC
Good points
Much more efficient
Garbage collection can be done generation by generation
Generally more than 2 generations can be considered
Avoid large pauses
The cross references among different generations are recorded
Unlikely to have a large remember set in practice
So GC in each generation can be done almost independently


Run Time Environment -- Summary


Activation records
Nested procedures
Garbage collection methods


Code Generation

Code Generation
Use registers during execution
Whenever possible, perform computation in registers
Memory loads and stores are much more expensive

Need to determine the best register allocation
For a given number of registers, minimize the number of spills
Spill: when running out of registers, store some registers to memory

Need to determine the best order of instruction execution
To fit the register allocation decision, which may be suboptimal
To reduce the number of instructions

Instruction selection
Map the intermediate code to the set of machine instructions that minimizes the cost of execution

Peephole optimization

Code Generation
Various methods for register allocation and instruction scheduling
Tree
Achieves optimal register allocation and instruction scheduling

DAG (directed acyclic graph)
Achieves local common subexpression elimination (optimal)
Optimal register allocation and instruction scheduling is NP-complete
Heuristic algorithms

Global
Global register allocation
Has no corresponding scheduling algorithm; just follows the original instruction order

Tree Based Approach for a Basic Block


Basic block:
t1 := a + b
t2 := c * d
t3 := e + f
t4 := t2 + t3
y := t1 * t4

Assumptions:
The system has two registers, r0, r1
Only y is alive at the exit of the block
Instruction format: op reg, reg/mem, reg (the first reg is the result)
op a, b, c computes a := b op c

Straightforward code:
load r0, a
add r0, b, r0
store t1, r0
load r0, c
mul r0, d, r0
store t2, r0
load r0, e
add r0, f, r0
store t3, r0
load r0, t2
add r0, t3, r0
store t4, r0
load r0, t1
mul r0, t4, r0
store y, r0

15 instructions
10 loads, 5 stores

Can we use the registers more effectively?

Tree Based Approach for a Basic Block


Assumptions:
The system has two registers, r0, r1
Only y is alive at the exit of the block

Basic block:
t1 := a + b
t2 := c * d
t3 := e + f
t4 := t2 + t3
y := t1 * t4

Expression tree:
* y
  + t1
    a
    b
  + t4
    * t2
      c
      d
    + t3
      e
      f

Code in the original order:
load r0, a
add r0, b, r0
load r1, c
mul r1, d, r1
store t1, r0
load r0, e
add r0, f, r0
add r0, r1, r0
load r1, t1
mul r0, r0, r1
store y, r0

At store t1: t1 (r0) and t2 (r1) are still needed, but there are no more registers to compute t3, so one has to be spilled (choose r0)
At load r1, t1: need to load t1 back into r1

11 instructions
7 loads, 2 stores (1 spill)

Tree Based Approach for a Basic Block


Assumptions:
The system has two registers, r0, r1
Only y is alive at the exit of the block

Basic block:
t1 := a + b
t2 := c * d
t3 := e + f
t4 := t2 + t3
y := t1 * t4

Code with t4 computed before t1:
load r0, c
mul r0, d, r0
load r1, e
add r1, f, r1
add r1, r0, r1
load r0, a
add r0, b, r0
mul r0, r0, r1
store y, r0

9 instructions
6 loads, 1 store (0 spills)
Optimal!

Can we always achieve optimal execution?

Tree based Register Allocation and Scheduling
Construct the execution tree for a basic block
Label the tree to obtain the register requirements
Depth first labeling
L(leaf) = 1 if it is an identifier
L(leaf) = 0 if it is a constant
L(nonleaf node) =
If L(left child) = L(right child) then
L(current node) := L(left child) + 1
Otherwise
L(current node) := max(L(left child), L(right child))

Assign registers and generate code
Register allocation
Instruction scheduling follows the register allocation algorithm

Assumption: from here onwards, three-address code needs to be of the form op reg, reg, reg
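
The labeling rule can be sketched as a depth-first function. This is a minimal sketch; the tuple encoding of the tree (operator, left, right), with strings for identifiers and numbers for constants, is an assumption made for the example.

```python
def label(node):
    """Return the register requirement L(node) under the rule above."""
    if isinstance(node, str):             # identifier leaf
        return 1
    if isinstance(node, (int, float)):    # constant leaf
        return 0
    _op, left, right = node
    l, r = label(left), label(right)
    return l + 1 if l == r else max(l, r)


# y := (a + b) * ((c * d) + (e + f)), the basic block used in these slides
t1 = ("+", "a", "b")
t4 = ("+", ("*", "c", "d"), ("+", "e", "f"))
y = ("*", t1, t4)
registers_needed = label(y)
```

Under this rule t1 needs 2 registers and t4 needs 3, so the whole tree needs 3.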

Tree based Register Allocation and Scheduling


Steps: compute the register requirement, assign registers, generate code

[Figure: the expression tree for y, annotated at each node with its register set and the instruction generated there]

Registers and code per node:
a (r1): load r1, a
b (r2): load r2, b
c (r1): load r1, c
d (r2): load r2, d
e (r2): load r2, e
f (r3): load r3, f
t1 = a + b (r1, r2): add r2, r1, r2
t2 = c * d (r1, r2): mul r2, r1, r2
t3 = e + f (r2, r3): add r3, r2, r3
t4 = t2 + t3 (r1, r2, r3): add r3, r2, r3
y = t1 * t4 (r1, r2, r3): mul r3, r2, r3

Once t4 has been computed into r3, r1 and r2 are available again

Global Register Allocation


Basic approach
Global liveness analysis
Build the interference graph
Graph coloring
N colors
N is the number of available registers
If N-coloring is not possible
Insert spill code to the program


Global Register Allocation


Build the interference graph
Show which variables interfere with each other

Principle:
Two variables that are alive simultaneously interfere
They cannot be allocated to the same register
Register interference graph:
One vertex for each variable in the graph
At each point p in the CFG
L is the live set at p
If two variables x and y are both in L, then x should not get the same register as y
Add an edge (x, y)
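
Building the interference graph from the live sets can be sketched as below. It assumes liveness analysis has already produced one live set per program point; build_interference_graph and the sample live sets are hypothetical.

```python
from itertools import combinations


def build_interference_graph(live_sets):
    """Return {variable: set of interfering variables}."""
    graph = {}
    for live in live_sets:
        for v in live:
            graph.setdefault(v, set())        # one vertex per variable
        for x, y in combinations(sorted(live), 2):
            graph[x].add(y)                   # x and y are live together,
            graph[y].add(x)                   # so add the edge (x, y)
    return graph


# Hypothetical live sets at three program points
live_sets = [{"a", "b"}, {"b", "c"}, {"c"}]
graph = build_interference_graph(live_sets)
```

Here a and c never appear in the same live set, so they may share a register.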

Global Register Allocation


Coloring the example graph with 4 colors
Simplification step
a, b, d have < 4 edges. Choose a
b, d have < 4 edges. Choose d
Now all nodes have < 4 edges, remove them in arbitrary order

[Figure: the interference graph on nodes a to f, and the stack of removed nodes]

Global Register Allocation


Coloring the example graph with 4 colors
Selection step
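Pop the nodes from the stack one at a time and give each a color that none of its already colored neighbors uses

[Figure: the interference graph on nodes a to f being colored as nodes are popped from the stack]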

Global Register Allocation


Coloring the example graph with 3 colors
After removing a, no node has < 3 edges
Algorithm fails!

Global Register Allocation


Coloring algorithm failure (for k colors)
Does not imply that it is not possible to color with k colors
Always try to color anyway
Example: color the graph with 3 colors
After removing a, no node has < 3 edges. Algorithm fails!
Color the node with the highest degree first.
The remaining nodes have the same degree. Choose any to color.

[Figure annotations: a had degree 2, so it is no problem to color; a color can still be found for each of the remaining nodes]
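
The simplification and selection steps, including the "always try to color anyway" idea, can be sketched as follows. This is one possible optimistic variant, not necessarily the exact procedure on the slides: when no node has fewer than k edges it still removes the highest-degree node and only reports a spill if selection finds no free color. The function name and the example graphs are assumptions.

```python
def color_graph(graph, k):
    """Return (coloring, spilled) for graph = {node: set of neighbors}."""
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    # Simplification: repeatedly remove a node and push it on the stack.
    while work:
        easy = [n for n in work if len(work[n]) < k]
        if easy:
            node = min(easy)
        else:                                   # stuck: remove one anyway
            node = max(work, key=lambda n: (len(work[n]), n))
        stack.append(node)
        for nb in work.pop(node):
            work[nb].discard(node)
    # Selection: pop nodes and pick a color unused by colored neighbors.
    coloring, spilled = {}, []
    while stack:
        node = stack.pop()
        used = {coloring[nb] for nb in graph[node] if nb in coloring}
        free = [c for c in range(k) if c not in used]
        if free:
            coloring[node] = free[0]
        else:
            spilled.append(node)
    return coloring, spilled


# Hypothetical 4-cycle: every node has 2 edges, yet 2 colors are enough.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
cycle_coloring, cycle_spilled = color_graph(cycle, 2)

# A triangle cannot be colored with 2 colors, so one node is spilled.
triangle = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
tri_coloring, tri_spilled = color_graph(triangle, 2)
```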

Spill
When no way is found to color with k colors
Choose one node to spill
Continue to spill if necessary, until a node can be removed
For each spilled node
For each definition, store the value
For each use, load the value

Where to load the value? A register is needed anyway
Naive approach
Always keep extra registers for shuffling data in and out
What a waste!

Rewrite code
Use a new temporary variable for each load; it will have a very short life and is likely to have very few edges in the interference graph
Redo liveness analysis and register allocation

Global Register Allocation


Code generation
For each statement
Replace variables by registers
If a variable comes from outside the block, it should be loaded into a register first
For the spilled variables
Load into reserved registers if the rewrite code approach is not used
Store the live variables
No need to store temporary variables
Variables that are alive after the CFG should be stored to memory

DAG Construction
Versioning
a := b - c
b := a + d
d := b - c
a := a * d
b := b - c

a1 := b0 - c0
b1 := a1 + d0
d1 := b1 - c0
a2 := a1 * d1
b2 := b1 - c0

Goal:
If a variable is redefined, it is no longer the same as the previous version.
Use a version number to avoid confusion.
Method:
Use a table to keep track of the variables and their version numbers.
Initialize the version number to 0.
Increase the version number each time the variable is defined.
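
The versioning method can be sketched as a single pass over the block. This is an illustrative sketch: the tuple encoding (dest, left, op, right) is an assumption, and so is writing the operator of the first statement as "-", since the operator symbol is not legible in the slide text.

```python
def add_versions(block):
    """block: list of (dest, left, op, right). Returns versioned statements."""
    version = {}                        # variable -> current version number
    out = []
    for dest, left, op, right in block:
        # Uses refer to the current version; unseen variables start at 0.
        l = f"{left}{version.setdefault(left, 0)}"
        r = f"{right}{version.setdefault(right, 0)}"
        # Each definition creates the next version.
        version[dest] = version.get(dest, 0) + 1
        out.append(f"{dest}{version[dest]} := {l} {op} {r}")
    return out


block = [
    ("a", "b", "-", "c"),
    ("b", "a", "+", "d"),
    ("d", "b", "-", "c"),
    ("a", "a", "*", "d"),
    ("b", "b", "-", "c"),
]
versioned = add_versions(block)
```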

DAG Construction
DAG
Leaves are identifiers or constants
Internal nodes are operators and a list of labels
Labels are identifiers

[Figure: an operator node labeled t1, t3 with leaf children b0 and c0]

Maintain a table of variables
Each entry in the table points to the corresponding node in the DAG
If there are multiple versions, point to the highest version
Facilitates search and version tracking

Later, when we say "find node t1"
Go to the table, search for t1, follow the link and go to the node
The node has t1 in its label list
The node has the highest version of t1

DAG Construction
For a copy statement x := y
Find node x
If nonexistent
Insert x into the table with version 0
If existing, if the old x has a version number v, then give the new x a version number v+1
No need to worry about variables defined on multiple paths, since we only consider a simple block

Find node y
If nonexistent, create it (could be from external)
Create a leaf node N and put y in its label list
Insert y into the table
If existing, assume the node is N
Add x to the list of labels of N and update the table pointer

[Figure: node N with the label list y, x]

DAG Construction
For statement x := y op z
Find node x (do the same as the previous case)
Check whether <op node(y) node(z)> exists
If so, let N be the root node of the subtree for <op node(y) node(z)>
If not, check the operands and create the subtree
Find node y, if nonexistent, create node(y)
Find node z, if nonexistent, create node(z)
Create the entry in the table
Create a leaf node in the DAG
Create the operator node, say N, and link it to node(y) and node(z)
Add x to the list of labels of N and update the table pointer
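
The construction for x := y op z can be sketched as below. This is a simplified model under stated assumptions: nodes are stored in a list, an index keyed by (op, node(y), node(z)) stands in for the "does this subtree exist" check, and a variable's first definition gets version 1 as in the versioning example. The "-" operator, build_dag, and the data layout are assumptions for illustration.

```python
def build_dag(block):
    """block: list of (dest, left, op, right). Returns (nodes, labels, table)."""
    nodes = []      # node id -> ("leaf", name, None) or (op, left id, right id)
    labels = []     # node id -> list of versioned labels attached to the node
    table = {}      # variable -> (node id, highest version)
    index = {}      # (op, left id, right id) -> node id

    def find(name):
        if name not in table:                  # external value: create a leaf
            nodes.append(("leaf", name, None))
            labels.append([name + "0"])
            table[name] = (len(nodes) - 1, 0)
        return table[name][0]

    for dest, left, op, right in block:
        key = (op, find(left), find(right))
        if key not in index:                   # create the operator node
            nodes.append(key)
            labels.append([])
            index[key] = len(nodes) - 1
        n = index[key]
        version = table[dest][1] + 1 if dest in table else 1
        labels[n].append(f"{dest}{version}")   # add dest to the label list
        table[dest] = (n, version)             # update the table pointer
    return nodes, labels, table


block = [("a", "b", "-", "c"), ("b", "a", "+", "d"), ("d", "b", "-", "c"),
         ("a", "a", "*", "d"), ("b", "b", "-", "c"), ("e", "a", "*", "b")]
nodes, labels, table = build_dag(block)
```

The fifth statement finds the existing node for b1 - c0, so that node ends up with the label list d1, b2 and no new node is created.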

DAG Construction
DAG construction example
a := b - c
b := a + d
d := b - c
a := a * d
b := b - c
e := a * b

a1 := b0 - c0
b1 := a1 + d0
d1 := b1 - c0
a2 := a1 * d1
e1 := a2 * d1
b2 := d1

[Figure: the resulting DAG. Leaves b0, c0, d0; a1 = b0 - c0; b1 = a1 + d0; the node for b1 - c0 is labeled d1, b2; a2 = a1 * d1; e1 = a2 * d1]

With the construction of the DAG, common subexpressions are automatically eliminated
The last statement may be eliminated using CFG-based copy statement elimination

Instruction Selection
Goal
Determine parts of the tree that can match the instruction tiles
Desirable to achieve optimal tiling
Get the set of instructions with the least cost (not easy)
The maximal munch algorithm (greedy)
Start from the tree root and find all matching tiles
Select the one with the maximum number of nodes
Can consider other criteria that include the cost of the instruction
Go to the children and apply the algorithm recursively
Until the tree is fully covered
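
Maximal munch can be sketched as below. Everything specific here is hypothetical: the tree encoding, the tile names, and the tile patterns are invented for illustration and do not describe a real instruction set. A "_" in a pattern matches any subtree (or a payload such as a constant value), and matched subtrees are tiled recursively.

```python
# Trees: ("const", n), ("temp", name), ("add", l, r), ("mem", addr)
TILES = [
    ("load_indexed", ("mem", ("add", "_", ("const", "_")))),
    ("load",         ("mem", "_")),
    ("addi",         ("add", "_", ("const", "_"))),
    ("add",          ("add", "_", "_")),
    ("li",           ("const", "_")),
    ("temp",         ("temp", "_")),
]


def match(pattern, tree):
    """Return (nodes covered, uncovered subtrees), or None if no match."""
    if pattern == "_":
        return 0, [tree]
    if not isinstance(tree, tuple) or tree[0] != pattern[0] or len(tree) != len(pattern):
        return None
    covered, holes = 1, []
    for p, t in zip(pattern[1:], tree[1:]):
        if not isinstance(t, tuple):      # payload (constant value or name)
            continue
        m = match(p, t)
        if m is None:
            return None
        covered += m[0]
        holes += m[1]
    return covered, holes


def munch(tree, out):
    """Greedily pick the largest tile at the root, then tile what is left."""
    best = None
    for name, pattern in TILES:
        m = match(pattern, tree)
        if m is not None and (best is None or m[0] > best[1][0]):
            best = (name, m)
    if best is None:
        raise ValueError(f"no tile matches {tree!r}")
    name, (_covered, holes) = best
    for sub in holes:
        munch(sub, out)                   # operands are emitted before their user
    out.append(name)
    return out


tree = ("mem", ("add", ("temp", "fp"), ("const", 8)))
selected = munch(tree, [])
```

The three-node tile wins at the root over the one-node load tile, which is the "maximum number of nodes" criterion.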

Code Generation -- Summary


Read Textbook
Run time storage allocation
Register allocation and instruction scheduling
