You are on page 1of 48

18CSC304J- COMPILER DESIGN

UNIT-5

SRMIST, Vadapalani Campus


UNIT-V SYLLABUS
1. Code optimization- Introduction 10. Introduction to Global Data Flow Analysis
2. Principal Sources of Optimization 11. Computation of gen and kill
3. Function Preserving Transformation 12. Computation of in and out
4. Loop Optimization 13. Parameter Passing
5. Optimization of basic Blocks 14. Runtime Environments
6. Building Expression of DAG 15. Source Language issues
7. Peephole Optimization 16. Storage Organization
8. Basic Blocks, Flow Graphs 17. Activation Records
9. Next -Use Information 18. Storage Allocation strategies

2
SRMIST, Vadapalani Campus
INTRODUCTION TO GLOBAL DATA
FLOW ANALYSIS
● In order to do code optimization and a good job of code generation ,
compiler needs to collect information about the program as a whole
and to distribute this information to each block in the flow graph.
● A compiler could take advantage of “reaching definitions” , such as knowing
where a variable like debug was last defined before reaching a given block,
in order to perform transformations are just a few examples of data-flow
information that an optimizing compiler collects by a process known as
data-flow analysis.
● Data-flow information can be collected by setting up and solving systems of
equations of the form :
3
SRMIST, Vadapalani Campus
● This equation can be read as “ the information at the end of a statement is
either generated within the statement , or enters at the beginning and is not
killed as control flows through the statement.” such equations are called
data-flow equations
● The details of how data-flow equations are set and solved depend on 3 factors.
○ The notions of generating and killing depend on the desired information, i.e.,
for some problems, instead of proceeding along with flow of control and
defining out[s] in terms of in[s], we need to proceed backwards and define in[s]
in terms of out[s].
○ Since data flows along control paths, data-flow analysis is affected by the
constructs in a program. i.e., write out[s], unique end point where control
leaves the statement;
○ There are subtleties that go along with such statements as procedure calls,
assignments through pointer variables, and even assignments to array
variables

4
SRMIST, Vadapalani Campus
5
SRMIST, Vadapalani Campus
Points and Paths
● Within a basic block, we talk the point between two adjacent statements, as well as
the point before the first statement and after the last.
● Thus, block B1 has four points: one before any of the assignments and one after each
of the three assignments.

Example: There is a path from the beginning of block B5 to the


beginning of block B6. It travels through the end point of B5 and
then through all the points in B2, B3, and B4, in order, before it
reaches the beginning of B6

6
SRMIST, Vadapalani Campus
Reaching Definition
● A definition of variable x is a statement that assigns, or may assign, a value to x.
● Most common forms of definition are assignments to x and statements that read a
value from an i/o device and store it in x.
● These statements certainly define a value for x, and they are referred to as
unambiguous definitions of x.
● There are certain kinds of statements that may define a value for x; they are called
ambiguous definitions. The most usual forms are
○ A call of a procedure with x as a parameter or a procedure that can access x
because x is in the scope of the procedure.
○ An assignment through a pointer that could refer to x, the assignment *q: = y is
a definition of x if it is possible that q points to x
● A definition d reaches a point p if there is a path immediately following d to p, such
that d is not “killed” along that path.
● a killed 🡪 if two points along path then there is a definition of d

7
SRMIST, Vadapalani Campus
8
SRMIST, Vadapalani Campus
Data-flow analysis of structured programs:
● Flow graphs for control flow constructs such as do-while statements have a useful
property:
“there is a single beginning point at which control enters and a single end point that
control leaves from when execution of the statement is over”.

● Expressions in this language are similar to intermediate code, but the flow graphs for
statements have restricted forms.
● We define a portion of a flow graph called a region to be a set of nodes N that
includes a header, which dominates all other nodes in the region.
● All edges between nodes N are in the region, except for some that enter the
header.
● The portion of flow graph corresponding to a statement S is a region that obeys the
further restriction that control flow to just one outside block when it leaves the
9
region. SRMIST, Vadapalani Campus
We say that the beginning points of the dummy
blocks at the entry and exit of a statement’s
region are the beginning and end points,
respectively, of the statement.

The equations are inductive, or syntax-directed,


definition of the sets in[S], out[S], gen[S],
and kill[S] for all statements S.

gen[S] and kill[S] are synthesized attributes

gen[S] is the set of definitions “generated” by S

while kill[S] is the set of definitions that never


reach the end of S.

10
SRMIST, Vadapalani Campus
Computation of gen, kill, in, out

Example:
Consider the following
data-flow equations for
reaching definitions :

Fig:
data-flow
equations
for
reaching
definitions

11
SRMIST, Vadapalani Campus
Conservative estimation of data-flow information:
● There is a subtle miscalculation in the rules for gen and kill.
● We have made the assumption that the conditional expression E in the if
and do statements are “uninterpreted”; that is, there exists inputs to the
program that make their branches go either way.
● We assume that any graph-theoretic path in the flow graph is also an
execution path, i.e., a path that is executed when the program is run with
least one possible input.
● When we compare the computed gen with the “true” gen we discover that
the true gen is always a subset of the computed gen. on the other hand, the
true kill is always a superset of the computed kill.

12
SRMIST, Vadapalani Campus
Computation of in and out:
● Assuming we know in[S] we compute out by equation, that is
Out[S] = gen[S] U (in[S] - kill[S])
● Considering cascade of two statements S1; S2, as in the second case. We
start by observing in[S1]=in[S]. Then, we recursively compute out[S1], which
gives us in[S2], since a definition reaches the beginning of S2 if and only if it
reaches the end of S1. Now we can compute out[S2], and this set is equal to
out[S].
● Considering if-statement we have conservatively assumed that control can
follow either branch, a definition reaches the beginning of S1 or S2 exactly
when it reaches the beginning of S.
In[S1] = in[S2] = in[S]
● If a definition reaches the end of S if and only if it reaches the end of one or
both sub statements; i.e,
Out[S]=out[S1] U out[S2] 13
SRMIST, Vadapalani Campus
14
SRMIST, Vadapalani Campus
15
SRMIST, Vadapalani Campus
Problems of data flow analysis
1. Reaching definition
● The definition that may reach a program point along some path are know as
reaching definition
● i.e., A definition d reaches a point p if there is a path from the point immediately
following d to p, such that d is not “killed” along that path
2. Live variable analysis
● To know for variable x and point p whether the value of x at p could be used along
some path in the flow graph starting at p
If so we say x is live at p
otherwise
x is dead at p
3. Available expression
● An expression x+y is available at a point p if every path from the entry node at p
evaluates x+y prior to reaching p there are no subsequent assignment to x or y
16
SRMIST, Vadapalani Campus
RUNTIME ENVIRONMENTS
● Runtime means a program in execution
● As execution proceeds, the same name in the source text can denote
different data objects in the target machine
● Data objects at run time is determined by its type
● Program needs memory resources to execute instructions
● It takes care of memory allocation and de-allocation
● The design of runtime support package is influenced by the semantics of
procedure (functions)
● Each execution of a procedure is referred to as an activation of the
procedure.
● If the procedure is recursive, several of its activations may be alive at the
same time.
17
SRMIST, Vadapalani Campus
SOURCE LANGUAGE ISSUES
The source language distinguishes between the source text of a procedure and its
activations at run time
1. Procedures
● A program is made up of procedure
● A Procedure is a declaration that associates an identifier with a statement
Identifier is a Procedure name and
Statement is a Procedure body
● Procedure call-Procedure name appears within an executable statement
● Identifiers appearing in a procedure definition are special, and are called formal
parameters. (C calls them "formal arguments" and Fortran calls them "dummy
arguments, ")
● Arguments, known as actual parameters may be passed to a called procedure; they
are substituted for the formals in the body
18
SRMIST, Vadapalani Campus
19
SRMIST, Vadapalani Campus
Example-2:
Pascal
program for
reading and
sorting
integers
(quicksort)

20
SRMIST, Vadapalani Campus
2. Activation Trees
● Assumptions among procedures during the execution of a program:
1. Control flows sequentially;
2. Each execution of a procedure starts at the beginning of the procedure body
and eventually returns control to the point immediately following the place
where the procedure was called.
● Each execution of a procedure body Is referred to as an activation of the
procedure
● We can represent the activation of procedures during the running of an
entire program by a tree called activation tree
● Each node corresponds to one activation and the root is the activation of
the main procedure that initiates execution of the program
● The activations are called from left to right
● One child must finish before the activation to its right can begin

21
SRMIST, Vadapalani Campus
● Activation tree shows the way control enters and leaves activations of the
procedures
● If a and b are two procedures their activation will be
Non-overlapping when one procedure is called after other
Nested – nested procedure
● In recursive procedure – new activation begins before an earlier activation
of the same procedure has ended
● In a activation tree
○ Each node represents an activation of a procedure
○ The root represents the activation of the main program
○ The node for ‘a’ is the parent of node for ‘b’ if and only if control flows from
activation ‘a’ to ‘b’
○ The node for ‘a’ is to the left of node ‘b’ if and only if the lifetime of ‘a’ occurs
before lifetime of ‘b’

22
SRMIST, Vadapalani Campus
Example-1: Activation tree of quick sort using C

23
SRMIST, Vadapalani Campus
Example-2: Activation tree of quick sort using Pascal

Note: For exam draw any one activation tree

24
SRMIST, Vadapalani Campus
3. Control stack
● The flow of control in a program corresponds to a depth-first traversal of
the activation tree that starts at the root, visits a node before its children,
and recursively visits children at each node in a left-to-right order.
● Whenever a procedure is executed, its activation record is stored on the
stack also known as control stack
● The contents of the control stack are related to paths to the root of the
activation tree.
● When node n is at the top of the control stack, the stack contains the
nodes along the path from n to the root

Example: Control stack contains nodes along a path to the root

25
SRMIST, Vadapalani Campus
● It is used to keep track of live procedure activation
● Idea
push the node for an activation onto the control stack - when the activation begins
pop the node for an activation - when the activation ends.

The control stack contains the following nodes along this path to the root: m, qs(1,10), qs(1,4), qs(4,4)
26
SRMIST, Vadapalani Campus
4. The scope of the declaration
● A declaration in a language is a syntactic construct that associates
information with a name.
var i : integer;
● Declarations may be explicit, or may be implicit
● The scope rules of a language determine which declaration of a name
applies when the name appears in the text of a program
● The portion of the program to which a declaration applies is called the
scope of that declaration
● When a declaration is seen a symbol table entry is created for it.

27
SRMIST, Vadapalani Campus
5. Binding of names
● If each name is declared once in a program, the same name may denote different
data objects at run time.
● "data object" - storage location that can hold values.
● Environment refers to a function that maps a name to a storage location
● Sate refers to a function that maps a storage location to the value held there
● Environment maps a name to an r-value, and a state maps the /-value to an r-value.
● When an environment associates storage location s with a name x. i.e., x is bound to s

28
SRMIST, Vadapalani Campus
STORAGE ORGANIZATION
The organization of runtime storage is as follows:
1. Subdivision of run-time memory
● The compiler obtains a block of storage from the operating system for the
compiled program to run in
● This runtime storage is subdivided to hold
○ The data generated target
○ Data object
○ Counter part of the control stack
● The size of the generated target code is fixed at compile time,
● The size of some of the data objects may also be known at compile time
● Stack is used to store data structures called activation records that get generated
during procedure call
● Each time a procedure is called space for its local variable is pushed onto a stack
and when the procedure terminates that space is popped off the stack
29
SRMIST, Vadapalani Campus
● A separate area of run-time memory, called a heap, holds all other information
● The sizes of the stack and the heap can change as the program executes,
dynamically, at the opposite ends of memory
● Stack grows towards higher address (“downward-growing”)

Issue
● Layout and allocation of data to memory location
in runtime is an issue to be addressed

30
SRMIST, Vadapalani Campus
2. Activation Records
● The execution of procedure is called activation
● Information needed by a single execution of a
procedure is managed using a contiguous block of
storage called an activation record or frame
● The activation record of a procedure is pushed on
the run-time stack when a procedure is called and
pop off activation record when control returns to
the caller.
● The sizes of each of these fields can be
determined at the time a procedure is called
● The sizes of almost all fields can be determined at
compile time.

31
SRMIST, Vadapalani Campus
The purpose of the fields of an activation record is as follows:
● Temporary values🡪 arising in the evaluation of expressions, are stored in the field
for temporaries.
● Local data 🡪holds data that is local to an execution of a procedure.
● Saved machine status🡪 holds information about the state of the machine just
before the procedure is called. This information includes the values of the program
counter and machine registers that have to be restored when control returns from
the procedure.
● Optional access link 🡪refer to nonlocal data held in other activation records.
● Optional control link 🡪points to the activation record of the caller
● Actual parameters 🡪 used by the calling procedure to supply parameters to the
called procedure.
● Returned value 🡪 used by the called procedure to return a value to the calling
procedure

32
SRMIST, Vadapalani Campus
3. Compile-Time Layout of local data
● The amount of storage needed for a name is determined from its type
● An elementary data type, such as a character, integer, or real, can usually be
stored in an integer number of bytes.
● Storage for an aggregate, such as an array or record, must be large enough to hold
all its components

The relative address, or offset, is


the difference between the
addresses of the position and the
data object.

Space left unused due to


alignment considerations is
referred to as padding

33
SRMIST, Vadapalani Campus
STORAGE ALLOCATION STRATEGIES

34
SRMIST, Vadapalani Campus
35
SRMIST, Vadapalani Campus
Static🡪Example

36
SRMIST, Vadapalani Campus
37
SRMIST, Vadapalani Campus
Stack🡪Example

38
SRMIST, Vadapalani Campus
39
SRMIST, Vadapalani Campus
Variable length data: on stack

40
SRMIST, Vadapalani Campus
41
SRMIST, Vadapalani Campus
Heap🡪Example

42
SRMIST, Vadapalani Campus
PARAMETER PASSING
● When one procedure calls another, the usual method of communication
between them is through nonlocal names and through parameters of the
called procedure
Methods
● Call by value
● Call by reference
● Copy restore
● Call by name or
Here the array a is nonlocal to the procedure
● Macro expansion or inline code exchange, and i and j are parameters

43
SRMIST, Vadapalani Campus
General terms

The term l-value (l-left) refers to the storage represented by expression


The term r-value (r-right) refers to the value contained in the storage

44
SRMIST, Vadapalani Campus
1. Call by value

45
SRMIST, Vadapalani Campus
2. Call by reference

46
SRMIST, Vadapalani Campus
3. Copy restore

47
SRMIST, Vadapalani Campus
4. Call by name

48
SRMIST, Vadapalani Campus

You might also like