Unit-5-2

18CSC304J- COMPILER DESIGN
UNIT-5
SRMIST, Vadapalani Campus

UNIT-V SYLLABUS
1. Code optimization- Introduction 10. Introduction to Global Data Flow Analysis
2. Principal Sources of Optimization 11. Computation of gen and kill
3. Function Preserving Transformation 12. Computation of in and out
4. Loop Optimization 13. Parameter Passing
5. Optimization of basic Blocks 14. Runtime Environments
6. Building Expression of DAG 15. Source Language issues
7. Peephole Optimization 16. Storage Organization
8. Basic Blocks, Flow Graphs 17. Activation Records
9. Next -Use Information 18. Storage Allocation strategies
2
INTRODUCTION TO GLOBAL DATA
FLOW ANALYSIS
● In order to do code optimization and a good job of code generation ,
compiler needs to collect information about the program as a whole
and to distribute this information to each block in the flow graph.
● A compiler could take advantage of “reaching definitions” , such as knowing
where a variable like debug was last defined before reaching a given block,
in order to perform transformations are just a few examples of data-flow
information that an optimizing compiler collects by a process known as
data-flow analysis.
● Data-flow information can be collected by setting up and solving systems of
equations of the form :
3
● This equation can be read as “ the information at the end of a statement is
either generated within the statement , or enters at the beginning and is not
killed as control flows through the statement.” such equations are called
data-flow equations
● The details of how data-flow equations are set and solved depend on 3 factors.
○ The notions of generating and killing depend on the desired information, i.e.,
for some problems, instead of proceeding along with flow of control and
defining out[s] in terms of in[s], we need to proceed backwards and define in[s]
in terms of out[s].
○ Since data flows along control paths, data-flow analysis is affected by the
constructs in a program. i.e., write out[s], unique end point where control
leaves the statement;
○ There are subtleties that go along with such statements as procedure calls,
assignments through pointer variables, and even assignments to array
variables
4
5
Points and Paths
● Within a basic block, we talk the point between two adjacent statements, as well as
the point before the first statement and after the last.
● Thus, block B1 has four points: one before any of the assignments and one after each
of the three assignments.
Example: There is a path from the beginning of block B5 to the

beginning of block B6. It travels through the end point of B5 and
then through all the points in B2, B3, and B4, in order, before it
reaches the beginning of B6
6
Reaching Definition
● A definition of variable x is a statement that assigns, or may assign, a value to x.
● Most common forms of definition are assignments to x and statements that read a
value from an i/o device and store it in x.
● These statements certainly define a value for x, and they are referred to as
unambiguous definitions of x.
● There are certain kinds of statements that may define a value for x; they are called
ambiguous definitions. The most usual forms are
○ A call of a procedure with x as a parameter or a procedure that can access x
because x is in the scope of the procedure.
○ An assignment through a pointer that could refer to x, the assignment *q: = y is
a definition of x if it is possible that q points to x
● A definition d reaches a point p if there is a path immediately following d to p, such
that d is not “killed” along that path.
● a killed 🡪 if two points along path then there is a definition of d
7
8
Data-flow analysis of structured programs:
● Flow graphs for control flow constructs such as do-while statements have a useful
property:
“there is a single beginning point at which control enters and a single end point that
control leaves from when execution of the statement is over”.
● Expressions in this language are similar to intermediate code, but the flow graphs for
statements have restricted forms.
● We define a portion of a flow graph called a region to be a set of nodes N that
includes a header, which dominates all other nodes in the region.
● All edges between nodes N are in the region, except for some that enter the
header.
● The portion of flow graph corresponding to a statement S is a region that obeys the
further restriction that control flow to just one outside block when it leaves the
9
region. SRMIST, Vadapalani Campus
We say that the beginning points of the dummy
blocks at the entry and exit of a statement’s
region are the beginning and end points,
respectively, of the statement.
The equations are inductive, or syntax-directed,

definition of the sets in[S], out[S], gen[S],
and kill[S] for all statements S.
gen[S] and kill[S] are synthesized attributes
gen[S] is the set of definitions “generated” by S
while kill[S] is the set of definitions that never

reach the end of S.
10
Computation of gen, kill, in, out
Example:
Consider the following
data-flow equations for
reaching definitions :
Fig:
data-flow
equations
for
reaching
definitions
11
Conservative estimation of data-flow information:
● There is a subtle miscalculation in the rules for gen and kill.
● We have made the assumption that the conditional expression E in the if
and do statements are “uninterpreted”; that is, there exists inputs to the
program that make their branches go either way.
● We assume that any graph-theoretic path in the flow graph is also an
execution path, i.e., a path that is executed when the program is run with
least one possible input.
● When we compare the computed gen with the “true” gen we discover that
the true gen is always a subset of the computed gen. on the other hand, the
true kill is always a superset of the computed kill.
12
Computation of in and out:
● Assuming we know in[S] we compute out by equation, that is
Out[S] = gen[S] U (in[S] - kill[S])
● Considering cascade of two statements S1; S2, as in the second case. We
start by observing in[S1]=in[S]. Then, we recursively compute out[S1], which
gives us in[S2], since a definition reaches the beginning of S2 if and only if it
reaches the end of S1. Now we can compute out[S2], and this set is equal to
out[S].
● Considering if-statement we have conservatively assumed that control can
follow either branch, a definition reaches the beginning of S1 or S2 exactly
when it reaches the beginning of S.
In[S1] = in[S2] = in[S]
● If a definition reaches the end of S if and only if it reaches the end of one or
both sub statements; i.e,
Out[S]=out[S1] U out[S2] 13
14
15
Problems of data flow analysis
1. Reaching definition
● The definition that may reach a program point along some path are know as
reaching definition
● i.e., A definition d reaches a point p if there is a path from the point immediately
following d to p, such that d is not “killed” along that path
2. Live variable analysis
● To know for variable x and point p whether the value of x at p could be used along
some path in the flow graph starting at p
If so we say x is live at p
otherwise
x is dead at p
3. Available expression
● An expression x+y is available at a point p if every path from the entry node at p
evaluates x+y prior to reaching p there are no subsequent assignment to x or y
16
RUNTIME ENVIRONMENTS
● Runtime means a program in execution
● As execution proceeds, the same name in the source text can denote
different data objects in the target machine
● Data objects at run time is determined by its type
● Program needs memory resources to execute instructions
● It takes care of memory allocation and de-allocation
● The design of runtime support package is influenced by the semantics of
procedure (functions)
● Each execution of a procedure is referred to as an activation of the
procedure.
● If the procedure is recursive, several of its activations may be alive at the
same time.
17
SOURCE LANGUAGE ISSUES
The source language distinguishes between the source text of a procedure and its
activations at run time
1. Procedures
● A program is made up of procedure
● A Procedure is a declaration that associates an identifier with a statement
Identifier is a Procedure name and
Statement is a Procedure body
● Procedure call-Procedure name appears within an executable statement
● Identifiers appearing in a procedure definition are special, and are called formal
parameters. (C calls them "formal arguments" and Fortran calls them "dummy
arguments, ")
● Arguments, known as actual parameters may be passed to a called procedure; they
are substituted for the formals in the body
18
19
Example-2:
Pascal
program for
reading and
sorting
integers
(quicksort)
20
2. Activation Trees
● Assumptions among procedures during the execution of a program:
1. Control flows sequentially;
2. Each execution of a procedure starts at the beginning of the procedure body
and eventually returns control to the point immediately following the place
where the procedure was called.
● Each execution of a procedure body Is referred to as an activation of the
procedure
● We can represent the activation of procedures during the running of an
entire program by a tree called activation tree
● Each node corresponds to one activation and the root is the activation of
the main procedure that initiates execution of the program
● The activations are called from left to right
● One child must finish before the activation to its right can begin
21
● Activation tree shows the way control enters and leaves activations of the
procedures
● If a and b are two procedures their activation will be
Non-overlapping when one procedure is called after other
Nested – nested procedure
● In recursive procedure – new activation begins before an earlier activation
of the same procedure has ended
● In a activation tree
○ Each node represents an activation of a procedure
○ The root represents the activation of the main program
○ The node for ‘a’ is the parent of node for ‘b’ if and only if control flows from
activation ‘a’ to ‘b’
○ The node for ‘a’ is to the left of node ‘b’ if and only if the lifetime of ‘a’ occurs
before lifetime of ‘b’
22
Example-1: Activation tree of quick sort using C
23
Example-2: Activation tree of quick sort using Pascal
Note: For exam draw any one activation tree
24
3. Control stack
● The flow of control in a program corresponds to a depth-first traversal of
the activation tree that starts at the root, visits a node before its children,
and recursively visits children at each node in a left-to-right order.
● Whenever a procedure is executed, its activation record is stored on the
stack also known as control stack
● The contents of the control stack are related to paths to the root of the
activation tree.
● When node n is at the top of the control stack, the stack contains the
nodes along the path from n to the root
Example: Control stack contains nodes along a path to the root
25
● It is used to keep track of live procedure activation
● Idea
push the node for an activation onto the control stack - when the activation begins
pop the node for an activation - when the activation ends.
The control stack contains the following nodes along this path to the root: m, qs(1,10), qs(1,4), qs(4,4)
26
4. The scope of the declaration
● A declaration in a language is a syntactic construct that associates
information with a name.
var i : integer;
● Declarations may be explicit, or may be implicit
● The scope rules of a language determine which declaration of a name
applies when the name appears in the text of a program
● The portion of the program to which a declaration applies is called the
scope of that declaration
● When a declaration is seen a symbol table entry is created for it.
27
5. Binding of names
● If each name is declared once in a program, the same name may denote different
data objects at run time.
● "data object" - storage location that can hold values.
● Environment refers to a function that maps a name to a storage location
● Sate refers to a function that maps a storage location to the value held there
● Environment maps a name to an r-value, and a state maps the /-value to an r-value.
● When an environment associates storage location s with a name x. i.e., x is bound to s
28
STORAGE ORGANIZATION
The organization of runtime storage is as follows:
1. Subdivision of run-time memory
● The compiler obtains a block of storage from the operating system for the
compiled program to run in
● This runtime storage is subdivided to hold
○ The data generated target
○ Data object
○ Counter part of the control stack
● The size of the generated target code is fixed at compile time,
● The size of some of the data objects may also be known at compile time
● Stack is used to store data structures called activation records that get generated
during procedure call
● Each time a procedure is called space for its local variable is pushed onto a stack
and when the procedure terminates that space is popped off the stack
29
● A separate area of run-time memory, called a heap, holds all other information
● The sizes of the stack and the heap can change as the program executes,
dynamically, at the opposite ends of memory
● Stack grows towards higher address (“downward-growing”)
Issue
● Layout and allocation of data to memory location
in runtime is an issue to be addressed
30
2. Activation Records
● The execution of procedure is called activation
● Information needed by a single execution of a
procedure is managed using a contiguous block of
storage called an activation record or frame
● The activation record of a procedure is pushed on
the run-time stack when a procedure is called and
pop off activation record when control returns to
the caller.
● The sizes of each of these fields can be
determined at the time a procedure is called
● The sizes of almost all fields can be determined at
compile time.
31
The purpose of the fields of an activation record is as follows:
● Temporary values🡪 arising in the evaluation of expressions, are stored in the field
for temporaries.
● Local data 🡪holds data that is local to an execution of a procedure.
● Saved machine status🡪 holds information about the state of the machine just
before the procedure is called. This information includes the values of the program
counter and machine registers that have to be restored when control returns from
the procedure.
● Optional access link 🡪refer to nonlocal data held in other activation records.
● Optional control link 🡪points to the activation record of the caller
● Actual parameters 🡪 used by the calling procedure to supply parameters to the
called procedure.
● Returned value 🡪 used by the called procedure to return a value to the calling
procedure
32
3. Compile-Time Layout of local data
● The amount of storage needed for a name is determined from its type
● An elementary data type, such as a character, integer, or real, can usually be
stored in an integer number of bytes.
● Storage for an aggregate, such as an array or record, must be large enough to hold
all its components
The relative address, or offset, is

the difference between the
addresses of the position and the
data object.
Space left unused due to

alignment considerations is
referred to as padding
33
STORAGE ALLOCATION STRATEGIES
34
35
Static🡪Example
36
37
Stack🡪Example
38
39
Variable length data: on stack
40
41
Heap🡪Example
42
PARAMETER PASSING
● When one procedure calls another, the usual method of communication
between them is through nonlocal names and through parameters of the
called procedure
Methods
● Call by value
● Call by reference
● Copy restore
● Call by name or
Here the array a is nonlocal to the procedure
● Macro expansion or inline code exchange, and i and j are parameters
43
General terms
The term l-value (l-left) refers to the storage represented by expression

The term r-value (r-right) refers to the value contained in the storage
44
1. Call by value
45
2. Call by reference
46
3. Copy restore
47
4. Call by name
48

Unit-5-2

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Unit-5-2

Uploaded by

Copyright:

Available Formats

18CSC304J- COMPILER DESIGN

SRMIST, Vadapalani Campus

Example: There is a path from the beginning of block B5 to the

The equations are inductive, or syntax-directed,

gen[S] and kill[S] are synthesized attributes

gen[S] is the set of definitions “generated” by S

while kill[S] is the set of definitions that never

Note: For exam draw any one activation tree

Example: Control stack contains nodes along a path to the root

The relative address, or offset, is

Space left unused due to

The term l-value (l-left) refers to the storage represented by expression

You might also like