
Introduction to Data Flow Analysis

Data Flow Analysis
• Construct representations for the structure
of flow-of-data of programs based on the
structure of flow-of-control of programs
• Collect information about the attributes of
data at various program points according to
the structure of flow-of-data of programs

Points
• Within each basic block, a point is assigned
between two adjacent statements, before the
first statement, and after the last statement

An Example
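Flow graph (blocks and the definitions they contain):
B1: d1: i = m - 1
    d2: j = n
    d3: a = u1
B2: d4: i = i + 1
B3: d5: j = j - 1
B4: (no statements shown)
B5: d6: a = u2
B6: (no statements shown)
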
Paths
• A path from p1 to pn is a sequence of points
p1, p2, …, pn such that for each i, 1 ≤ i ≤ n-1,
either
– pi is the point immediately preceding a
statement and pi+1 is the point immediately
following that statement in the same block, or
– pi is the end of some block and pi+1 is the
beginning of a successor block

An Example
Flow graph (blocks and the statements they contain):
B1: d1: i = m - 1
    d2: j = n
    d3: a = u1
B2: d4: i = i + 1
B3: d5: j = j - 1
B4: if e3
B5: d6: a = u2
B6: d7: i = u3

Reaching Definitions
• A definition of a variable x is a statement
that assigns or may assign a value to x
• A definition d of some variable x reaches a
point p if there is a path from the point
immediately following d to p such that no
unambiguous definition of x appears on that
path

An Example
d1: i = m - 1;
d2: j = n;
d3: a = u1;
do
    d4: i = i + 1;
    d5: j = j - 1;
    if e1 then
        d6: a = u2
    else
        d7: i = u3
while e2

Ambiguity of Definitions
• Unambiguous definitions (must assign values)
– assignments to a variable
– statements that read a value into a variable
• Ambiguous definitions (may assign values)
– procedure calls that have call-by-reference
parameters
– procedure calls that may access nonlocal variables
– assignments via pointers

Safe or Conservative Information
• Consider all execution paths of the control
flow graph
• Allow definitions to pass through
ambiguous definitions of the same variables
• The computed set of reaching definitions is
a superset of the exact set of reaching
definitions

Information for Reaching Definitions
• gen[S]: definitions generated within S and
reaching the end of S
• kill[S]: definitions killed within S
• in[S]: definitions reaching the beginning of
S
• out[S]: definitions reaching the end of S

Data Flow Equations
• Data flow information can be collected by
setting up and solving systems of equations
that relate information at various points

$$\text{out}[S] = \text{gen}[S] \cup (\text{in}[S] - \text{kill}[S])$$

The information at the end of a statement is
either generated within the statement, or it
enters at the beginning and is not killed as
control flows through the statement

The Iterative Algorithm
• Repeatedly compute in and out sets for each
node in the control flow graph
simultaneously until there is no change

in[B] = p  pred(B) out[P]

out[B] = gen[B]  (in[B] - kill[B])

13
Algorithm: Reaching Definitions
/* Assume in[B] = ∅ for all B */
for each block B do out[B] := gen[B];
change := true;
while change do begin
    change := false;
    for each block B do begin
        in[B] := ∪ { out[p] : p ∈ pred(B) };
        oldout := out[B];
        out[B] := gen[B] ∪ (in[B] - kill[B]);
        if out[B] ≠ oldout then change := true
    end
end

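The pseudocode above can be sketched in Python using ordinary sets. This is a minimal illustration, not a reference implementation: the function name `reaching_definitions`, the dictionary-based graph encoding, and the two-block graph used to exercise it are all assumptions made for the example.

```python
def reaching_definitions(blocks, pred, gen, kill):
    """Iterate in/out sets to a fixed point; every set holds definition names."""
    in_sets = {b: set() for b in blocks}          # in[B] = empty set for all B
    out_sets = {b: set(gen[b]) for b in blocks}   # out[B] := gen[B]
    change = True
    while change:
        change = False
        for b in blocks:
            # in[B] is the union of out[p] over all predecessors p of B
            in_sets[b] = set().union(*(out_sets[p] for p in pred[b]))
            old_out = out_sets[b]
            out_sets[b] = gen[b] | (in_sets[b] - kill[b])
            if out_sets[b] != old_out:
                change = True
    return in_sets, out_sets


# Hypothetical graph: Entry -> Loop, Loop -> Loop.
# d1: x = 0 lives in Entry, d2: x = x + 1 lives in Loop.
blocks = ["Entry", "Loop"]
pred = {"Entry": [], "Loop": ["Entry", "Loop"]}
gen = {"Entry": {"d1"}, "Loop": {"d2"}}
kill = {"Entry": {"d2"}, "Loop": {"d1"}}
in_sets, out_sets = reaching_definitions(blocks, pred, gen, kill)
```

Both d1 and d2 reach the top of the loop, but only d2 survives to its end, because the loop body overwrites x.
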
An Example
Each bit vector lists d1 … d7 from left to right.

Block  Statements       gen[B]     kill[B]
B1     d1: i = m - 1    111 0000   000 1111
       d2: j = n
       d3: a = u1
B2     d4: i = i + 1    000 1100   110 0001
       d5: j = j - 1
B3     d6: a = u2       000 0010   001 0000
B4     d7: i = u3       000 0001   100 1000

An Example

         Initial               Pass 1                Pass 2
Block    in[B]      out[B]     in[B]      out[B]     in[B]      out[B]
B1       000 0000   111 0000   000 0000   111 0000   000 0000   111 0000
B2       000 0000   000 1100   111 0011   001 1110   111 1111   001 1110
B3       000 0000   000 0010   001 1110   000 1110   001 1110   000 1110
B4       000 0000   000 0001   001 1110   001 0111   001 1110   001 0111

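The table can be reproduced with integers used as bit vectors, where union becomes bitwise OR and set difference becomes AND NOT. The sketch below rests on one assumption: the predecessor lists are read off the do-while program shown earlier (B3 and B4 both loop back to B2), since the extracted slide does not list the edges explicitly. The helper names `bits` and `show` are made up for the example.

```python
def bits(s):
    """Parse a slide-style bit vector such as '111 0000' (d1 is the leftmost bit)."""
    return int(s.replace(" ", ""), 2)


def show(v):
    """Format an integer back into the 'ddd dddd' layout used in the table."""
    b = format(v, "07b")
    return b[:3] + " " + b[3:]


order = ["B1", "B2", "B3", "B4"]
# Assumed edges: B1 -> B2, B2 -> B3, B2 -> B4, B3 -> B2, B4 -> B2.
pred = {"B1": [], "B2": ["B1", "B3", "B4"], "B3": ["B2"], "B4": ["B2"]}
gen = {"B1": bits("111 0000"), "B2": bits("000 1100"),
       "B3": bits("000 0010"), "B4": bits("000 0001")}
kill = {"B1": bits("000 1111"), "B2": bits("110 0001"),
        "B3": bits("001 0000"), "B4": bits("100 1000")}

in_ = {b: 0 for b in order}      # in[B] starts empty
out = dict(gen)                  # out[B] := gen[B]
passes = 0
change = True
while change:
    change = False
    passes += 1
    for b in order:
        new_in = 0
        for p in pred[b]:
            new_in |= out[p]                     # union is bitwise OR
        in_[b] = new_in
        new_out = gen[b] | (new_in & ~kill[b])   # difference is AND NOT
        if new_out != out[b]:
            out[b] = new_out
            change = True

for b in order:
    print(b, show(in_[b]), show(out[b]))
```

Under these assumed edges the loop stops after the second pass, and the printed vectors agree with the Pass 2 columns of the table.
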
Conservative Computation
• The computed gen set of reaching definitions
is a superset of the exact gen set of reaching
definitions
• The computed kill set of reaching definitions
is a subset of the exact kill set of reaching
definitions
• The computed in and out sets of reaching
definitions are supersets of the exact in and out
sets of reaching definitions

Local Data Flow Information
• The gen and kill sets for a basic block are
obtained from the gen and kill sets for the
statements in the basic block
• Only the in and out sets for the basic blocks
are computed in the global data flow analysis
• The in and out sets for the statements in a
basic block can be computed locally from the
in set for the basic block if necessary

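One way to obtain a block's gen and kill sets from its statements is to fold the per-statement sets from first statement to last, using the usual sequencing rule gen = gen2 ∪ (gen1 - kill2) and kill = kill2 ∪ (kill1 - gen2). The sketch below assumes every definition is unambiguous; the function name `block_gen_kill` and the `defs_of` table are illustrative only.

```python
def block_gen_kill(statements, defs_of):
    """Fold per-statement gen/kill sets over a basic block.

    statements: list of (definition_id, variable) pairs in program order.
    defs_of: maps each variable to the set of all its definitions in the program.
    """
    gen, kill = set(), set()
    for d, var in statements:
        gen_s = {d}                    # the statement generates its own definition
        kill_s = defs_of[var] - {d}    # and kills every other definition of var
        gen = gen_s | (gen - kill_s)
        kill = kill_s | (kill - gen_s)
    return gen, kill


# Definitions d1 ... d7 from the running example, grouped by variable.
defs_of = {"i": {"d1", "d4", "d7"}, "j": {"d2", "d5"}, "a": {"d3", "d6"}}
gen_b1, kill_b1 = block_gen_kill([("d1", "i"), ("d2", "j"), ("d3", "a")], defs_of)
gen_b2, kill_b2 = block_gen_kill([("d4", "i"), ("d5", "j")], defs_of)
```

For the block holding d4 and d5 this yields gen = {d4, d5} and kill = {d1, d2, d7}, matching the bit vectors 000 1100 and 110 0001 in the example.
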
UD-Chains and DU-Chains
• A variable is used at statement s if its r-value
may be required
• The reaching definitions information is often
stored as use-definition chains (or ud-chains)
• The ud-chain for a use u of a variable x is the
list of all the definitions of x that reach u
• The definition-use chain (or du-chain) for
a definition d of a variable x is the list of all
the uses of x that use the value defined at d

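As a rough sketch of how ud-chains fall out of reaching definitions, the code below walks one block, starting from its in set, and records for each use the definitions of that variable that currently reach it. It assumes all definitions are unambiguous; `ud_chains`, `var_of`, and the statement encoding are names invented for the illustration.

```python
def ud_chains(statements, in_set, var_of):
    """Return {(statement_id, variable): set of definitions reaching that use}.

    statements: list of (definition_id, defined_var, used_vars) in block order.
    in_set: definitions reaching the start of the block.
    var_of: maps each definition id to the variable it defines.
    """
    reaching = set(in_set)
    chains = {}
    for d, target, uses in statements:
        for x in uses:
            chains[(d, x)] = {r for r in reaching if var_of[r] == x}
        # The new definition replaces all other definitions of its target.
        reaching = {r for r in reaching if var_of[r] != target} | {d}
    return chains


var_of = {"d1": "i", "d2": "j", "d3": "a", "d4": "i",
          "d5": "j", "d6": "a", "d7": "i"}
in_b2 = set(var_of)   # all seven definitions reach the block (in = 111 1111)
b2 = [("d4", "i", ["i"]), ("d5", "j", ["j"])]
chains = ud_chains(b2, in_b2, var_of)
```

Here the use of i in d4 is chained to d1, d4 and d7, and the use of j in d5 to d2 and d5. A du-chain can be built by inverting this mapping.
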
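A Taxonomy of Data Flow Problems

Any path, forward flow:
$$\text{in}[B] = \bigcup_{p \in \text{pred}(B)} \text{out}[p]$$
$$\text{out}[B] = \text{gen}[B] \cup (\text{in}[B] - \text{kill}[B])$$

Any path, backward flow:
$$\text{out}[B] = \bigcup_{s \in \text{succ}(B)} \text{in}[s]$$
$$\text{in}[B] = \text{gen}[B] \cup (\text{out}[B] - \text{kill}[B])$$

All paths, forward flow:
$$\text{in}[B] = \bigcap_{p \in \text{pred}(B)} \text{out}[p]$$
$$\text{out}[B] = \text{gen}[B] \cup (\text{in}[B] - \text{kill}[B])$$

All paths, backward flow:
$$\text{out}[B] = \bigcap_{s \in \text{succ}(B)} \text{in}[s]$$
$$\text{in}[B] = \text{gen}[B] \cup (\text{out}[B] - \text{kill}[B])$$
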
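The four classes differ only in the direction of flow and in whether values are merged by union or intersection, so a single solver can be parameterized over both. The following is a sketch under stated assumptions: the name `solve`, its flags, and the two small graphs are hypothetical, and blocks with nothing flowing in are treated as boundary blocks with an empty merged set.

```python
def solve(blocks, succ, gen, kill, forward=True, any_path=True, universe=frozenset()):
    """Generic gen/kill solver; returns (in_sets, out_sets)."""
    pred = {b: [p for p in blocks if b in succ[p]] for b in blocks}
    sources = pred if forward else succ       # neighbours whose values are merged
    before, after = {}, {}                    # before = in (forward) or out (backward)
    for b in blocks:
        # All-path problems start non-boundary blocks at the full universe.
        start = set(universe) if (sources[b] and not any_path) else set()
        before[b] = set(start)
        after[b] = gen[b] | (start - kill[b])
    changed = True
    while changed:
        changed = False
        for b in blocks:
            incoming = [after[n] for n in sources[b]]
            if not incoming:
                met = set()                   # boundary block: nothing flows in
            elif any_path:
                met = set().union(*incoming)
            else:
                met = set(universe).intersection(*incoming)
            before[b] = met
            new_after = gen[b] | (met - kill[b])
            if new_after != after[b]:
                after[b], changed = new_after, True
    return (before, after) if forward else (after, before)


# Backward, any path (live variables): B1 defines x, B2 uses x and defines y.
succ = {"B1": ["B2"], "B2": []}
live_in, live_out = solve(["B1", "B2"], succ,
                          gen={"B1": set(), "B2": {"x"}},
                          kill={"B1": {"x"}, "B2": {"y"}},
                          forward=False, any_path=True)

# Forward on a diamond E -> A, E -> B, A -> C, B -> C; only A computes x+y.
succ2 = {"E": ["A", "B"], "A": ["C"], "B": ["C"], "C": []}
gen2 = {"E": set(), "A": {"x+y"}, "B": set(), "C": set()}
kill2 = {b: set() for b in succ2}
all_in, _ = solve(list(succ2), succ2, gen2, kill2, any_path=False, universe={"x+y"})
any_in, _ = solve(list(succ2), succ2, gen2, kill2, any_path=True)
```

On the diamond, x+y arrives at C under the any-path merge but not under the all-paths merge, which is the distinction the taxonomy draws.
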
Available Expressions
• An expression x+y is available at a point p
if every path from the initial node to p
evaluates x+y, and after the last such
evaluation prior to reaching p, there are no
subsequent assignments to x or y

An Example

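[Figure: two flow graphs. In each, a block containing t1 = 4 * i leads through a middle block to a block containing t2 = 4 * i. In one graph the middle block is marked "?"; in the other it contains i = … followed by t0 = 4 * i.]
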
The gen and kill Sets
• A block kills expression x+y if it may
assign x or y and does not subsequently
reevaluate x+y
• A block generates expression x+y if it
definitely evaluates x+y and does not
subsequently redefine x or y

The gen Set for a Block
• No expressions are available at the beginning
• Assume set A of expressions is available
before statement x = y+z. The set of
expressions available after the statement is
formed by
– adding to A the expression y+z
– deleting from A any expression involving x
• At the end, A is the set of generated
expressions

The kill Set for a Block
• All expressions y+z such that either y or z is
defined and y+z is not generated by the
block

An Example
Statements      Available Expressions
                none
a = b + c
                only b + c
b = a - d
                only a - d
c = b + c
                only a - d
d = a - d
                none

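The trace above can be checked mechanically by applying the add-then-delete rule to each statement. In this sketch, statements are encoded as `(target, left, op, right)` tuples and expressions as `(left, op, right)` tuples; both encodings and the function name `available_after_each` are assumptions of the example.

```python
def available_after_each(statements):
    """Track the available-expression set A through a block of x = y op z statements.

    Returns the value of A after each statement; A is empty on entry to the block.
    """
    available = set()
    history = []
    for target, left, op, right in statements:
        available.add((left, op, right))              # add the expression y op z
        available = {e for e in available
                     if target not in (e[0], e[2])}   # delete anything involving x
        history.append(set(available))
    return history


block = [("a", "b", "+", "c"),
         ("b", "a", "-", "d"),
         ("c", "b", "+", "c"),
         ("d", "a", "-", "d")]
history = available_after_each(block)
```

Adding before deleting matters: a statement such as c = b + c evaluates b + c but immediately invalidates it, so the expression must not remain in A. The last entry of `history` is the gen set of the block, which is empty here.
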
The in and out Sets

in[B] = , for B = initial


in[B] = p  pred(B) out[p], for B  initial
out[B] = gen[B]  (in[B] - kill[B])

27
Initialization of the in Sets

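[Figure: block B1 flows into block B2, and B2 has an edge back to itself. G and K are the gen and kill sets of B2; I and O are its in and out sets at iteration j.]

$$O^{j+1} = G \cup (I^{j} - K)$$
$$I^{j+1} = \text{out}[B_1] \cap O^{j+1}$$

Starting from $I^0 = \emptyset$:
$$O^1 = G$$
$$I^1 = \text{out}[B_1] \cap G$$
$$O^2 = G$$

Starting from $I^0 = U$:
$$O^1 = U - K$$
$$I^1 = \text{out}[B_1] - K$$
$$O^2 = G \cup (\text{out}[B_1] - K)$$
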
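A small numeric instance makes the difference between the two starting points concrete. The sketch below iterates the two equations for a block B2 that loops back to itself. The expression names, the sets chosen for out[B1], G and K, and the helper `iterate_b2` are all made up for the illustration.

```python
def iterate_b2(out_b1, gen, kill, in0, rounds=10):
    """Iterate O = G ∪ (I − K) and I = out[B1] ∩ O until I stops changing."""
    i = set(in0)
    for _ in range(rounds):
        o = gen | (i - kill)
        new_i = out_b1 & o
        if new_i == i:
            break
        i = new_i
    return i, gen | (i - kill)


universe = {"a+b", "c+d"}
out_b1 = {"a+b"}              # B1 makes a+b available
gen, kill = set(), {"c+d"}    # B2 generates nothing and kills only c+d
small = iterate_b2(out_b1, gen, kill, in0=set())
large = iterate_b2(out_b1, gen, kill, in0=universe)
```

Both results satisfy the equations, but starting from the empty set loses a+b even though B1 computes it and B2 never kills it. Starting from U keeps it, which is why the in sets of non-initial blocks are initialized to U.
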
Algorithm: Available Expressions
/* Assume in[B1] = ∅ and in[B] = U for all B ≠ B1 */
in[B1] := ∅;
out[B1] := gen[B1];
for each block B ≠ B1 do out[B] := U - kill[B];
change := true;
while change do begin
    change := false;
    for each block B ≠ B1 do begin
        in[B] := ∩ { out[p] : p ∈ pred(B) };
        oldout := out[B];
        out[B] := gen[B] ∪ (in[B] - kill[B]);
        if out[B] ≠ oldout then change := true
    end
end

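A direct Python transcription of this algorithm might look as follows. It is a sketch that assumes `blocks[0]` is the initial block B1 and that every other block has at least one predecessor; the function name and the three-block chain used to try it are illustrative.

```python
def available_expressions(blocks, pred, gen, kill, universe):
    """All-paths forward analysis; blocks[0] is taken to be the initial block."""
    first = blocks[0]
    in_sets = {first: set()}
    out_sets = {first: set(gen[first])}
    for b in blocks[1:]:
        in_sets[b] = set(universe)
        out_sets[b] = universe - kill[b]
    change = True
    while change:
        change = False
        for b in blocks[1:]:
            # in[B] is the intersection of out[p] over all predecessors p of B
            in_sets[b] = set(universe).intersection(*(out_sets[p] for p in pred[b]))
            old_out = out_sets[b]
            out_sets[b] = gen[b] | (in_sets[b] - kill[b])
            if out_sets[b] != old_out:
                change = True
    return in_sets, out_sets


# Chain B1 -> B2 -> B3 where B1 and B3 both evaluate 4*i.
blocks = ["B1", "B2", "B3"]
pred = {"B1": [], "B2": ["B1"], "B3": ["B2"]}
universe = {"4*i"}
none = {b: set() for b in blocks}

# Case 1: B2 assigns i and then re-evaluates 4*i, so it generates 4*i.
in_regen, _ = available_expressions(
    blocks, pred, {"B1": {"4*i"}, "B2": {"4*i"}, "B3": {"4*i"}}, none, universe)

# Case 2: B2 only assigns i, so it kills 4*i.
in_killed, _ = available_expressions(
    blocks, pred, {"B1": {"4*i"}, "B2": set(), "B3": {"4*i"}},
    {"B1": set(), "B2": {"4*i"}, "B3": set()}, universe)
```

In the first case 4*i is available on entry to B3, so the recomputation there is redundant. In the second case it is not.
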
Conservative Computation
• The computed gen set of available expressions
is a subset of the exact gen set of available
expressions
• The computed kill set of available expressions
is a superset of the exact kill set of available
expressions
• The computed in and out sets of available
expressions are subsets of the exact in and out
sets of available expressions

Live Variables
• A variable x is live at a point p if the value
of x at p could be used along some path in
the control flow graph starting at p;
otherwise, x is dead at p

The def and use Sets
• def[B]: the set of variables definitely
assigned values in B
• use[B]: the set of variables whose values
are possibly used in B prior to any
definition of the variable

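For a straight-line block, the def and use sets as defined above can be gathered in one forward scan. The sketch assumes every assignment is unambiguous and encodes each statement as `(target, used_vars)`; the name `def_use` and the sample block are hypothetical.

```python
def def_use(statements):
    """Return (def_set, use_set) for a block given as (target, used_vars) pairs."""
    defined, used = set(), set()
    for target, operands in statements:
        for v in operands:
            if v not in defined:   # used before any definition in this block
                used.add(v)
        defined.add(target)
    return defined, used


# i = i + 1; j = j - 1; t = i * j
block = [("i", ["i"]), ("j", ["j"]), ("t", ["i", "j"])]
defs, uses = def_use(block)
```

The last statement reads i and j after the block has already assigned them, so it contributes nothing to the use set. Only the reads in the first two statements count.
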
The in and out Sets

out[B] = s  succ(B) in[s]

in[B] = use[B]  (out[B] - def[B])

33
Algorithm: Live Variables
/* Assume in[B] = ∅ for all B */
for each block B do in[B] := ∅;
change := true;
while change do begin
    change := false;
    for each block B do begin
        out[B] := ∪ { in[s] : s ∈ succ(B) };
        oldin := in[B];
        in[B] := use[B] ∪ (out[B] - def[B]);
        if in[B] ≠ oldin then change := true
    end
end

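The same loop in Python, as a sketch: the graph reuses the four-block do-while example from the reaching-definitions slides, with two simplifying assumptions that are not in the source, namely that the loop-exit edge is ignored and that the conditions e1 and e2 are not tracked as variable uses.

```python
def live_variables(blocks, succ, use, defs):
    """Any-path backward analysis; returns (in_sets, out_sets) of variable names."""
    in_sets = {b: set() for b in blocks}
    out_sets = {b: set() for b in blocks}
    change = True
    while change:
        change = False
        for b in blocks:
            # out[B] is the union of in[s] over all successors s of B
            out_sets[b] = set().union(*(in_sets[s] for s in succ[b]))
            old_in = in_sets[b]
            in_sets[b] = use[b] | (out_sets[b] - defs[b])
            if in_sets[b] != old_in:
                change = True
    return in_sets, out_sets


blocks = ["B1", "B2", "B3", "B4"]
succ = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B2"], "B4": ["B2"]}
use = {"B1": {"m", "n", "u1"}, "B2": {"i", "j"}, "B3": {"u2"}, "B4": {"u3"}}
defs = {"B1": {"i", "j", "a"}, "B2": {"i", "j"}, "B3": {"a"}, "B4": {"i"}}
live_in, live_out = live_variables(blocks, succ, use, defs)
```

In this instance i is not live on entry to B4, because d7 overwrites it before any use, and a is live nowhere, because no statement in the example reads it.
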
Conservative Computation
• The computed use set of live variables is a
superset of the exact use set of live
variables
• The computed def set of live variables is a
subset of the exact def set of live variables
• The computed in and out sets of live
variables are supersets of the exact in and
out sets of live variables

Busy Expressions
• An expression is busy at a point p if along
all paths from p to the final node its value is
used before the expression is killed

The use and kill Sets
• use[B]: the set of expressions that are used
before they are killed in B
• kill[B]: the set of expressions that are killed
before they are used in B

The in and out Sets

out[B] = , for B = final


out[B] = s  succ(B) in[s], for B  final
in[B] = use[B]  (out[B] - kill[B])

38
Algorithm: Busy Expressions
/* Assume out[Bn] = ∅ and out[B] = U for all B ≠ Bn */
out[Bn] := ∅;
in[Bn] := use[Bn];
for each block B ≠ Bn do in[B] := U - kill[B];
change := true;
while change do begin
    change := false;
    for each block B ≠ Bn do begin
        out[B] := ∩ { in[s] : s ∈ succ(B) };
        oldin := in[B];
        in[B] := use[B] ∪ (out[B] - kill[B]);
        if in[B] ≠ oldin then change := true
    end
end

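A Python sketch of the busy-expressions loop, assuming `blocks[-1]` is the final block Bn and every other block has at least one successor. The diamond-shaped graph and the expression name a+b are invented for the example.

```python
def busy_expressions(blocks, succ, use, kill, universe):
    """All-paths backward analysis; blocks[-1] is taken to be the final block."""
    last = blocks[-1]
    out_sets = {last: set()}
    in_sets = {last: set(use[last])}
    for b in blocks[:-1]:
        out_sets[b] = set(universe)
        in_sets[b] = universe - kill[b]
    change = True
    while change:
        change = False
        for b in blocks[:-1]:
            # out[B] is the intersection of in[s] over all successors s of B
            out_sets[b] = set(universe).intersection(*(in_sets[s] for s in succ[b]))
            old_in = in_sets[b]
            in_sets[b] = use[b] | (out_sets[b] - kill[b])
            if in_sets[b] != old_in:
                change = True
    return in_sets, out_sets


# Diamond: B1 -> B2, B1 -> B3, B2 -> B4, B3 -> B4, with B4 final.
blocks = ["B1", "B2", "B3", "B4"]
succ = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}
universe = {"a+b"}
no_kill = {b: set() for b in blocks}

# Both branches use a+b, so it is busy at the end of B1.
_, out_both = busy_expressions(
    blocks, succ, {"B1": set(), "B2": {"a+b"}, "B3": {"a+b"}, "B4": set()},
    no_kill, universe)

# Only one branch uses a+b, so it is not busy at the end of B1.
_, out_one = busy_expressions(
    blocks, succ, {"B1": set(), "B2": {"a+b"}, "B3": set(), "B4": set()},
    no_kill, universe)
```

When a+b is busy at the end of B1, it could be evaluated once there instead of separately in each branch.
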
Conservative Computation
• The computed use set of busy expressions is
a subset of the exact use set of busy
expressions
• The computed kill set of busy expressions is
a superset of the exact kill set of busy
expressions
• The computed in and out sets of busy
expressions are subsets of the exact in and
out sets of busy expressions
