1: code;2:
do
{
if
(
condition
)
{
3: code;
}
else
{
4: code;
}
5:
}
while
(
condition
);
6: code;
6123 45
Figure 1.
A sample program and its corresponding control flowgraph. The starting point of each basic block is indicated through acorresponding label in the program code.The set of
immediate predecessors
of
b
j
(the set of basic blocksthat can branch to
b
j
) is characterized through the inverse of thesuccessor function:
Γ
−
1
G
(
b
j
) =
{
b
i
|
(
b
i
,b
j
)
∈ E}
. It can be emptyonly for the
entry node
e
∈ B
, which is the first block executedwhen running
P
:
Γ
−
1
G
(
e
) =
∅
.A
path
P
along edges of a graph
G
can be expressed a se-quence of nodes
(
b
1
,b
2
,...,b
n
)
of
G
. Each node in the sequenceis an immediate successor of the predecessor node in the sequence:
b
i
+1
∈
Γ
1
G
(
b
i
)
. A path does not have to consist of distinct blocksand can contain the same blocks (and implicitly the same edges)repeatedly.Figure1showsasampleprogramconsistingofan
if-then-else
condition inside a
do/while
loop and the corresponding con-trol flow graph with
B
=
{
1
,
2
,
3
,
4
,
5
,
6
}
and edges
E
=
{
(1
,
2)
,
(2
,
3)
,
(2
,
4)
,
(3
,
5)
,
(4
,
5)
,
(5
,
6)
}
. In this example, both
(1
,
2
,
3)
and
(1
,
2
,
4)
are valid paths, and so is
(1
,
2
,
3
,
5
,
2
,
3)
since paths are allowed to contain the same node multiple times.
(1
,
2
,
5)
on the other hand is not a valid path, because
(2
,
5)
/
∈ E
.A
cycle
C
in a control flow graph is a path
(
b
1
,b
2
,...,b
n
)
where
b
1
=
b
n
. In Figure 1,
(2
,
3
,
5
,
2)
and
(2
,
4
,
5
,
2)
are bothvalid cycles, and so is
(2
,
3
,
5
,
2
,
3
,
5
,
2)
. Cycles in a control flowgraph correspond to loops in the original program.
(2
,
3
,
5
,
2)
and
(2
,
4
,
5
,
2)
are in fact two different cycles through the same loopand the
if-then-else
construct in it.For the purpose of this paper, we will assume that no uncon-nected basic blocks exist in control flow graphs. This means thatfor every node
b
i
of a graph
G
there exists a valid path
P
of theform
(
e,...,b
i
)
that leads from the entry node
e
to
b
i
.
3. Representing Partial Programs as Trace Trees
A
trace tree
TT
= (
N
,
P
)
is a directed graph representinga set of related cycles (
traces
) in the program, where
N
is aset of nodes (
instructions
), and
P
is a set of directed edges
{
(
n
i
,n
j
)
,
(
n
k
,n
l
)
,...
}
between them. Each node
n
∈ N
is la-beled with an
operation
from the set of valid operations definedby the (virtual) machine language we are compiling. For the pur-pose of this paper it is sufficient to consider two operations:
bc
and
op
, with
bc
representing conditional branch operations (for ex-ample
ifeq
or
lookupswitch
in case of JVML [20]), and
op
representing all other non branching operations.Each directed edge
(
n
i
,n
j
)
in
P
indicates that instruction
n
j
is executed immediately before instruction
n
i
executes. Thus, wealso refer to an edge
(
n
i
,n
j
)
∈ P
as
predecessor edge
.Similarily to the control flow graph, we define a successor func-tion
Γ
1
TT
(
n
j
) =
{
n
i
|
(
n
i
,n
j
)
∈ P}
, which returns all instructionsthat have a predecessor edge pointing to
n
j
, and thus can execute
7123 45
opop opbcbcbcbc
6
Figure 2.
A sample trace tree. The tree shown in this example isnot related to the control-flow graph in Figure 1.immediately
after
instruction
n
j
. Instructions labeled with
op
haveat most one successor instruction:
∀
n
∈ N ∧
label
(
n
) =
op
:
|
Γ
1
TT
(
n
)
| ≤
1
. These correspond to non-branching instructionsin the virtual machine language (such as
add
or
mul
). Instructionslabeled with
bc
can have an arbitrary number of successor instruc-tions, including no successors at all. This is the case because edgesare added to the trace tree lazily as it is built. If only one edge isever observed for a conditional branch, all other edges never appearin the trace tree.A node can have no successors if the control flow is observed toalways return to the anchor node
a
(defined below), since this back-edge is considered implicit and does not appear in the predecessoredge set
P
. Such instructions that have an implicit back-edge to
a
are called
leaf
nodes, and we denote the set of leaf nodes of atrace tree
TT
as
leaf set
L
. Leaf nodes can be branching (
bc
) ornon-branching (
op
). A non-branching leaf node implies an uncon-ditional jump back to the anchor node following the leaf node. Abranching leaf node corresponds to a conditional branch back to theanchor node.The predecessor function
Γ
−
1
TT
(
n
i
) =
{
n
j
|
(
n
i
,n
j
)
∈ P}
returns the set of instructions an instruction
n
i
has predecessoredges pointing to. There is exactly one node
a
such that
Γ
−
1
TT
(
a
) =
∅
, and we called it the
anchor node
. All other nodes have exactlyone predecessor node, and thus the predecessor function returns aset containing exactly one node:
∀
n
∈ N ∧
n
=
a
:
|
Γ
−
1
T
(
n
)
|
= 1
.This gives the directed graph the structure of a
directed rooted tree
,with the anchor node
a
as its root.Each leaf node of the trace tree represents the last node in acycle of instructions that started at the anchor node and ends at theanchor node, and we call each of these cycles a
trace
. The anchornode is shared between all traces, and the traces split up in a tree-like structure from there.Figure 2 shows a visual representation of an example trace treeformed by set of nodes
B
=
{
1
,
2
,
3
,
4
,
5
,
6
,
7
}
and the set of pre-decessor edges
P
=
{
(2
,
1)
,
(3
,
2)
,
(4
,
2)
,
(5
,
3)
,
(6
,
4)
,
(7
,
4)
}
.Only solid edges are true predecessor edges. Dashed edges de-note implicit back-edges, and are not part of the trace tree. Theanchor node of the trace tree is
1
and is shown in bold. Leaf nodes
L
=
{
4
,
5
,
6
,
7
}
are drawn as circles. Each instruction is labeledwith its operation (
bc
or
op
).
3.1 Constructing a Trace Tree
In contrast to control flow graphs, our IR does not require completeknowledge of all nodes (basic blocks) and directed edges between
2
2006/12/23
Leave a Comment