4 authors, including:
Adam Porter
University of Maryland, College Park
All content following this page was uploaded by Adam Porter on 23 May 2014.
middleware library [12] that incompatibilities between components can be effectively detected by Rachet. In this paper, we extend our previous work and describe in detail new algorithms and strategies that improve the efficiency and effectiveness of compatibility testing. To validate our approach, we have performed experiments and simulations on two widely-used middleware libraries from the field of high-performance computing. For this paper, we restrict compatibility testing to the build process (i.e., compilation and deployment) of a component with the other components on which it depends, but we will extend the work to functional and performance testing in future work.

Our approach makes several contributions, including: (1) a formalism for precisely modeling a software system's potential configuration space; (2) algorithms for producing exhaustive and sampled test configurations to detect incompatibilities between components; (3) strategies for testing configurations systematically in parallel, dealing with test failures dynamically to allow greater coverage; (4) an implementation that realizes the Rachet process, utilizing multiple computational resources and employing platform virtualization so that the results from building lower-level components can be reused in building multiple higher-level components; and (5) results from empirical studies showing that the Rachet approach enables fast and effective compatibility testing across a large configuration space.

The rest of the paper is organized as follows. We give a high-level overview of the steps needed to perform compatibility testing in Section 2. In Section 3 we describe a model to represent the software configuration space. Our approach to generating effective test configurations is presented in Section 4. Section 5 presents strategies for testing the configurations while utilizing a large set of test resources. Section 6 outlines the Rachet architecture and presents results from empirical studies. Section 7 describes related work, and in Section 8 we conclude with a brief discussion and future work.

2. Rachet PROCESS OVERVIEW

This section provides a high-level overview of the steps needed to perform compatibility testing for a given software under test (SUT) using Rachet.

1. Model software configuration space: To define the configuration space, i.e., the ways in which the SUT may be legitimately configured, developers identify the components needed to build the SUT. This information can often be obtained, at least in part, from the component providers. Dependency relationships between components are then encoded as a directed acyclic graph called a Component Dependency Graph (CDG), and constraints between components are described as annotations of the graph; together they form a model called the Annotated Component Dependency Model (ACDM). Relationships between components may also be encoded using other formalisms such as feature-based [4] or rule-based models [13]. The example CDG depicted in Figure 1 shows dependencies for an SUT component called A. The figure shows that A requires component D and one of B or C (captured via an XOR node, represented by +). Components B and C require E; D requires F; and E and F require G. Additional information, such as version identifiers for components, is also specified as annotations to the CDG. We formally describe our model in Section 3.

[Figure 1: An Example ACDM — the CDG for SUT A (A depends, via an AND node (∗), on D and on an XOR node (+) over B and C; B and C depend on E; D depends on F; E and F depend on G), with version annotations A: A1; B: B1, B2, B3; C: C1, C2; D: D1, D2, D3; E: E1, E2, E3, E4; F: F1, F2, F3, F4, F5; G: G1, and the constraint (ver(C) = C2) → (ver(E) ≥ E3).]

2. Determine coverage criteria: The above model encodes the configuration space for the SUT. For non-trivial software, this space can be quite large. Developers must determine which part of the space needs to be tested for compatibility. For example, they may decide to test configurations exhaustively, which is often infeasible. Instead, they may choose more practical criteria that systematically cover the space. In this paper, we propose a coverage criterion that tests all combinations of each component with the other components on which it directly depends; the definition of and rationale behind this criterion are further explained in Section 3.

3. Produce test configurations and test plan: Given the model and coverage criteria, Rachet produces test configurations automatically; each configuration describes a set of components to build and the dependency information used to build them. Then, a test plan is synthesized from the configurations; it specifies the schedule for building the components in the configurations.

4. Execute test plan: Rachet chooses tasks (sub-configurations) in the plan and distributes them to multiple execution nodes according to a plan execution strategy described by the developers. If the goal is to determine whether a component can be built without any error on top of the other components on which it depends, then a set of instructions for building each component should be specified. In addition, Rachet applies a contingency plan to handle task failures dynamically. If testing a component in a task fails, it may prevent testing other components in the plan that depend on it. In this case, Rachet dynamically modifies the test plan so as not to lose test coverage, by creating additional configurations that try to build the components in different ways.

3. CONFIGURATION SPACE MODEL

Components, their versions, inter-component dependencies, and constraints define the configuration space of an SUT. The ACDM models the configuration space with two components, a CDG and a set of annotations (Ann). A CDG has two types of nodes – component nodes and relation nodes; directed edges represent architectural dependencies between them. For example, Figure 1 depicts an SUT A that requires components B and D or C and D, each of which depend in turn on other components. As shown in the figure, inter-component dependencies are captured by relation nodes that are labeled either "∗" or "+", which are interpreted respectively as applying a logical AND or XOR over the relation node's outgoing edges.
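To make the model concrete, the Figure 1 example can be sketched in Python. The dictionary encoding, the `valid` function, and the version-number comparison below are our own illustration, not Rachet's actual data structures:

```python
# Our own illustrative encoding of the Figure 1 ACDM (not Rachet's format).
# Dependencies: A -> AND(XOR(B, C), D); B, C -> E; D -> F; E, F -> G.
VERSIONS = {
    "A": ["A1"], "B": ["B1", "B2", "B3"], "C": ["C1", "C2"],
    "D": ["D1", "D2", "D3"], "E": ["E1", "E2", "E3", "E4"],
    "F": ["F1", "F2", "F3", "F4", "F5"], "G": ["G1"],
}

def num(version):
    """'E3' -> 3, so relational constraints can compare version identifiers."""
    return int(version[1:])

def valid(cfg):
    """cfg maps component -> chosen version; check the XOR over B/C and
    the annotation constraint (ver(C) = C2) -> (ver(E) >= E3)."""
    if ("B" in cfg) == ("C" in cfg):          # XOR: exactly one of B, C
        return False
    if cfg.get("C") == "C2" and num(cfg["E"]) < 3:
        return False
    return all(v in VERSIONS[c] for c, v in cfg.items())

print(valid({"A": "A1", "B": "B1", "D": "D1", "E": "E3", "F": "F2", "G": "G1"}))  # True
print(valid({"A": "A1", "C": "C2", "D": "D1", "E": "E2", "F": "F1", "G": "G1"}))  # False
```

The second call fails precisely because C2 is paired with a version of E below E3, matching the example constraint.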
ACDM annotations provide additional information about components in the CDG. The first set of annotations for this example is an ordered list of version identifiers for each component. Each identifier represents a unique version of the corresponding component. In Figure 1, component B has three version identifiers: B1, B2 and B3.

Version-specific constraints often exist between various components in a system. For example, in Figure 1 component C has two versions and it depends on component E, which has 4 versions. Let's assume that component C's version C2 may only be compiled using E's versions E3 and higher. This "constraint" is written in first-order logic and appears as (ver(C) = C2) → (ver(E) ≥ E3). Global constraints may be defined over entire configurations. For instance, in the case study in Section 6, we require all components depending on a C++ compiler to use the same version of the C++ compiler in any single configuration.

We now formally define the ACDM:

Definition 1. An ACDM is a pair (CDG, Ann), where CDG is a directed acyclic graph and Ann is a set of annotations.

Definition 2. A CDG (Component Dependency Graph) is a pair (V, E), where: (1) V = C ∪ R. C is a set of labeled component nodes. Component node labels are mapped 1-1 to the components required to test the SUT. R is a set of relation nodes whose labels come from the set {"∗", "+"}. Relation nodes are interpreted as applying a logical function, AND or XOR, across their outgoing edges. (2) E is a set of dependency edges, with each edge connecting two nodes. Valid edges are constrained such that no two component nodes are connected by an edge: E = {(u, v) | u ∈ C, v ∈ R} ∪ {(u, v) | u ∈ R, v ∈ R} ∪ {(u, v) | u ∈ R, v ∈ C}. This ensures that the relationships between components are defined solely by relation nodes.

Furthermore, valid CDGs obey the following properties: (i) There is a single distinguished component node with no incoming edges, called top. Typically top represents the SUT. (ii) There is a single distinguished component node with no outgoing edges, called bottom. This component is not dependent on any other component. (The bottom node may represent an operating system, but that is not required.) (iii) All other component nodes, v ∈ C \ {top, bottom}, have exactly one incoming edge and one outgoing edge.

Definition 3. The annotation set Ann used in this paper contains two parts: (i) For each component c ∈ C, a set that defines the range of elements (versions) over which c may be instantiated. (ii) A set of constraints between components and over configurations. The constraints are specified using boolean operators (∨, ∧, →, ¬) and relational operators (≤, ≥, ==, <, >) between component versions; the version numbers are used to evaluate these expressions.

Except for the bottom node, all components in a CDG depend on functionalities provided by the other components on any path from the node encoding the component to the bottom node. However, many common build tools (e.g., GNU Autoconf and Automake [15]) assume that a successful build of a component is influenced by the components on which it directly depends. Hence, they check only for the functionalities provided by the components on which the component to build directly depends, by generating and testing a simple program during the build process. Definition 4 defines the set of components on which a component directly depends.

Definition 4. In a CDG, a component c directly depends on a set of components if there exists a path that does not contain any other component node, from the component node for c to each component node for the components in the set.

From this definition, component A in the previous example directly depends on B, C and D, although it depends on functionalities provided by components B through G. Similarly, B and C directly depend on E.

We annotate each component in a CDG with a set of relations called direct dependencies, between each version of the component and versions of the other components on which it directly depends. A direct dependency is a tuple t = (cv, d), where cv is a version v of a component c, and d is the dependency information used to build cv: a set of versions of components on which c directly depends. When multiple relation nodes lie on a path between a component and other components on which it directly depends, we take into account the semantics of the nodes by applying set operations recursively: union for XOR nodes and Cartesian product for AND nodes. For example, (A1, {B1, D1}) is one of 15 direct dependencies for the component A in the previous example.

As mentioned, we restrict the application context for compatibility testing to the error-free build of components.¹ In this context, testing a direct dependency (cv, d) means examining the build-compatibility of cv with the component versions in d. However, to build cv with d, we first need to build each component version in d as specified by one of the direct dependencies for that component. This leads to defining a configuration as a partially ordered set of direct dependencies. In this paper, we totally order a configuration in reverse topological order, starting from the bottom node. For example, a configuration to test the direct dependency (A1, {B1, D1}) above is: ((G1, ∅), (E3, {G1}), (F2, {G1}), (B1, {E3}), (D1, {F2}), (A1, {B1, D1})). In the next section, we describe methods to generate configurations.

4. PRODUCING CONFIGURATIONS

Although the configuration space for a software system can be tested exhaustively, that may be very expensive for a large system. For practical considerations, developers must sample the space. In this section, we describe two methods for generating configurations: one to test the entire space, and one to sample the space while covering all direct dependencies.

4.1 Exhaustive-Cover Configurations

The most straightforward way to build-test the range of configurations in which the SUT is build-compatible is to build it on the exhaustive set of possible configurations. To compute an exhaustive configuration set, we start from the bottom node of the CDG (the one that does not depend on any other components), and for each node type, do the following:

• Component node: compute a new configuration set by extending each configuration in the configuration set of its child node (a relation node) with the proper direct dependencies of the component. That is, for each direct dependency (cv, d) of the component, identify the configurations from the configuration set of the child node where each configuration contains all matched component versions specified in d.

¹In many Unix-based operating systems, building a component commonly includes three steps – configuring, compiling and deploying the component.
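The union/Cartesian-product expansion of direct dependencies can be sketched for the Figure 1 example. The relation-tree encoding and the `dep_sets` helper below are our own illustration, not Rachet's implementation:

```python
from itertools import product

# Our own sketch of computing direct dependencies for component A in the
# Figure 1 example: union for XOR nodes, Cartesian product for AND nodes.
VERSIONS = {"B": ["B1", "B2", "B3"], "C": ["C1", "C2"], "D": ["D1", "D2", "D3"]}

def dep_sets(node):
    """Expand a relation tree into all possible dependency sets d."""
    kind, arg = node
    if kind == "comp":                          # component: one version per set
        return [frozenset([v]) for v in VERSIONS[arg]]
    expanded = [dep_sets(child) for child in arg]
    if kind == "xor":                           # XOR: union of the alternatives
        return [s for alt in expanded for s in alt]
    return [frozenset().union(*combo)           # AND: Cartesian product
            for combo in product(*expanded)]

# A's dependency tree: AND( XOR(B, C), D )
tree = ("and", [("xor", [("comp", "B"), ("comp", "C")]), ("comp", "D")])
direct_deps = [("A1", d) for d in dep_sets(tree)]
print(len(direct_deps))  # 15, matching the count quoted for A in the text
```

The XOR over B and C contributes 3 + 2 = 5 alternatives, and the AND with D's 3 versions yields 5 × 3 = 15 direct dependencies, including (A1, {B1, D1}).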
[Figure: the tree of exhaustive-cover configurations for the example ACDM (residue removed); each node pairs a component version with the versions it is built on, e.g. G1 {}, F1 {G1}, E2 {G1}, extending through versions of D, B or C, up to A1.]

… how to build component versions in d, since the component version cv must be built on top of the component versions. This is achieved by recursively selecting an appropriate direct dependency …
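For the small Figure 1 example, the exhaustive configuration set can be enumerated directly. The code below is our own sketch specialized to this example, not the paper's general bottom-up algorithm; applying the version constraint removes 30 of the 300 unconstrained configurations, a count we derived ourselves from the version annotations:

```python
def exhaustive_configs():
    """Enumerate full configurations for the Figure 1 example bottom-up:
    one version of each component on a path, honoring XOR(B, C) and the
    constraint (ver(C) = C2) -> (ver(E) >= E3)."""
    configs = []
    for e in ["E1", "E2", "E3", "E4"]:
        for f in ["F1", "F2", "F3", "F4", "F5"]:
            for x in ["B1", "B2", "B3", "C1", "C2"]:   # XOR: one of B or C
                for d in ["D1", "D2", "D3"]:
                    if x == "C2" and int(e[1]) < 3:    # version constraint
                        continue
                    configs.append(
                        (("G1", ()), (e, ("G1",)), (f, ("G1",)),
                         (x, (e,)), (d, (f,)), ("A1", (x, d)))
                    )
    return configs

cfgs = exhaustive_configs()
print(len(cfgs))  # 270 = 4*5*5*3 minus the 30 configs violating the constraint
```

Each configuration is emitted in reverse topological order from the bottom node, matching the ordering used for the example configuration of (A1, {B1, D1}) in Section 3.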
we test the build compatibility of the direct dependency it encodes. In this section, we describe three test plan execution strategies that attempt to maximize both parallelism and the reuse of cached tasks. We also describe how to handle component build failures during plan execution.

5.1 Plan Execution Strategies

Executing a test plan means that for each node we must build cv on top of all the component versions contained in d. However, to test a node n, we need a machine on which all component versions encoded by n's ancestor nodes have been built, since the component versions contained in n.d may also depend on other lower-level components. Therefore, the component versions in the sequence of direct dependencies encoded by the nodes on the path from the root of the plan to the node n must be built before testing the build compatibility of the direct dependency encoded by n. We call this sequence the task for the node n.

When we execute the task for n, all component versions in the task must be built in a proper order, respecting all dependencies. We use a virtual machine (VM) environment, VMware, for the builds, so as not to contaminate the persistent state of a physical test resource (machine). Then, if the task execution is successful, which means that all component versions were built without any error, the modified machine state is the correct state for the task, and the machine may be reused to execute tasks for n's descendant nodes. When we execute those tasks, we need only build the additional components by reusing the VM state, which is encapsulated in the file system of the physical machine hosting the VM. For all the test plan execution algorithms we describe, we assume that a single test server controls the plan execution and distributes tasks to multiple test clients. We also assume that each client (a physical machine) has disk space available to store VMs (completed tasks) for reuse.

Parallel Depth-First Strategy: The parallel depth-first strategy is designed to maximize the reuse of locally cached tasks at each client during plan execution. When a client completes executing a task for a node n and subsequently requests a new task, the server assigns a task according to the following rules, attempting to maximize cached task reuse.

First, if the node n is a non-leaf node in the plan, the task for one of n's unassigned child nodes is chosen as the next task for the client. The client will then reuse the VM state from its previously executed task, so it only has to build one additional component (the one specified by the last direct dependency in the new task). This is typically the least expensive way to execute a new task.

Second, if the node n is a leaf node, tasks already stored in the cache space of the client are utilized to assign a new task. Starting from the node for the most recently cached task, the algorithm searches for an unassigned descendant node in depth-first order. Nodes currently being executed by other clients, and their subtrees in the plan, are not visited by the search. In this case, the test client must build the difference between the assigned task and the reused task.

Finally, if the algorithm cannot find an unassigned node using the first or second rule, the plan is searched in depth-first order from the root node. As for the second rule, the nodes currently being executed, and their subtrees, are not visited. In this case, to reduce the time to execute the assigned task, the test server finds the best cached task for the assigned task (i.e., the one with the longest matching prefix), so the VM for the cached task must be transferred across the network from the client that produced it, which can take a significant amount of time (a cached VM can be large, up to 1GB or more). The difference between the assigned task and the cached task must then be built.

For the depth-first strategy, the decision to cache a task that has just been executed is based on the number of children its node has in the plan. If the node has two or more children, the task may be reused to execute tasks for the children, so the test server requests the client to cache the task. However, if the node has only one child, the task for the child node is assigned to the same client by the first rule, so there is no reason to cache the task.

Since the depth-first strategy tries to utilize locally cached tasks, the number of locally reused tasks is maximized, minimizing the number of tasks that require transfers between clients. However, the cost to build the components in a task will be high if the difference between an assigned task and a locally cached task is large. In addition, when a large number of test clients are available and the test plan does not have many nodes near the root, many clients could be idle during the early stage of plan execution, waiting for enough tasks to become available.

Parallel Breadth-First Strategy: The parallel breadth-first strategy focuses on increasing task parallelism. This strategy tries to maximize the number of tasks being executed simultaneously, and secondarily tries to maximize the reuse of locally cached tasks. To assign tasks in breadth-first order, the server maintains a priority queue of plan nodes ordered according to their depth in the plan.

At the initialization step, the algorithm initializes the priority queue by traversing the plan in breadth-first order, adding nodes to the queue until the number of nodes exceeds the number of test clients. When a leaf node in the plan is traversed, it remains in the queue. On the other hand, when a non-leaf node is traversed, it is removed from the queue and its child nodes are added instead. That is, we increase the number of tasks that can be executed in parallel by assigning tasks for the child nodes.

When a task is requested by a client, the test server assigns the first unassigned task in the queue. Then, if a task is executed by the client successfully, the algorithm locates the node corresponding to the task in the queue and appends its child nodes to the queue. To reduce the time to execute a task, the test server always finds the best cached task to initialize the state of the VM used to execute the task, although the cost to transfer the VM across the network may be high.

Unlike the depth-first strategy, for the breadth-first strategy a completed task is cached if its corresponding node in the plan is a non-leaf node. The rationale behind this choice is that in many cases the tasks for the child nodes will not be assigned to the same client. This strategy will keep all clients busy as long as there are unassigned nodes in the queue throughout plan execution. Therefore, we expect a high level of parallelism. However, we also expect increased network cost compared to the depth-first strategy, because many cached tasks are transferred across the network.

Hybrid Strategy: We have described the costs and benefits of the depth-first and breadth-first strategies. Although the depth-first strategy tries to maximize the locality of reused
tasks, during the early stage of plan execution it may not achieve the parallelism that could be obtained by executing tasks on all available clients. On the other hand, the breadth-first strategy may achieve a high level of parallelism, but may also increase the network cost to execute tasks.

The hybrid strategy is designed to balance both the locality of reused tasks and task parallelism throughout plan execution, by combining the features of both strategies. As in the breadth-first strategy, a priority queue of plan nodes is created by traversing the test plan, and is used to increase task parallelism during the early execution stages of the plan. That is, for the initial task requests from test clients, tasks for nodes in the queue are assigned to all available clients immediately at the beginning of plan execution. To maximize locality for reused tasks, the first and second rules of the depth-first strategy are subsequently applied to assign tasks to requesting clients. If both rules fail to find an unassigned node, the test plan is traversed in breadth-first order from the root node to find one. This is based on the heuristic that a node closer to the root will likely have a larger subtree beneath it than nodes deeper in the tree, which will lead to more work being made available for a client reusing locally cached tasks.

5.2 Dynamic Failure Handling

If building a direct dependency encoded by a node n fails, n.cv could not be built correctly on top of the component versions contained in n.d. We say that n.cv is incompatible with n.d, and use the failure information to guide further test plan execution. A build failure for a direct dependency for node n also prevents testing of all direct dependencies represented by the nodes in n's subtree. This is because we need a VM on which all direct dependencies of the ancestor nodes have been built to test those direct dependencies. However, the failure does not imply failure for all direct dependencies affected by it. In this situation, instead of regarding those direct dependencies as untestable, we adjust the test plan dynamically to test them in alternate ways, if possible, by producing additional configurations to test them and merging the new configurations into the test plan. This enables the test plan execution algorithms to maximize direct dependency test coverage.

Since one direct dependency can be used in multiple configurations, it can appear as multiple nodes in different branches of a test plan. If one of the (identical) test plan nodes fails, we expect the others to also fail, since direct dependency testing assumes that the success of a component build is mainly determined by the components on which it directly depends. Thus, the test plan nodes affected by a build failure are not confined to the descendant nodes in the subtree of the failed node, but also include all descendant nodes in the subtrees of the nodes encoding the same direct dependency. Therefore, the contingency planning algorithm must generate new configurations to cover all the affected direct dependencies.

To reduce the number of newly generated configurations, we apply the algorithm described in Section 4.2 to the direct dependencies represented by the descendant nodes under the subtrees, in a post-order tree traversal. As configurations are produced for components in the CDG in topological order, we expect that a direct dependency represented by a node will be covered while generating configurations for the direct dependencies represented by its descendant nodes.

6. EMPIRICAL STUDIES

We have developed an automated test infrastructure that supports the Rachet process, and have performed empirical studies with two scientific middleware libraries that are widely used in the high-performance computing community. In this section, we describe the overall architecture of the Rachet infrastructure and present results from the studies.

6.1 System Architecture

The Rachet infrastructure is designed in a client/server architecture, as illustrated in Figure 3, utilizing platform virtualization technology.

[Figure 3: Rachet Software Architecture — a Global Coordinator (ACDM, Configuration Generator, Test Plan Builder, Plan Execution Coordinator, lchandler threads, request/response queues, Test DB) coordinating multiple Local Coordinators, each with a Task Manager, Component Builder, VM Cache Manager, VM cache, and VMware-hosted VMs.]

Global Coordinator (GC): The GC is the centralized test manager that directs overall test progress, interacting with multiple clients. It first generates configurations that satisfy the desired coverage criteria (e.g., direct dependencies) and also produces a test plan from the configurations, using the algorithms described in Section 4. Then, the GC dynamically controls test plan execution by dispatching tasks, and the ancillary information necessary to execute them, to multiple clients, according to one of the test plan execution strategies described in Section 5.

The GC contains a testmanager thread and a set of lchandler threads, one for each client machine. The testmanager is responsible for creating configurations and a test plan. During test execution, the testmanager satisfies requests from clients and inserts test results into a database. When a client first requests a task, the GC creates an lchandler thread for the client, and that lchandler is responsible for all communication with the client.

Local Coordinator (LC): The LC controls task execution in a test client. One LC runs on each test machine and interacts with the GC to receive information on the tasks it must execute and to report execution results.

As described previously, task execution in the current Rachet design and implementation means building the components in a task, taking into account the dependency information needed to build each component. To do the builds, the LC employs hardware virtualization technology. The components are built within a virtual machine (VM), which provides a virtualized hardware layer. This design is advantageous since the persistent state of the test machine is never changed, so a large number of tasks can be executed on a limited number of physical test machines. The Rachet implementation currently uses VMware Server as its virtualization technology, since it handles virtual machines reliably and also provides a set of well-defined APIs to control the VM.
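The VM-reuse decision from Section 5 (picking the cached task with the longest matching prefix, then building only the difference) can be sketched as follows. This is our own simplification: we assume only cached tasks that are exact prefixes of the assigned task are reusable, which may be stricter than the paper's matching:

```python
# Sketch (ours, not Rachet's code) of choosing the best cached task:
# a task is a sequence of component versions built in order, so a cached
# VM whose task is a prefix of the assigned task can be reused, and only
# the remaining components need to be built on top of it.
def best_cached(assigned, cache):
    """Return the longest cached task that is a prefix of the assigned task."""
    prefixes = [t for t in cache if assigned[: len(t)] == t]
    return max(prefixes, key=len, default=None)

assigned = ("G1", "E3", "F2", "B1", "D1", "A1")
cache = [("G1",), ("G1", "E3"), ("G1", "E2", "F2")]
base = best_cached(assigned, cache)
print(base)                  # ('G1', 'E3')
print(assigned[len(base):])  # ('F2', 'B1', 'D1', 'A1') still to be built
```

Note that ("G1", "E2", "F2") is not reusable here even though it is longer, because it diverges from the assigned task at E2.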
A key feature of VMware Server is that the complete …

[Figure: the dependency graphs (CDGs) for the two subject systems, petsc and ic, including components such as mch, lam, blas, gxx and mpfr.]

Virtual Machine Coordinator (VMC): The VMC is responsible for the actual component build process in a VM. When a VM is started by the LC, the VMC is automatically … by the VMC and executed in the VM to build the required components.
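Returning to the contingency behavior of Section 5.2, the failure propagation rule (identical direct dependencies are assumed to fail together, making every descendant of any such node untestable) can be sketched with a toy plan. The node numbering and the `DD` map below are our own illustration:

```python
# Sketch (ours) of Section 5.2's failure propagation over a toy test plan.
# PLAN maps a node to its children; DD maps a node to the direct
# dependency it encodes (same string = identical direct dependency).
PLAN = {1: [2, 3], 2: [4], 3: [5], 4: [], 5: []}
DD = {1: "G1", 2: "E3|G1", 3: "E3|G1", 4: "B1|E3", 5: "C1|E3"}

def affected_by_failure(failed):
    """All plan nodes untestable after `failed` fails: the nodes encoding
    the same direct dependency, plus every descendant of any of them."""
    same_dd = [n for n, d in DD.items() if d == DD[failed]]
    untestable = set()
    def mark(n):
        for child in PLAN[n]:
            untestable.add(child)
            mark(child)
    for n in same_dd:
        untestable.add(n)
        mark(n)
    return untestable

print(sorted(affected_by_failure(2)))  # [2, 3, 4, 5]
```

Here node 3 encodes the same direct dependency as the failed node 2, so its subtree is also marked; the contingency planner would then try to cover the direct dependencies of nodes 4 and 5 through new configurations.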
  Comp.   Version(s)                    Description
  petsc   2.2.0                         PETSc, the SUT
  ic      1.5                           InterComm, the SUT
  python  2.3.6, 2.5.1                  Dynamic OOP language
  blas    1.0                           Basic linear algebra subprograms
  lapack  2.0, 3.1.1                    A library for linear algebra operations
  ap      0.7.9                         High-level array management library
  pvm     3.2.6, 3.3.11, 3.4.5          Parallel data communication component
  lam     6.5.9, 7.0.6, 7.1.3           A library for the MPI (Message Passing Interface) standard
  mch     1.2.7                         A library for MPI
  gf      4.0.3, 4.1.1                  GNU Fortran 95 compiler
  gf77    3.3.6, 3.4.6                  GNU Fortran 77 compiler
  pf      6.2                           PGI Fortran compiler
  gxx     3.3.6, 3.4.6, 4.0.3, 4.1.1    GNU C++ compiler
  pxx     6.2                           PGI C++ compiler
  mpfr    2.2.0                         A C library for multiple-precision floating-point number computations
  gmp     4.2.1                         A library for arbitrary precision arithmetic computation
  pc      6.2                           PGI C compiler
  gcr     3.3.6, 3.4.6, 4.0.3, 4.1.1    GNU C compiler
  fc      4.0                           Fedora Core Linux operating system

Table 1: Component Version Annotations for InterComm and PETSc

  System     Type      Cfgs   Comp_cfgs   Comp_plan
  InterComm  Ex-Cover  3552       39840        9919
  InterComm  DD-Cover   158        1642         677
  PETSc      Ex-Cover  1184       14336        3493
  PETSc      DD-Cover    90         913         309

Table 2: Test Plan Statistics

6.4 Cost-benefit Assessment

As shown in Table 2, the EX-plans for both systems have a large number of configurations compared to the DD-plans. Since it takes up to 3 hours to build a configuration for either InterComm or PETSc, executing the InterComm EX-plan or DD-plan requires about 10,600 or 470 CPU hours, respectively, and the corresponding PETSc plans require about 3,500 or 270 CPU hours. With a naive plan execution strategy, where each configuration is always built from scratch, 8 machines would still need 1,325 or 438 hours for the InterComm and PETSc EX-plans even with perfect speedup, and 59 or 34 hours for the corresponding DD-plans. However, since our plan execution strategies reuse build effort across configurations, the execution times for both kinds of plans are expected to be much smaller than with the naive strategy. In our experiments, execution times were further shortened due to many build failures.
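The CPU-hour estimates above can be checked with simple arithmetic (our own back-of-the-envelope computation; the paper rounds its totals before dividing, so the exact products differ slightly, e.g. 3552 × 3 = 10,656 versus the quoted 10,600):

```python
# Back-of-the-envelope check of the Section 6.4 estimates: configurations
# from Table 2, times up to 3 hours per configuration, divided across 8
# machines for the naive from-scratch strategy with perfect speedup.
BUILD_HOURS = 3
plans = {"InterComm EX": 3552, "InterComm DD": 158,
         "PETSc EX": 1184, "PETSc DD": 90}
for name, cfgs in plans.items():
    cpu_hours = cfgs * BUILD_HOURS
    print(f"{name}: ~{cpu_hours} CPU hours, ~{cpu_hours / 8:.0f} h on 8 machines")
```

The DD-plans come out more than an order of magnitude cheaper than the EX-plans for both subject systems, which is the cost gap the plan-reuse strategies then shrink further.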
For each subject system, we generated two test plans. Table 2 summarizes the number of produced configurations, and the number of components contained in those configurations and in the test plan. The first test plan, called the EX-plan, was generated using the exhaustive coverage criterion, and the other, called the DD-plan, covers only the direct dependencies identified for the components in a model. For example, the PETSc EX-plan has 1,184 configurations, containing 14,336 components to be built. However, the number of components in the final test plan is only 3,493, since configurations are merged to produce the test plan.

We first conducted experiments to measure the costs and benefits of DD-cover compared to EX-cover, and also examined the behavior of Rachet as the overall system scales. To do that, we executed both the EX-plans and DD-plans with 4, 8, 16 and 32 client machines, using the parallel depth-first plan execution strategy. To compare the various test plan execution strategies, we also executed the DD-plans for both subject systems using the parallel breadth-first and hybrid strategies with the same range of client machines.

For all experiments, we ran the GC on a machine with a Pentium 4 2.4GHz CPU and 512MB memory, running Red Hat Linux 2.4.21-53.EL; all LCs ran on Pentium 4 2.8GHz

The cost savings obtained by executing the DD-plans are shown in Figure 5. With 8 machines, the InterComm EX-plan took about 29 hours with the parallel depth-first strategy, during which 461 of the 9,919 component builds (the number of nodes in the plan) were successful and 687 failed. All other builds could not be tested due to the observed failures. For the PETSc EX-plan, about 29 hours were needed, during which we observed 724 build successes and 407 build failures for the 3,493 components in the plan, with the rest not able to be tested. Compared to the EX-plans, the InterComm DD-plan took 12 hours with 275 successful component builds, and the PETSc DD-plan took 10 hours with 216 successful builds. In our experiments, the execution times for the EX-plans were only 2.5–3 times longer than those for the DD-plans, because many build failures occurred during plan execution, especially for components close to the bottom node in the CDGs. Note that the difference in execution times between the EX-plans and DD-plans decreases as more clients are used, since the Rachet system always tries to best utilize the machines for plan execution, and therefore a larger plan benefits more when many clients are available.

The results show that Rachet was able to achieve large
Dual-CPU machines with 1GB memory, all running Red Hat performance benefits by testing only configurations cover-
Enterprise Linux version 2.6.9-11. All machines were con- ing direct dependencies, and also was able to execute the
nected via Fast Ethernet. One LC runs on each machine, test plans efficiently using the depth-first execution strat-
and each LC runs one VM at a time to execute tasks. The egy. However, we also need to examine the potential loss
number of entries in the VM cache for each LC is set to 8, of test effectiveness from using the the DD-plan, that only
because in previous work [17] we observed little benefit from samples a subset of the configurations that are tested by the
more cache entries for the InterComm example, and also be- EX-plan. To do that, for all component build failures iden-
cause test plans for PETSc are smaller than test plans for In- tified by the EX-plan, we examined the component and its
terComm in this scenario. In addition to these experiments version that failed to build, and its dependency information.
on the real system, we ran simulations using our event-based We then checked whether building the same component ver-
simulator that mimics the behavior of the key Rachet com- sion failed in the same context in the DD-plan.
ponents, described in Section 6.1, to better understand the We found that each component build failure in the In-
performance characteristics of the Rachet on larger sets of terComm EX-plan exactly maps to a corresponding failure
resources than we were able to use for the real experiment in the DD-Plan. However, for the PETSc EX-plan, we ob-
(both because of limited resource availability and the time served 8 instances where a PETSc component build failure
required to perform experiments). or success depended on the exact compilers used to build
[Figure 5: Plan execution times versus the number of client machines (4, 8, 16, 32) for the InterComm-EX, InterComm-DD, and PETSc-EX plans; plot not recoverable from the extraction.]

Figure 6: InterComm and PETSc DD-plan execution times, with different plan execution strategies. Breadth-first shows poor performance on few machines. Depth-first and hybrid show similar performance due to many build failures. [Two panels, for InterComm and for PETSc: execution time versus number of client machines, with actual times for 4-32 clients and simulated times for 64-256 clients, comparing the depth-first, breadth-first, and hybrid strategies; plots not recoverable from the extraction.]

components on which the PETSc component depends (e.g., when Rachet tries to build a version of the PETSc component with the GNU compilers on a VM, the MPICH component might have been previously built on the machine with the GNU compilers or with the PGI compilers). Unfortunately, all those instances were reported as successful builds during the DD-plan execution. We observed that this happened because there were missing constraints in the model. For these instances, the missing constraint was that
compilers from the same vendor must be used to build the components on which the PETSc component directly depends. PETSc developers might simply have assumed this constraint. However, users do not always have complete information on the compilers used to build those components on their system, especially if the system is managed by a separate system administrator. Another observation for the PETSc component is that it could never be built successfully using the LAM MPI component. It seems that some undocumented method is required to build PETSc using that MPI implementation.

For InterComm, due to many build failures of the components in the model, we were only able to test build compatibility for 7 of the 156 direct dependencies identified for the InterComm component. However, they were not the ones on which InterComm had been tested before. The results show that InterComm can be successfully built with the combinations of PGI C/C++ compiler version 6.2, all versions of the GNU Fortran77 or GNU Fortran90 compilers, and MPICH version 1.2.7. This is a larger set of components than the InterComm developers had previously tested, as documented on the InterComm distribution web page. The direct dependency with GNU C/C++ compiler version 3.3.6 and with the PGI Fortran compiler version 6.2 failed to build. The failure occurred because the InterComm configure process reported a problem in linking to Fortran libraries from C code. This result is interesting since the InterComm web page claims that InterComm was successfully built with GNU C/C++ version 3.2.3 and PGI Fortran version 6.0. We reported all the results to the InterComm developers, and a bug fix for the failed direct dependency is being investigated.

6.5 Comparing Plan Execution Strategies

As seen in Figure 5, Rachet scales very well as the number of machines used to run Rachet clients increases from 4 to 32. When we double the number of available machines, the execution time decreases by almost half, up to 16 machines. This means that Rachet can fully utilize additional resources to maximize the number of tasks executed in parallel. However, Figure 5 shows results obtained by executing the DD and EX plans for the subject systems using only the parallel depth-first strategy. To analyze the performance behavior of the different plan execution strategies, we also executed the DD-plans for InterComm and PETSc using the other strategies.

Figure 6 shows the combined results from both actual and simulated plan executions with different strategies. For both systems, we ran actual experiments with 4, 8, 16 and 32 clients. For larger numbers of clients, we ran simulations to compute expected plan execution times. The data used for the simulations, including the component build successes/failures and the average times needed to manage VMs and to build components, were all obtained from real experiments. The simulated times were, on average, about 18% less than the actual times for up to 32 clients.

We found that the breadth-first strategy performed worst for most runs. As described before, with the breadth-first strategy, Rachet tries to utilize as many machines as possible throughout the plan execution, and always reuses the best cached virtual machine to execute each task. However, the time to transfer the VMs across the network was a performance bottleneck, even though the clients were con-
[...] parallel task execution. In spite of these overheads, we expect that the hybrid strategy will achieve the best performance as [...]

[Figure: Simulated plan execution time (in hours) for the EX-plan with hybrid, DD-plan with depth-first, DD-plan with breadth-first, and DD-plan with hybrid; plot not recoverable from the extraction.]
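The build-reuse effect behind the depth-first strategy's advantage can be illustrated with a toy replay over a small plan. This is a sketch under stated assumptions, not Rachet's actual algorithm: build costs are uniform, a single client rebuilds only the suffix of each configuration not shared with its current VM, and VM caching and network transfers are ignored. The component names and the four-configuration plan are illustrative, not taken from the subject systems' models.

```python
# Toy illustration (not Rachet's implementation) of why a depth-first task
# order reuses build effort: consecutive configurations share long prefixes,
# so fewer components must be rebuilt on a client's virtual machine.

def shared_prefix(a, b):
    """Number of leading components two configurations have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def replay_cost(order):
    """Component builds performed by one client replaying configurations in
    the given order, rebuilding only the non-shared suffix each time."""
    cost, current = 0, ()
    for cfg in order:
        cost += len(cfg) - shared_prefix(current, cfg)
        current = cfg
    return cost

# Each configuration lists components bottom-up: OS, C compiler, MPI, SUT.
# Names are illustrative (e.g. "lam-7.1" is a hypothetical version string).
cfgs = [
    ("fc-4.0", "gcc-3.3.6", "mch-1.2.7", "ic"),
    ("fc-4.0", "gcc-3.3.6", "lam-7.1", "ic"),
    ("fc-4.0", "gcc-4.0.3", "mch-1.2.7", "ic"),
    ("fc-4.0", "gcc-4.0.3", "lam-7.1", "ic"),
]

depth_first = sorted(cfgs)            # visits one subtree of the plan at a time
breadth_first = [cfgs[0], cfgs[2], cfgs[1], cfgs[3]]  # hops across subtrees

print(replay_cost(depth_first), replay_cost(breadth_first))  # 11 13
```

In the full system the tradeoff is richer than this sketch: the breadth-first strategy also pays network cost to fetch the best cached VM for each task, which is the bottleneck observed above.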
7. RELATED WORK

Our work is broadly related to component installation managers that deal with dependencies between components. Opium [14] and EDOS [8] are two example projects. Opium makes sure a component can be installed on a client machine, while EDOS checks for conflicting component requirements at the distribution server. Both projects assume that component dependencies are correctly specified by the component distributors. Rachet differs in that we test component compatibility over a large range of configurations in which the components may be installed.

8. CONCLUSIONS AND FUTURE WORK

We have presented Rachet, a process, algorithms and infrastructure to perform compatibility testing. Our work makes several novel contributions: a formal model for describing the configuration space of software systems; algorithms for computing a set of configurations and a test plan that tests all direct dependencies for components in the model; test plan execution strategies focused on minimizing plan execution time, while allowing contingency management that improves test coverage over static approaches if and when an attempt to build a component fails; and automated infrastructure that executes a test plan in parallel on a set of machines, distributing tasks to best utilize resources.

The results from our empirical studies on two large software systems demonstrate that Rachet can detect incompatibilities between components rapidly and effectively, without compromising test quality compared to the exhaustive approach. We also examined the tradeoffs between plan execution strategies at different system scales, by running both actual experiments and simulations. The results suggest that the hybrid strategy can achieve the best performance, by attaining high locality to optimize task reuse and high task parallelism, at both small and large system scales.

Based on these results, we plan to work on several issues. First, we will further optimize the plan execution strategies. Information on the expected cost to execute tasks and the benefits of reusing cached tasks might help Rachet make better decisions for task assignment and reuse. Second, we will explore new types of coverage criteria that produce fewer configurations but still provide effective coverage. Third, we will investigate adding cost models to the plan execution strategies to prioritize the order in which tasks are distributed, so as to maximize the number of direct dependencies for the SUT tested within a fixed time period. Fourth, for components distributed using popular packaging methods (e.g., Autoconf/Automake), we plan to explore ways to extract dependencies automatically. Finally, we will extend our work to include functional and performance testing for a software system.

Acknowledgments

This research was supported by NASA under Grant #NNG06GE75G and NSF under Grants #CNS-0615072, #ATM-0120950, #CCF-0205265 and #CCF-0447864. This work was also partially supported by the Office of Naval Research under Grant #N00014-05-1-0421.

9. REFERENCES

[1] E. Anderson, Z. Bai, J. Dongarra, A. Greenbaum, A. McKenney, J. D. Croz, S. Hammarling, J. Demmel, C. H. Bischof, and D. C. Sorensen. LAPACK: A portable linear algebra library for high-performance computers. In Proc. of Supercomputing '90, pages 2-11, Nov. 1990.
[2] S. Balay, K. Buschelman, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, B. F. Smith, and H. Zhang. PETSc users manual. Technical Report ANL-95/11 - Revision 2.3.2, Argonne National Laboratory, Sep. 2006.
[3] M. B. Cohen, M. B. Dwyer, and J. Shi. Coverage and adequacy in software product line testing. In Proc. of ROSATEA '06, pages 53-63, 2006.
[4] K. Czarnecki, S. Helsen, and U. Eisenecker. Formalizing cardinality-based feature models and their specialization. Software Process: Improvement and Practice, 10(1):7-29, Jan./Mar. 2005.
[5] J. Dongarra, J. D. Croz, S. Hammarling, and I. S. Duff. A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 16(1):1-17, Mar. 1990.
[6] A. Duarte, G. Wagner, F. Brasileiro, and W. Cirne. Multi-environment software testing on the Grid. In Proc. of the 2006 Workshop on Parallel and Distributed Systems: Testing and Debugging, Jul. 2006.
[7] W. E. Lewis. Software Testing and Continuous Quality Improvement. CRC Press LLC, Apr. 2000.
[8] F. Mancinelli, J. Boender, R. di Cosmo, J. Vouillon, B. Durak, X. Leroy, and R. Treinen. Managing the complexity of large free and open source package-based software distributions. In Proc. of the 21st Int'l Conf. on Automated Software Engineering, pages 199-208, Sep. 2006.
[9] A. Memon, A. Porter, C. Yilmaz, A. Nagarajan, D. C. Schmidt, and B. Natarajan. Skoll: Distributed continuous quality assurance. In Proc. of the 26th Int'l Conf. on Software Engineering, pages 459-468, May 2004.
[10] PostgreSQL BuildFarm. http://www.pgbuildfarm.org.
[11] R. Sabharwal. Grid infrastructure deployment using SmartFrog technology. In Proc. of the 2006 Int'l Conf. on Networking and Services, Jul. 2006.
[12] A. Sussman. Building complex coupled physical simulations on the Grid with InterComm. Engineering with Computers, 22(3-4):311-323, Dec. 2006.
[13] T. Syrjänen. A rule-based formal model for software configuration. Technical Report A55, Helsinki University of Technology, Dec. 1999.
[14] C. Tucker, D. Shuffelton, R. Jhala, and S. Lerner. OPIUM: Optimal package install/uninstall manager. In Proc. of the 29th Int'l Conf. on Software Engineering, pages 178-188, May 2007.
[15] G. V. Vaughan, B. Elliston, T. Tromey, and I. L. Taylor. GNU Autoconf, Automake, and Libtool. Sams, 1st edition, 2000.
[16] VMware Inc. Streamlining software testing with IBM Rational and VMware: Test lab automation solution. Whitepaper, 2003.
[17] I.-C. Yoon, A. Sussman, A. Memon, and A. Porter. Direct-dependency-based software compatibility testing. In Proc. of the 22nd Int'l Conf. on Automated Software Engineering, Nov. 2007.