
Performance Assertion Checking

Sharon E. Perl*                          William E. Weihl
DEC Systems Research Center              MIT Laboratory for Computer Science
Palo Alto, CA                            Cambridge, MA
perl@src.dec.com                         weihl@lcs.mit.edu

Abstract

Performance assertion checking is an approach to automating the testing of performance properties of complex systems. System designers write assertions that capture expectations for performance; these assertions are checked automatically against monitoring data to detect potential performance bugs. Automatically checking expectations allows a designer to test a wide range of performance properties as a system evolves: data that meets expectations can be discarded automatically, focusing attention on data indicating potential problems.

PSpec is a language for writing performance assertions together with tools for testing assertions and estimating values for constants in assertions. The language is small and efficiently checkable, yet capable of expressing a wide variety of performance properties. Initial experience indicates that PSpec is a useful tool for performance testing and debugging; it helped uncover several performance bugs in the runtime system of a parallel programming language.

1 Introduction

Testing and debugging performance are difficult for several reasons: expectations about performance are often vague, the volume of monitoring data that must be examined is large, and performance may change in unanticipated ways as a system evolves. As a result, most complex systems have performance bugs; they fail to meet their designers' or users' expectations about elapsed time, throughput, utilization, and other performance properties. In this paper we describe a systematic approach to performance testing and debugging that automates part of the process.

In our approach, system designers write assertions to capture their expectations about performance. These assertions are then checked automatically, focusing the designers' attention on monitoring data that indicates potential performance bugs. In addition, assertions can be checked periodically, so that performance bugs are detected soon after they appear. This approach is embodied in PSpec, a language and associated tools for writing and checking assertions.

PSpec is useful for:

● Performance regression testing: when a system is changed, performance assertions can be rechecked to ensure that the system still meets expectations.

● Continuous system monitoring: performance assertions can be checked periodically during normal use to detect performance problems and violations of assumptions about workloads.

● Performance debugging: successively more detailed performance assertions may be helpful for pinpointing the location of performance problems in the system.

● Clarifying expectations: writing precise performance assertions helps system designers understand what they can and cannot guarantee about their systems.

PSpec is not intended for proving that systems have particular performance properties.

Automatic testing of performance properties is not a new idea. Compiler builders, for example, often have regression test suites for testing performance as well as functionality. PSpec generalizes such domain-specific testing methods, providing a simple, general set of tools together with a language that makes it easy to express expectations about performance.

PSpec is intended to be useful in concurrent systems, ranging from multitasking uniprocessor systems to small-to-medium scale multiprocessors to distributed systems. It may be applicable to highly parallel systems as well, but we did not consider the requirements of such systems during the design.

The initial versions of the PSpec tools were applied to the runtime system of a parallel programming language running on a multiprocessor simulator. We are currently using the tools to do performance assertion checking for an experimental Modula-3 [14] RPC system.

In the next section, we give an overview of PSpec. Section 3 discusses related work. Sections 4 and 5 present the PSpec language along with examples of performance properties that it can express. Section 6 discusses language design choices. Section 7 describes the tools. Section 8 describes our experience using PSpec. Finally, Section 9 summarizes our approach and results.

----------------------------------------------------------------
*This research was supported by the NSF under grants CCR-8716884 and CCR-8822158, by DARPA under contracts N00014-89-J-1988 and N00014-91-J-1698, by Digital Equipment Corporation, and by grants from AT&T and IBM. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. government.

2 PSpec Overview

The PSpec system, illustrated in Figure 1, has several components: performance specifications, monitoring logs, and the checker, evaluator, and solver tools.

[Figure 1: The PSpec approach. A program augmented with monitoring produces a log; the solver, evaluator, and checker take performance specifications and monitoring logs as input, producing values for unknowns, values of expressions, and reports of assertion failures.]

Performance specifications contain assertions about performance written by the user in the PSpec language. The language is a notation for expressing predicates about monitoring logs. Many common kinds of performance metrics, such as elapsed time, throughput, utilization, and workload characteristics, can be expressed.

A monitoring log is an abstraction of a program's execution that contains everything relevant to expressing performance assertions. The user supplies an augmented version of the program that generates a monitoring log for each run. The logging facility is not part of PSpec. Instead, the PSpec tools use a log interface that can be implemented on top of available logging facilities. PSpec will be most useful with logging facilities that permit user-defined event types, but will also work if only a fixed set of event types is available. (The kinds of performance assertions that can be written may be limited in this case.)

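The paper leaves the log interface abstract. As a rough sketch of the shape such an interface might take in C (all names here are ours, not part of PSpec):

    /* Hypothetical log interface, in the spirit of Section 2: the PSpec
     * tools read events one at a time through an interface like this,
     * which can be layered over an existing logging facility. */
    typedef struct {
        const char  *type;    /* event type name, e.g. "StartRead" */
        double       ts;      /* timestamp (meaningful for timed events) */
        int          nattrs;  /* number of named numeric attributes */
        const char **names;   /* attribute names, e.g. "tid" */
        double      *values;  /* corresponding attribute values */
    } Event;

    typedef struct Log Log;             /* opaque handle on a stored log */
    Log *log_open(const char *path);    /* begin a pass over the log */
    int  log_next(Log *l, Event *e);    /* fetch next event; 0 at end */
    void log_close(Log *l);

A concrete implementation would translate whatever record format the local logging facility produces into this event-at-a-time view.
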
The checker, evaluator, and solver take as input performance specifications and monitoring logs. The checker is useful for testing, including regression testing and periodic monitoring; it reports which assertions fail to hold for the run represented by a log. The evaluator and solver are useful for writing performance specifications and for performance debugging. The evaluator provides a read-eval-print loop for evaluating expressions involving data in a log. The solver uses logged data to help a specification writer determine values for numeric constants in assertions.

A programmer who wants to use the PSpec tools to check assertions for a program might do the following:

1. Decide what to assert about the program's performance. Express the assertions in the PSpec language.

2. Instrument the program to record the data needed to check the assertions. (How this step is accomplished depends upon the particular logging facility being used; a sketch follows this list.)

3. Run the program to obtain a monitoring log.

4. Use the PSpec tools to process the log and specification.

5. Repeat the preceding steps, possibly also modifying the program, until the specification expresses the desired performance properties and the program satisfies them.

6. Periodically gather monitoring data from the program and use the checker to make sure the program still performs as expected.

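To make step 2 concrete, a file system Read procedure might be instrumented roughly as follows; log_event and the helper routines are hypothetical stand-ins for whatever the logging facility provides, and the event types match those used in Section 4:

    /* Hypothetical instrumentation for step 2: emit StartRead/EndRead
     * events around the body of a file system Read operation.
     * log_event() stands in for the logging facility, which is not part
     * of PSpec; it records an event type with named numeric attributes. */
    extern void   log_event(const char *type,
                            const char *a1, double v1,
                            const char *a2, double v2);
    extern double now(void);                    /* current timestamp */
    extern int    self(void);                   /* current thread id */
    extern int    do_read(char *buf, int size); /* original Read body */

    int Read(char *buf, int size)
    {
        int n;
        log_event("StartRead", "tid", self(), "ts", now());
        n = do_read(buf, size);
        log_event("EndRead", "tid", self(), "ts", now());
        return n;
    }
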
3 Related Work

Checking performance properties automatically is not a new idea. We have heard a number of anecdotes about people writing log-processing programs (e.g., in awk [1], C [10], or perl [17]) to check whether desired properties hold. In addition, Snodgrass [16] and Liao and Cohen [12] have developed systems based on relational query languages for writing and checking performance assertions. We believe that PSpec is more effective for this purpose than either ad hoc approaches based on writing programs in some language like C or approaches based on database query languages.

Compared to writing programs to process logs, the abstractions provided by and the discipline imposed by PSpec make it significantly easier to achieve the same effect, and assertions written in PSpec are significantly more readable. In addition, by restricting the power of PSpec relative to a general programming language, we expect to learn from experience what is really needed for writing performance assertions. It might be possible to design a library package for C or some other language that would provide much of the power and discipline of the abstractions in PSpec; we have not explored this avenue in detail. However, it seems doubtful that this approach would achieve the same level of abstraction and readability as PSpec.

There are a couple of ways we can imagine using a database query language for writing performance assertions. One is to store monitoring logs in a database and run queries to test performance properties. Another is to use a query language as the vehicle for writing assertions without actually storing logs in a general-purpose database; this is the approach taken by Snodgrass and by Liao and Cohen. A database query language has abstractions that are similar in some ways to those in PSpec; for example, the ability to write "aggregate expressions" in some form. However, database query languages are significantly more general than PSpec. We believe that the additional generality makes the entire monitoring and testing system more complex (a belief that is bolstered by Snodgrass's reported experience). PSpec restricts the assertions that can be written, resulting in a simpler and more efficient system. Since we have yet to see evidence that the additional generality is needed, we think we have made the right tradeoff.

Liao and Cohen's relational model includes primitive relations corresponding to events in AP5 programs [5]. They take advantage of this to instrument AP5 programs automatically and to reduce the number of events logged (by analyzing queries to determine which events are really needed). Reducing the work required of the programmer to write and check assertions is important, and automatic instrumentation is clearly one important aspect of this. Initially, PSpec did not have any support for automatic instrumentation, but we have recently added features for automatically instrumenting events corresponding to the start and end of procedure bodies (see Section 5.6).

Our work emphasizes the use of precise expressions of performance expectations as the basis for automatic performance testing. The PSpec solver and evaluator can also help in the process of developing expectations and debugging a system to the point where it meets expectations initially, but other kinds of tools will also be helpful. In this sense, our work is complementary to work on profiling tools (e.g., [8, 6, 2]) and on visualization (e.g., [3, 7, 9, 11, 13]). Profiling tools are useful for tuning when the user has few prior expectations about performance. Visualization tools, which provide static or dynamic pictorial displays of performance data, can highlight unexpected effects that may be hidden in a mass of numbers. Both profiling and visualization tools are designed for a person to examine the data collected, so they are not generally useful for large-scale, ongoing testing.

4 Concepts

We begin our presentation of the PSpec approach by describing the concepts underlying the PSpec language. The language is a precise notation for expressing performance assertions in terms of the data captured in a monitoring log. A log is a sequence of typed events. Figure 2 shows a representation of a sample log, which contains events of several types: StartRead, EndRead, InterruptsOff, InterruptsOn, and CacheHit. These events have named, numeric attributes such as tid (thread identifier) and ts (timestamp).

    StartRead(tid = 102, ts = 1)
    InterruptsOff(pid = 1, tid = 105, ts = 2)
    InterruptsOn(pid = 1, ts = 3)
    InterruptsOff(pid = 2, tid = 120, ts = 4)
    CacheHit(tid = 102)
    StartRead(tid = 104, ts = 5)
    EndRead(tid = 102, ts = 6)
    InterruptsOn(pid = 2, ts = 7)
    EndRead(tid = 104, ts = 8)
    StartRead(tid = 104, ts = 9)
    CacheHit(tid = 104)
    EndRead(tid = 104, ts = 10)

Figure 2: Sample monitoring log. (Time increases down the page.)

An event contains information recorded at a single point in a program's execution. For example, a StartRead(tid, ts) event indicates that a file system Read operation started in the thread with id tid at time ts. Similarly, an InterruptsOn(pid, ts) event records that interrupts were enabled on processor pid at time ts.

An interval consists of all events in a log between a designated start event and an end event. Intervals have named types and named metrics (similar to event attributes). Values of metrics are based on the events in an interval. For example, we could define an InterruptsDisabled interval type for intervals that start at InterruptsOff events and continue through corresponding InterruptsOn events, with a metric called time whose value for each interval is the difference of the timestamps of its start and end events.

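Anticipating the declaration syntax of Section 5, this interval type could be written roughly as follows (it is essentially the IntDisabled type shown later in Figure 4):

    interval InterruptsDisabled =
        s: InterruptsOff,
        e: InterruptsOn where e.pid = s.pid
    metrics
        time = ts(e) - ts(s)
    end InterruptsDisabled;
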
Intervals are the primary abstraction used in writing PSpec performance assertions. Typically, a specification writer determines what to assert about a program's performance, defines interval types that capture the necessary metrics, and then writes predicates that apply over a set of intervals.

5 The PSpec Language

The PSpec language provides constructs for declaring event types, declaring interval types, and expressing assertions. Below we give a flavor of the language through a series of examples. At the same time we show how to express assertions about a variety of performance properties, including elapsed time, throughput, and workload properties. A description of the full language, along with additional examples of performance properties, can be found in the first author's dissertation [15].

We begin by showing how to declare simple interval types and write simple assertions. The examples include assertions about elapsed times and workload properties. Then we show how to extend interval type declarations, assertions, and metric definitions in turn, with examples of assertions about throughput and workload. The following section discusses the language design choices and tradeoffs.

All of the examples below are set in the context of Read operations on a file system.

5.1 Declarations and Assertions

The simplest way to identify an interval type is to give the types of its start and end events, and the simplest kind of metric definition is one that uses only the values of the start and end event attributes.

For example, suppose we (as specification writers) want to define a Read interval type with a time metric. We start by declaring the event types:

    timed event StartRead(); EndRead().

StartRead and EndRead are declared to be timed event types with no explicit attributes. (Timed events have an implicit timestamp.)

Now, we can declare the interval type:

    interval Read =
        s: StartRead, e: EndRead
    metrics
        time = ts(e) - ts(s)
    end Read.

A Read interval starts with an event of type StartRead and ends with the next event of type EndRead after the start event. s and e are names for the start and end events for an interval of the type. Each Read interval has a time metric whose value is the difference of its start and end timestamps. (ts is a built-in function that returns the timestamp of a timed event.)

One simple kind of assertion is a predicate that applies to all events or intervals in a log. Suppose we would like to say that "the elapsed time for any Read operation is at most ten milliseconds." We can write:

    assert {& r : Read : r.time < 10 ms}.

This assertion can be read as: "for all intervals r of type Read, the value of r's time metric is at most ten milliseconds."

The assertion above contains an example of an aggregate expression. Aggregate expressions provide a way to generate a sequence of values and combine them with an operator. In the example above we generate a sequence of intervals by iterating over all intervals of a specified type, Read, binding each in turn to the dummy variable r. The operator & (logical "and") is used to combine the sequence of booleans produced by the expression r.time < 10 ms evaluated for each value of r.

Suppose we would like to write an assertion about the average rate of Read requests. This is a statistical assertion about a workload property. Again, we can define an interval type and write the assertion using an aggregate expression:

    interval ReadReq =
        s: StartRead, e: StartRead
    metrics
        time = ts(e) - ts(s)
    end ReadReq;
    assert
        {mean r : ReadReq : r.time} >= 0.1 sec.

The interval defined above spans the time from one Read request to the next. The assertion contains an aggregate expression that uses the built-in mean aggregate operator to compute the average time between read requests.

It is worth noting that the expression that appears after the second colon in an aggregate expression may itself contain aggregates, within limits. In particular, inner aggregate expressions may not refer to the dummy variables of outer ones. This limitation applies to aggregate expressions that iterate over events or intervals in a log, called log-aggregate expressions. (Aggregate expressions can also iterate over mappings, described below.)

5.2 More on Identifying Intervals

To obtain more control over how events are matched up to form intervals, predicates may be added to start and end events. For example, if multiple threads are generating Read events, we need to match up StartRead and EndRead events for the same thread to form Read intervals.

To match up events for the same thread, first we add a tid (thread identifier) attribute to our events (along with a size attribute for later use):

    timed event StartRead(tid, size); EndRead(tid).

Then we alter the interval type declaration:

    interval Read =
        s: StartRead,
        e: EndRead where e.tid = s.tid
    metrics
        time = ts(e) - ts(s),
        size = s.size
    end Read.

The where-clause following the end event type contains a predicate that restricts the end event for a Read interval to have the same thread identifier as the start event. In general, a where-clause attached to an end event may be any boolean-valued expression (that does not contain aggregates). The expression may refer to the start and end events for the interval. A where-clause may also be attached to a start event, in which case it can refer to the start event but not the end event.

One use for this Read interval type is to write an assertion about Read throughput. The following assertion says that the throughput is at least one million bytes per second:

    def total = {last e : EndRead : ts(e)}
              - {first e : StartRead : ts(e)};
    assert {+ r : Read : r.size} / (total / 1 sec)
           > 1.0e6.

(For convenience we define an identifier total and bind it to the computed time from the start of the first Read to the end of the last one. Dividing total by 1 second converts it from timestamp units to seconds.)

5.3 More on Expressing Assertions

We have seen how aggregate expressions allow us to compute functions over all events or intervals of a specified type in a log. Sometimes, however, we would like to compute over a subset of events or intervals. We do this by using a where-clause to restrict the sequence of values used in an aggregate expression. For example:

    assert {& r : Read where r.size < 4096
            : r.time < 10 ms}.

The where-clause restricts the sequence of Read intervals to those whose size is at most 4096 bytes. In general, a where-clause in an aggregate expression may refer to the dummy variable (r in this example) and may itself contain aggregate expressions.

5.4 More on Defining Metrics

Thus far, we have seen how to compute metric values using the start and end events of an interval. To enable metrics to be computed using all the events or subintervals within an interval, we allow aggregate expressions to appear in metric definitions.

Suppose that we would like to write an assertion about the effectiveness of the file system cache for Read operations. We add a hit metric for Read intervals that is true whenever a Read hits entirely in the cache (detectable by the presence of a CacheHit event in a Read interval). Then we compute the cache hit rate as the percentage of Read intervals whose hit metric is true.

The CacheHit event type is declared as follows:

    event CacheHit(tid).

The definition of the hit metric to be added to the Read interval declaration from Section 5.2 is:

    hit = {count c : CacheHit
           where c.tid = s.tid} # 0.

(This example uses the special aggregate operator count; the expression is a convenient shorthand for {+ c : CacheHit where ... : 1}.)

Implicitly, when an aggregate expression appears in an assertion or other "top-level" expression, it ranges over all events or intervals in the log. When an aggregate expression appears in a metric definition, however, it is defined to range over just the events or subintervals that occur inside the metric's interval. Thus for each Read interval, the value of its hit metric will be based just on the CacheHit events between its StartRead and EndRead events.

Now we can use the hit metric in an assertion about the hit-rate:

    assert {count r : Read where r.hit}
           / {count r : Read} > 0.75.

Note that the effectiveness of the cache is a workload property, since it depends upon the pattern of requests presented by file system clients. Unlike the Read request rate, which is also a workload property, the effect of the cache on file system performance is not easy to express purely in interface-level terms (Read requests). However, we are free to introduce the lower-level notion of "hit rate" to write the desired assertion.

The cache hit rate example uses aggregate expressions over events within an interval. It is also possible to write aggregate expressions over subintervals within an interval. An interval i is a subinterval of j if i starts after j starts and ends before j ends. One use of subintervals is to count or time subcomputations of a computation.

Additional flexibility in defining metrics comes from mappings, which are sparse arrays indexed by numbers. The number of different named metrics for an interval is statically determined when a specification is written. A mapping can represent a dynamic number of metrics. For example, we could define a mapping consisting of per-processor or per-thread metrics, indexed by processor id or thread id. The PSpec language provides operations for creating single-element mappings, combining mappings, and choosing elements from them.

5.5 Specifications

A performance specification is comprised of a set of performance assertions with their accompanying declarations. It provides a name that can be used to qualify event and interval names, preventing name clashes among multiple specifications. A performance specification using some of the examples in this section is shown in Figure 3.

    perfspec FSRead
        timed event StartRead(tid, size);
            EndRead(tid);
        event CacheHit(tid);
        interval Read =
            s: StartRead,
            e: EndRead where e.tid = s.tid
        metrics
            time = ts(e) - ts(s),
            size = s.size,
            hit = {count c : CacheHit
                   where c.tid = s.tid} # 0
        end Read;
        assert {& r : Read : r.time < 10 ms};
            {mean r : Read : r.time} < 5 ms;
            {& r : Read where r.size < 4096
               : r.time < 10 ms};
            {count r : Read where r.hit}
               / {count r : Read} > 0.75
    end FSRead

Figure 3: An example specification.

5.6 Other Features

A few additional language features are worth mentioning. The proc declaration provides a shorthand for declaring two event types and an interval type associated with a procedure in a monitored program. The event types correspond to calls and returns of the procedure and the interval type corresponds to invocations of the procedure. We added this language feature after observing that it is quite common for events and intervals to correspond to procedures. Proc declarations are particularly helpful with monitoring tools that monitor procedure invocations, making it easy to automatically enable monitoring for just those procedures relevant to a specification.

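The paper does not show the proc syntax itself, but in terms of the declarations introduced above, a proc declaration for a procedure Get might be roughly equivalent to the following (the event and interval names here are hypothetical):

    timed event GetCall(tid); GetReturn(tid);
    interval Get =
        s: GetCall,
        e: GetReturn where e.tid = s.tid
    metrics
        time = ts(e) - ts(s)
    end Get;
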

Another feature is interval subtyping. It is possible to declare a new interval type that is a subtype of an existing interval type with additional metrics. The subtype has the same start and end event specification, and all the metrics of the supertype. This is particularly useful for interval types introduced by proc declarations; the specification writer can define subtypes of the proc interval type with the desired additional metrics.

A third feature is the ability to import one specification into another. Importing a specification S into another specification T allows the event and interval types declared in S to be used in T.

6 Discussion

Our design philosophy has been to keep the PSpec language small, including only features that are general and for which we saw a clear need in the examples we studied. We wanted the language to be capable of expressing a wide variety of performance properties. Also, we wanted specifications to be both readable, so that they can serve as documentation of performance expectations, and efficiently checkable.

As usual, there are tensions among these design goals. For example, efficiency must be balanced against expressive power; the relational calculus used by Liao and Cohen [12] and by Snodgrass [16] is very expressive, but expensive to evaluate. Also, expressive power can conflict with readability. A general-purpose programming language provides unlimited log-processing power, but the asserted performance properties may not be apparent from reading the log-processing program.

Our general design goals give rise to a number of detailed design choices, of which we briefly discuss three. The first concerns the identification of intervals. The PSpec language identifies intervals by naming the event types of the start and end events. Where-clauses provide additional control over whether an event of the correct type is actually the start or end event for an interval. Other possibilities for identifying intervals include identifying start and end events by disjunctions of event types, identifying fixed-time intervals, identifying start and end events by their attribute values, and identifying intervals by patterns of events. Experience suggests that disjunctions of events and fixed-time intervals might be useful. We have not seen compelling evidence for the others.

The second design choice concerns aggregate expressions. The PSpec language is assignment-free. This choice is motivated by readability considerations and manifested primarily in the use of aggregate expressions to perform computations over sequences of events and intervals and over mappings. An alternative would be to introduce state variables that are updated as each event is read from a log. However, specifications written in that style tend to be harder to read. Aggregate expressions are concise, readable, and adequately flexible.

The third design choice concerns metric definitions, which are restricted so that intervals can be computed using a reasonable amount of space in a single pass over a log file. There are two restrictions: aggregate expressions in metric definitions may not contain nested aggregates, and may not refer to the end event for an interval. In practice these restrictions have not limited expressive power for the examples we have studied.

The examples in the previous section illustrate some of the kinds of performance properties that can be expressed in the language. One limitation, not apparent from the examples, arises from the restriction that an interval's metrics are computed using only the events within the interval. For example, we might want to write an assertion about the elapsed time for each Read operation that is not concurrent with any Write operation. However, we cannot tell whether a Read is concurrent with Writes by examining only the events within the Read interval, because the Read may be fully contained within a Write interval. In general, one could imagine that the metrics for an interval could be computed using the entire log. In the examples we have studied, using only the events from the part of the log before the interval would suffice. More research is needed to understand the scope and severity of this limitation. In spite of this constraint, the language has proved general and flexible enough to express a variety of common performance properties.

7 The PSpec Tools

We now turn to the tools that process performance specifications and monitoring logs. We outline the functionality of the checker, evaluator, and solver, and discuss some aspects of their implementation.

7.1 The Checker

The checker determines whether a given performance specification is true of a given monitoring log. The checker is flexible about how events in logs are matched with event types in specifications; it can be customized to whatever monitoring system is being used.

The current checker works off-line. The system being monitored is run for as long as desired to gather a log, which can then be used to check the specification at any time. This design works well for the setting in which the checker has been used thus far (described in Section 8). One could imagine checking specifications on-line, as data is generated from a running system; we intend to explore such designs in the future.

7.2 The Evaluator

The evaluator takes a performance specification and a monitoring log, and provides an interactive read-eval-print loop for computing values of PSpec expressions using the log.

The expressions can use event and interval types declared in the specification. The evaluator provides a structured way of viewing the contents of a log; it is useful for exploring performance and developing expectations, and also for tracking down the causes of assertion failures reported by the checker.

7.3 The Solver

The solver helps fill in values of constants in performance specifications when the values are best obtained by measurement. For example, the implementors of a file system might expect the time for a Read operation to be the sum of a fixed overhead and a variable cost depending on the size of the read. The per-byte time and the fixed overhead can be estimated by measuring the system for Reads of a variety of sizes.

The input to the solver is a monitoring log and a performance specification including unknowns and solve declarations. An unknown is a symbolic constant whose value is to be determined. A solve declaration tells the solver how to estimate values for unknowns. One form of solve declaration instructs the solver to use linear regression, with the data points computed from the log.

The output of the solver is a revised specification with estimated values supplied for unknowns. This specification can then be used (perhaps after further modifications) to check other monitoring logs. The monitoring log given to the solver will typically be from a run believed to produce good estimates for the unknowns.

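The paper does not give the concrete syntax of solve declarations, but the computation behind the linear-regression form is ordinary least squares. A minimal C sketch, assuming the (size, time) data points have already been extracted from the Read intervals of a log:

    #include <stddef.h>

    /* Fit time = overhead + per_byte * size by least squares, as the
     * solver's linear-regression form might do for the Read example.
     * x[i] is the size of read i; y[i] is its elapsed time.
     * (Assumes n >= 2 with at least two distinct sizes.) */
    void fit_read_cost(const double *x, const double *y, size_t n,
                       double *overhead, double *per_byte)
    {
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (size_t i = 0; i < n; i++) {
            sx  += x[i];        sy  += y[i];
            sxx += x[i] * x[i]; sxy += x[i] * y[i];
        }
        *per_byte = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        *overhead = (sy - *per_byte * sx) / n;
    }
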
7.4 Implementation

The checker, evaluator, and solver tools use a common set of library routines for parsing and type-checking performance specifications and evaluating expressions relative to monitoring logs. Expressions are evaluated by first evaluating all log-aggregate expressions, and then evaluating other expressions in a specification. If log-aggregate expressions are nested, one pass through the log is used per nesting level, starting with the innermost expressions. This strategy is simple and reasonably efficient, although there are cases where a clever evaluation strategy could use fewer passes through the log. We assume that the log is too large to store in memory, so each pass reads the log from disk, processing one event at a time.

The time used by the algorithm that evaluates a specification for a monitoring log is linear in the log length, provided that the maximum number of overlapping intervals at any point in the log is not proportional to the log length. The space used by the algorithm is proportional to the maximum number of overlapping intervals. Of course, the time and space also depend on the size of the specification (the number of bytes and the maximum depth of expressions). A more detailed description of the implementation and an analysis of its time and space requirements can be found in the first author's dissertation [15].

In practice, we have observed that the current PSpec tools can process logs at the rate of about 4000–4500 events per second (about 8–9 seconds per megabyte) on a DECstation 5000/200 (with a MIPS R3000 processor) for a moderately complex specification (one with several event types, an interval type with an aggregate expression in its metrics, and a couple of assertions with aggregate expressions). Because log processing speed varies depending on the particulars of a specification and the percentage of events in a log that are relevant to the specification, these numbers are provided only to give a sense of the log-processing speed. The current tools interpret specifications (rather than compiling them) and are not highly optimized for memory usage (garbage collection has a non-negligible overhead). Even so, we have found the performance acceptable for the sizes of logs we have been processing thus far (up to several megabytes). We believe the performance of the tools can be improved significantly if necessary.

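As an illustration of the single-pass strategy (ours, not the authors' implementation), the following C sketch checks the assertion {& r : Read : r.time < 10 ms} in one pass over a log presented as "type tid ts" records on standard input. The table of open intervals is what makes the space requirement proportional to the maximum number of overlapping intervals:

    #include <stdio.h>
    #include <string.h>

    #define MAX_OPEN 128      /* max simultaneously open Read intervals */

    struct open { int tid; double start; };

    int main(void)
    {
        struct open tbl[MAX_OPEN];      /* open-interval table */
        int nopen = 0, holds = 1;
        char type[32]; int tid; double ts;

        /* One pass: each log record is "<type> <tid> <ts>", ts in ms. */
        while (scanf("%31s %d %lf", type, &tid, &ts) == 3) {
            if (strcmp(type, "StartRead") == 0 && nopen < MAX_OPEN) {
                tbl[nopen].tid = tid;   /* open a Read interval */
                tbl[nopen].start = ts;
                nopen++;
            } else if (strcmp(type, "EndRead") == 0) {
                for (int i = 0; i < nopen; i++) {
                    if (tbl[i].tid == tid) {       /* matching start */
                        if (ts - tbl[i].start >= 10.0)
                            holds = 0;             /* r.time < 10 ms fails */
                        tbl[i] = tbl[--nopen];     /* close the interval */
                        break;
                    }
                }
            }
        }
        printf(holds ? "assertion holds\n" : "assertion fails\n");
        return 0;
    }
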
8 Experience

In this section we describe an experiment using an earlier version of PSpec to write and check performance specifications for the runtime system of Prelude, a new parallel programming language [18]. The experiment had two beneficial results. First, it exposed limitations of the initial version of PSpec and produced insights about how to generalize it, resulting in the version described in this paper. Second, we discovered several performance bugs in the Prelude runtime system.

Prelude is intended for writing portable and modular parallel programs; it provides objects and threads, and allows a programmer to issue directives that help the compiler to optimize a program for a given architecture. At the time of the experiments, an initial version had been designed and mostly implemented on top of Proteus, a high-performance simulator for MIMD architectures [4].

8.1 Prelude Specifications

We wrote performance specifications for several pieces of the Prelude runtime system. In some cases we also constructed test programs to exercise the corresponding pieces of the system, generated monitoring data, and checked the specifications. All of the specifications were written using a version of the PSpec language that was less general and expressive than the one described in this paper, though the same performance bugs would have been found using the current language.

We wrote specifications for Prelude for both elapsed time and workload properties. One of the elapsed time specifications, shown in Figure 4, expresses bounds on how long interrupts are disabled on processors. An IntDisabled interval starts when interrupts are turned off on a processor and ends when interrupts are turned on again. The bound is expressed in the assertion that interrupts are disabled for more than 75 cycles in at most one interval (which represents a startup transient).

    timed event InterruptsOff(pid);
        InterruptsOn(pid);
    interval IntDisabled =
        s: InterruptsOff,
        e: InterruptsOn where e.pid = s.pid
    metrics
        time = ts(e) - ts(s)
    end IntDisabled;
    assert
        {count i : IntDisabled
         where i.time > 75 cyc} <= 1

Figure 4: An elapsed time specification for disabling of interrupts.

The other elapsed time specifications are similar. They express bounds on elapsed times for several operations: queueing messages (when necessary) during sends; executing the fast path to create a message; initializing certain data structures; allocating reply codes and object identifiers; performing null remote procedure calls; and executing threads to process incoming messages.

We also wrote two workload specifications about the percentage of operations with particular properties. One of these expresses bounds on the percentage of inter-processor messages that require multiple packets. The other concerns the percentage of messages received that get forwarded because their target objects have migrated. These are both workload specifications in the sense that they depend on the Prelude programs that are run.

8.2 Prelude Performance Bugs

As a result of checking some of the Prelude performance specifications we found four performance bugs in the Prelude runtime system. Three of the bugs were found by checking the interrupt specification, and the other was found using the assertion about time to allocate object identifiers.

We started by checking the specification in Figure 4, using a monitoring log generated by a test program that used the Prelude runtime system. The checker reported an assertion failure. By examining the intervals in the log we discovered that some of them lasted thousands of cycles, while others were only slightly longer than expected (about 85–90 cycles).

Taking advantage of the simulator on which the program was running, we were able to examine the call stacks at the starts of long interrupt intervals. The causes of the problems were obvious once we looked at the code in each case. The very long intervals were caused by a thread that suspended itself after having disabled interrupts without first reenabling them. This resulted from a misunderstanding among the implementors about the interaction between thread suspension and the interrupt flags. (Interrupts were reenabled eventually through another mechanism, but the result was that they were disabled for much longer periods of time than intended.) The other long intervals, and the failed assertion about object identifiers, resulted from problems with allocation strategies—a result of miscalculations and careless coding. The strategies were redesigned once the problems were revealed.

What can we conclude from these experiments? We were encouraged to confirm our hypothesis that simple performance assertions are useful for identifying performance bugs. The runtime code had been implemented several months earlier, and the implementors had designed and partially tuned for performance, so we believe that the bugs we found were not trivial. On the other hand, the pieces of the Prelude runtime system that were completed had not yet seen extensive use by clients, and it is possible that the bugs would have shown up eventually under normal use. The problem with the interrupt disabling might have been hard to identify, however, if the only visible effect was Prelude programs that ran more slowly than expected. Overall, the benefit of writing and checking performance assertions far outweighed the cost.

9 Summary and Conclusions

Precise performance assertions can help automate performance regression testing and continuous monitoring of systems, and thereby help find performance bugs. The PSpec language and tools are a realization of this idea. In comparison to other systems that have attempted to automate monitoring and testing, PSpec is simpler and more efficient, yet can express a wide range of performance properties.

A monitoring log provides a simple interface between the assertion language and the program whose performance is being described. It abstracts away the idiosyncrasies of the system being monitored to capture those facts about executions that are relevant for performance assertions. The existence of this interface permits a single implementation of a set of tools for processing performance assertions; the tools can be used with monitoring logs from many different systems.

The PSpec checker, evaluator, and solver are tools designed specifically for writing and checking performance assertions. Other tools could also be designed to fit into the performance assertion framework and augment the capabilities of our current tools. It would also be useful to integrate the PSpec tools with other performance tools.

One advantage of the PSpec approach is that it has a low startup cost. Performance assertions can be written for any piece of a system that can be monitored, at any level of abstraction. There is no requirement that performance specifications be complete, or that they be provided for all modules in a system, or that they be written in terms of the interfaces to modules. Additional assertions and monitoring can be added incrementally, as needed.

PSpec provides a systematic approach to performance testing that can replace current ad hoc approaches in many situations. Implementors have always had the ability to generate monitoring logs and process them to check properties of executions, but they seldom do so because it is time-consuming to write the log-processing programs and to set up the monitoring. Also, when such special-purpose instrumentation is produced for the performance testing phase of program development, it tends to become obsolete over time. In addition, special-purpose tools developed for one system may not be useful for other systems. With PSpec's general-purpose tools the performance testing phase can continue throughout a system's lifetime.

We began with the hypothesis that relatively simple performance assertions would be useful for finding performance bugs. One of the reasons that so many systems perform badly is that no attempt is made to study performance, or performance studies are conducted only once. Tools that take advantage of the knowledge that programmers already have, and that facilitate performance studies that continue throughout the lifetime of a system, would thus be valuable. Preliminary experience with PSpec has supported our hypothesis: the assertions that we used to find the Prelude performance bugs were simple and obvious. This bodes well for the prospect of programmers actually using the PSpec tools to produce systems with better performance than we see today.

References

[1] Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger. The AWK Programming Language. Series in Computer Science. Addison-Wesley, 1989.

[2] Thomas E. Anderson and Edward D. Lazowska. Quartz: A tool for tuning parallel program performance. In Proceedings of the Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '90, May 1990.

[3] David Bernstein, Anthony Bolmarcich, and Kimming So. Performance visualization of parallel programs on a shared memory multiprocessor system. In Proceedings of the 1989 International Conference on Parallel Processing, volume II, pages 1–10, August 1989.

[4] E.A. Brewer, C.N. Dellarocas, A. Colbrook, and W.E. Weihl. PROTEUS: A high-performance parallel-architecture simulator. Technical Report MIT/LCS/TR-516, MIT Laboratory for Computer Science, September 1991.

[5] D. Cohen. AP5 User's Manual. ISI/USC, 1988.

[6] Digital Equipment Corporation. pixie(1). Ultrix 4.0 General Information, Vol. 3B (Commands(1): M–Z).

[7] Yogesh Gaur, Vincent A. Guarna Jr., and David Jablonowski. An environment for performance experimentation on multiprocessors. In Proceedings of SUPERCOMPUTING '89, pages 589–594. IEEE Computer Society and ACM SIGARCH, November 1989.

[8] S.L. Graham, P.B. Kessler, and M.K. McKusick. Gprof: A call graph execution profiler. In Proceedings of the ACM SIGPLAN Symposium on Compiler Construction, June 1982.

[9] Michael T. Heath and Jennifer A. Etheridge. Visualizing the performance of parallel programs. IEEE Software, 8(5):29–39, September 1991.

[10] Brian W. Kernighan and Dennis M. Ritchie. The C Programming Language. Prentice Hall Software Series. Prentice Hall, Englewood Cliffs, New Jersey, 2nd edition, 1988.

[11] Ted Lehr, Zary Segall, Dalibor Vrsalovic, Eddie Caplan, Alan L. Chung, and Charles E. Fineman. Visualizing performance debugging. IEEE Computer, 22(10):38–51, October 1989.

[12] Yingsha Liao and Donald Cohen. A specificational approach to high level program monitoring and measuring. IEEE Transactions on Software Engineering, 18(11):969–978, November 1992.

[13] Allen D. Malony, David H. Hammerslag, and David J. Jablonowski. Traceview: A trace visualization tool. IEEE Software, 8(5):19–28, September 1991.

[14] Greg Nelson, editor. Systems Programming With Modula-3. Prentice Hall Series in Innovative Technology. Prentice Hall, Englewood Cliffs, New Jersey, 1991.

[15] Sharon E. Perl. Performance assertion checking. Technical Report MIT/LCS/TR-551, MIT Laboratory for Computer Science, Cambridge, MA 02139, September 1992.

[16] Richard Snodgrass. A relational approach to monitoring complex systems. ACM Transactions on Computer Systems, 6(2):157–196, May 1988.

[17] Larry Wall and Randal L. Schwartz. Programming Perl. O'Reilly and Associates, Sebastopol, CA, 1991.

[18] W.E. Weihl, E.A. Brewer, A. Colbrook, C.N. Dellarocas, W.C. Hsieh, A.D. Joseph, C. Waldspurger, and P. Wang. PRELUDE: A system for building portable parallel software. Technical Report MIT/LCS/TR-519, MIT Laboratory for Computer Science, October 1991.
