© 1990 ACM 0001-0782/90/0500-0539 $1.50

There are many ways to measure the performance of a parallel algorithm running on a parallel processor. The most commonly used measurements are the elapsed time, price/performance, the speed-up, and the efficiency. This article defines another metric which reveals aspects of the performance that are not easily discerned from the other metrics.

The elapsed time to run a particular job on a given machine is the most important metric. A Cray Y-MP/1 solves the order 1,000 linear system in 2.17 seconds compared to 445 seconds for a Sequent Balance 21000 with 30 processors [2]. If you can afford the Cray and you spend most of your time factoring large matrices, then you should buy a Cray.

Price/performance of a parallel system is simply the elapsed time for a program divided by the cost of the machine that ran the job. It is important if there is a group of machines that are "fast enough." Given a fixed amount of money, it may be to your advantage to buy a number of slow machines rather than one fast machine. This is particularly true if you have many jobs to run and a limited budget. In the previous example, the Sequent Balance is a better price/performer than the Cray if it costs less than 0.5 percent as much. On the other hand, if you can't wait 7 minutes for the answer, the Sequent is not a good buy even if it wins in price/performance.

These two measurements are used to help you decide what machine to buy. Once you have bought the machine, speed-up and efficiency are the measurements often used to let you know how effectively you are using it.

The speed-up is generally measured by running the same program on a varying number of processors. The speed-up is then the elapsed time needed by 1 processor divided by the time needed on p processors, s = T(1)/T(p). (Of course, the correct time for the uniprocessor run would be the time for the best serial algorithm, but almost nobody bothers to write two programs.) If you are interested in studying algorithms for parallel processors, and system A gives a higher speed-up than system B for the same program, then you would say that system A provides better support for parallelizing this program than does system B. An example of such support is the presence of more processors. A Sequent with 30 processors will almost certainly produce a higher speed-up than a Cray with only 8 processors.

The issue of efficiency is related to that of price/performance. It is usually defined as

$$e = \frac{T(1)}{p\,T(p)} = \frac{s}{p}.$$

Efficiency close to unity means that you are using your hardware effectively; low efficiency means that you are wasting resources. As a practical matter, you may buy a system with 100 processors that each take 100 times longer than you are willing to wait to solve your problem. If you can code your problem to run at high efficiency, you'll be happy. Of course, if you have 200 processors, you may not be unhappy with 50 percent efficiency, particularly if the 200 processors cost less than other machines that you can use.

Each of these metrics has disadvantages. In fact, there is important information that cannot be obtained even by looking at all of them. It is obvious that adding processors should reduce the elapsed time, but by how much? That is where speed-up comes in. Speed-up close to linear is good news, but how close to linear is good enough? Well, efficiency will tell you how close you are getting to the best your hardware can do, but if your efficiency is not particularly high, why? The new metric defined in the following section is intended to answer these questions.

NEW METRIC
We will now derive our new metric, the experimentally determined serial fraction, and show why it is useful. We will start with Amdahl's Law [1] which in its simplest form says that
Computer        p    Time (sec)   s       e       f
Sequent         1    11111
Sequent         30   445          25.0    0.833   0.007
Convex C-210    1    15           -
Convex C-220    2    7.98         1.88    0.940   0.064
Convex C-240    4    4.03         3.72    0.930   0.025
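
The speed-up, efficiency, and serial-fraction values for these machines can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes the experimentally determined serial fraction takes the usual form f = (1/s - 1/p) / (1 - 1/p), obtained by solving Amdahl's Law for the serial fraction, and the function names are hypothetical rather than taken from the article.

```python
def speed_up(t1, tp):
    """Speed-up s = T(1) / T(p)."""
    return t1 / tp


def efficiency(s, p):
    """Efficiency e = s / p."""
    return s / p


def serial_fraction(s, p):
    """Serial fraction f = (1/s - 1/p) / (1 - 1/p).

    Assumed form of the metric; it is undefined for p = 1.
    """
    if p < 2:
        raise ValueError("the serial fraction needs at least 2 processors")
    return (1.0 / s - 1.0 / p) / (1.0 - 1.0 / p)


# (machine, processors, speed-up) as quoted for the Linpack runs.
rows = [
    ("Sequent", 30, 25.0),
    ("Convex C-220", 2, 1.88),
    ("Convex C-240", 4, 3.72),
]

for name, p, s in rows:
    e = efficiency(s, p)
    f = serial_fraction(s, p)
    print(f"{name:13s} p={p:2d}  s={s:5.2f}  e={e:.3f}  f={f:.3f}")
```

Under that assumption the computed values agree with the quoted ones to three decimal places, which suggests the efficiency and serial-fraction figures were derived from the rounded speed-ups.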
has less vector content, the fraction of the time spent in serial code is increased.

The Linpack benchmark report measures the performance of a computational kernel running on machines with no more than 30 processors. The results in Table II are taken from the work of the winners of the Gordon Bell Award [5]. Three applications are shown with maximum speed-ups of 639, 519, and 351 and efficiencies ranging from 0.9965 to 0.3430. We know this is good work since they won the award, but how good a job did they do? The serial fraction ranges from 0.00051 to 0.0019 indicating that they did a very good job, indeed.

The serial fraction reveals an interesting point. On all three problems, there is a significant reduction in the serial fraction in going from 4 to 16 processors (from 16 to 64 for the wave motion problem). As with the Convex results, these numbers indicate something akin to "superlinear" speed-up. Perhaps the 4-processor run sends longer messages than does the 16-processor run, and these longer messages are too long for the system to handle efficiently. At any rate, the serial fraction has pointed up an inconsistency that needs further study.

SCALED SPEED-UP
All the analysis presented so far refers to problems of fixed size. Gustafson [4] has argued that this is not how parallel processors are used. He argues that users will increase their problem size to keep the elapsed time of the run more or less constant. As the problem size grows, we should find the fraction of the time spent executing serial code decreases, leading us to predict a decrease in the measured serial fraction.

If we assume that the serial time and overhead are independent of problem size, neither of which is fully justified [3],

$$T(p, k) = T_s + \frac{k\,T_p}{p} \qquad (6)$$

where T(p, k) is the time to run the program on p processors for a problem needing k times more arithmetic. Here k is the scaling factor and k = 1 when p = 1. Flatt [3] points out that the scaling factor k must count arithmetic, not some more convenient measure such as memory size.

Our definition of speed-up must now account for the additional arithmetic that must be done to solve our
  64    63.96    0.9994    0.000015     0.00098
 256    255.8    0.9992    0.0000038    0.00096
1024    1023     0.9990    0.0000012    0.0013
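
The scaled timing model can be explored numerically. The sketch below takes Equation (6) to be T(p, k) = T_s + k T_p / p and makes several assumptions that are not from the article: the values of T_s and T_p are invented, the problem is scaled with k = p, speed-up is measured against the one-processor time for the same scaled problem, T(1, k), and the serial fraction is computed as f = (1/s - 1/p) / (1 - 1/p). It is meant only to illustrate the predicted trend, not to reproduce the article's definition of scaled speed-up.

```python
def scaled_time(p, k, t_serial, t_parallel):
    """Equation (6): T(p, k) = T_s + k * T_p / p."""
    return t_serial + k * t_parallel / p


def serial_fraction(s, p):
    """Assumed form of the measured serial fraction, valid for p > 1."""
    return (1.0 / s - 1.0 / p) / (1.0 - 1.0 / p)


def scaled_run(p, t_serial, t_parallel):
    """Return (elapsed time, speed-up, serial fraction) for k = p.

    Speed-up is taken relative to T(1, k), which is an assumption
    made for this illustration.
    """
    k = p
    elapsed = scaled_time(p, k, t_serial, t_parallel)
    s = scaled_time(1, k, t_serial, t_parallel) / elapsed
    return elapsed, s, serial_fraction(s, p)


# Hypothetical timings: 1 unit of serial work, 999 units of parallel work.
T_S, T_P = 1.0, 999.0

for p in (4, 16, 64):
    elapsed, s, f = scaled_run(p, T_S, T_P)
    print(f"p={p:3d}  T(p,k)={elapsed:7.1f}  s={s:7.3f}  f={f:.7f}")
```

With these made-up timings the elapsed time stays constant as p grows while the measured serial fraction shrinks, which is the behavior the fixed-size analysis would not show.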