
Figure 1.1 Design metric competition among Performance, Power, Size and NRE cost: improving one may worsen others.

Time-to-market and ease of use are some of the metrics that affect the cost
and price. Sometimes a combined cost-performance metric may be more important
than cost and performance considered separately.
3. Power Consumption Metrics: Metrics of this group measure the power
consumption of the system. They are gaining importance in many fields as
battery-powered mobile systems become prevalent and energy conservation
becomes more significant.
4. System Effectiveness Metrics: In many applications, such as military
applications, how effective the system is in achieving its intended purpose is
more important than cost. Reliability, maintainability, serviceability,
design adequacy and flexibility are among the metrics of this group.
5. Others: This group includes metrics that may guide the designer in
selecting from the many off-the-shelf components that can do the job. Ease
of use, software support, safety and the availability of second-source
suppliers are some of the metrics of this group.

1.3 Performance Design Metrics

Performance of a system is a measure of how long the system takes to execute
a desired application. For example, in terms of performance, a computer user
cares about how long a digital computer takes to execute a programme. For
the user, the computer is faster when the execution of his programme takes less
time. For the IT manager, a computer is faster when it completes more tasks
within a unit of time. For the manager as well as for the user, the clock frequency
of the computer (cycles per second) is not the key issue; the architecture
of one computer may result in faster programme execution even though it
has a lower clock frequency than another. We note here that the main concern
of the user is to reduce the response time (or latency, or execution time), while
the main concern of the manager is to increase the throughput.
With that said, there are several measures of performance. For simplicity,
suppose we have a single task that will be repeated over and over, such as
executing a programme or producing a car on an automobile assembly line.
The two main measures of performance are latency (or response time) and
throughput. However, we must note that throughput is not always simply the
reciprocal of the latency. A system may be able to do better than this
by using parallelism, either by starting a new task before the previous
one has completed (pipelining) or by processing tasks concurrently. In the case of an
automobile production factory, the assembly line consists of many steps. Each step
contributes something to completing the car. Each step operates in parallel with
the other steps, though on a different car.
a latency of 4 hours but a throughput of 120 cars per day. The throughput in
our example here is defined as the number of cars produced per day. In other
words, the throughput is determined by how often a completed car exits the
production line.

Definitions:
Latency or Response time: The time between the start and the end of a task’s
execution. For example, producing one car takes 4 hours.
Throughput: The number of tasks that can be processed per unit time. For
example, an assembly line may be able to produce 6 cars per day.
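To make the relationship concrete, here is a minimal sketch in Python; the figures are hypothetical, chosen to match the assembly-line example above (a 4-hour latency with one car leaving the line every 12 minutes).

```python
# Latency vs. throughput on a pipelined assembly line (hypothetical figures:
# 20 steps of 12 minutes each).
steps = 20                 # assembly-line steps working in parallel
step_time_h = 0.2          # 12 minutes per step, expressed in hours

latency_h = steps * step_time_h          # time to build one car: 4.0 hours
pipelined_per_day = 24 / step_time_h     # a car exits every 12 minutes: 120/day
serial_per_day = 24 / latency_h          # one car at a time: 6/day

print(f"latency            = {latency_h:.1f} hours per car")
print(f"pipelined line     = {pipelined_per_day:.0f} cars per day")
print(f"without pipelining = {serial_per_day:.0f} cars per day")
```

The 6 cars per day in the definition above is what a line building one car at a time would achieve (24 h / 4 h); pipelining raises the throughput to 120 cars per day without changing the 4-hour latency.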

The main concern in the two cases, throughput and response time, is time.
The computer that performs the same amount of work in the least time is the
fastest. When we speak of a single task we are speaking of response time,
while when we speak of executing many tasks we are speaking of throughput.
The latency metric is directly related to the execution time, while throughput
measures the rate at which a given task is completed. Many different throughput
metrics can be expected, depending on how the task is defined. The task may be
an instruction, as in the case of MIPS (see Section 1.3.2.2), or a floating-point
operation, as in the case of MFLOPS (see Section 1.3.2.3), or any
other task. Besides execution time and rate metrics, there is a wide variety
of more specialised metrics used as indices of computer system performance.
Unfortunately, as we shall see later, many of these metrics are often used but
interpreted incorrectly.

Performance metrics can be divided into two classes:
a. Metrics that measure the performance of working systems: Latency
and throughput are examples of this group. We use them to measure the
performance of something already implemented, irrespective of whether
such performance fulfils the design requirements or not. Such performance
metrics, which measure what was already done, are called means-based metrics.
b. Metrics that measure performance during the design process: i.e.
before implementing the system. This group of metrics, for example,
enables the designer to compare different algorithms against the expected
computation time. All the algorithms may achieve the same functionality
but it is necessary to decide which one is optimum from the point of
view of execution time and hardware complexity. As will be discussed in
Chapter 2, to reach an optimum system, the designer has to optimise some
of the design metrics at each stage of the design process: performance
at one stage, power consumption at another stage and so on. In the early
stages of the design process the target of the designer is to estimate what
will actually be achieved at each stage. The Big-O notation is an example
of such performance metrics.
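As a minimal sketch of this kind of design-time comparison (the two candidate algorithms and their operation counts are hypothetical), Big-O style operation counts can be compared before any code is written:

```python
import math

# Design-time comparison of two hypothetical algorithms for the same task,
# using their asymptotic operation counts as a rough performance metric.
def ops_quadratic(n):       # candidate A: roughly n^2 operations
    return n * n

def ops_linearithmic(n):    # candidate B: roughly n*log2(n) operations
    return n * math.log2(n)

for n in (1_000, 100_000):
    ratio = ops_quadratic(n) / ops_linearithmic(n)
    print(f"n = {n:>7}: candidate A needs about {ratio:.0f}x more operations")
```

Such estimates ignore constant factors and hardware details, but they are often enough to choose between candidate algorithms long before execution time can be measured.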
This leads us to the following question: What are the characteristics of a good
performance metric?

1.3.1 Characteristics of a Good Performance Metric

There are many different metrics that have been used to describe the per-
formance of a computer system. Some of these metrics are commonly used
throughout the field, such as MIPS and MFLOPS (which are defined later in
this chapter), whereas others are introduced by manufacturers and/or designers
for new situations as they are needed. Experience has shown that not all of
these metrics are ‘good’ in the sense that sometimes using a particular metric
out of context can lead to erroneous or misleading conclusions. Consequently,
it is useful to understand the characteristics of a ‘good’ performance metric.
This understanding will help when deciding which of the existing perfor-
mance metrics to use for a particular situation and when developing a new
performance metric.
A performance metric that satisfies all of the following requirements is
generally useful to a performance analyst in allowing accurate and detailed
comparisons of different measurements. These criteria have been developed by
researchers through observing the results of numerous performance analyses
over many years. While they should not be considered absolute requirements
of a performance metric, it has been observed that using a metric that does
not satisfy these requirements can often lead the analyst to make erroneous
conclusions.
1. Linearity: The metric is linear if its value is proportional to the actual
performance of the machine. That is, if the value of a metric changes by
a certain ratio, the actual performance of the machine should change by
the same ratio. For example, suppose that you are upgrading your system
to a system whose speed metric (i.e. execution-rate metric) is twice as
large as on your current system. You then would expect the new system
to be able to run your application programmes in half the time taken by
your old system. Similarly, if the metric for the new system were three
times larger than that of your current system, you would expect to see
the execution times reduced to one-third of the original values. Not all
types of metrics satisfy this proportionality requirement; logarithmic
metrics, for example, are nonlinear.
2. Reliability: A performance metric is considered to be reliable if system
A always outperforms system B when the corresponding values of the
metric for both systems indicate that system A should outperform system
B. For example, suppose that we have developed a new performance met-
ric called WPAM that we have designed to compare the performance of
computer systems when running the class of word-processing application
programmes. We measure system A and find that it has a WPAM rating
of 128, while system B has a WPAM rating of 97. We then can say that
WPAM is a reliable performance metric for word-processing application
programmes if system A always outperforms system B when executing
these types of applications. While this requirement would seem to be so
obvious as to be unnecessary to state explicitly, several commonly used
performance metrics do not in fact satisfy this requirement. The MIPS
metric, for instance, which is described later, is notoriously unreliable.
Specifically, it is not unusual for one processor to have a higher MIPS rating
than another processor while the second processor actually executes
a specific programme in less time than does the processor with the higher
value of the metric. Such a metric is essentially useless for summarising
performance, and we say that it is unreliable (a minimal sketch of such a
reliability check follows this list).
3. Repeatability: A performance metric is repeatable if the same value of
the metric is measured each time the same experiment is performed. Note
that this also implies that a good metric is deterministic.
4. Ease of measurement: If a metric is not easy to measure, it is unlikely
that anyone will actually use it. Furthermore, the more difficult a metric
is to measure directly, or to derive from other measured values, the more
likely it is that the metric will be determined incorrectly. The only thing
worse than a bad metric is a metric whose value is measured incorrectly.
5. Consistency: A consistent performance metric is one for which the units
of the metric and its precise definition are the same across different
systems and different configurations of the same system. If the units of
a metric are not consistent, it is impossible to use the metric to compare
the performances of the different systems. While the necessity for this
characteristic would also seem obvious, it is not satisfied by many popular
metrics, such as MIPS (Section 1.3.2.2) and MFLOPS (Section 1.3.2.3).
6. Independence: Many purchasers of computer systems decide which sys-
tem to buy by comparing the values of some commonly used performance
metric. As a result, there is a great deal of pressure on manufacturers to
design their machines to optimise the value obtained for that particular
metric and to influence the composition of the metric to their benefit. To
prevent corruption of its meaning, a good metric should be independent
of such outside influences.
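The reliability check referred to in item 2 can be sketched as follows; the system names, WPAM-style ratings and execution times are hypothetical, and the metric is treated as reliable for a workload only if its ordering of the systems matches the ordering by measured execution time for every programme.

```python
# Hypothetical ratings (higher is claimed to be better) and measured
# execution times in seconds (lower is better) for two systems.
ratings = {"A": 128, "B": 97}
exec_times = {
    "edit":  {"A": 10.2, "B": 12.9},
    "spell": {"A":  4.1, "B":  5.0},
    "print": {"A":  7.6, "B":  6.8},   # B is faster here despite its lower rating
}

def metric_is_reliable(ratings, exec_times):
    """True only if the higher-rated system is faster on every programme."""
    best_by_rating = max(ratings, key=ratings.get)
    for prog, times in exec_times.items():
        best_by_time = min(times, key=times.get)
        if best_by_time != best_by_rating:
            print(f"unreliable: on '{prog}' the faster system is {best_by_time}")
            return False
    return True

print("reliable" if metric_is_reliable(ratings, exec_times) else "not reliable")
```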

Many metrics have been proposed to measure performance. The following
sections describe some of the most commonly used performance metrics and
evaluate them against the above characteristics of a good performance metric.

1.3.2 Some Popular Performance Metrics

Many measures have been devised in an attempt to create standard and easy-to-
use measures of computer performance. One consequence has been that simple
metrics, valid only in a limited context, have been heavily misused such that
using them normally results in misleading conclusions, distorted results and
incorrect interpretations. Clock rate, MIPS and MFLOPS are the best examples
of such simple performance metrics; using any of them results in misleading
and sometimes incorrect conclusions. These three metrics belong to the same
family of performance metrics that measure performance by calculating the
rate of occurrence of an event. In Section 1.3.2.8 we give an example that
highlights the danger of using the wrong metric (mainly the use of means-
based metrics or using the rate as a measure) to reach a conclusion about
computer performance. In most cases it is better to use metrics that take the
execution time as the basis for measuring performance.

1.3.2.1 Clock Rate


The clock rate is the best example of how a simple metric of performance
can lead to very wrong conclusions. The clock frequency of the central pro-
cessor is used by many suppliers as an indication of performance. This may
give one the impression that using a 1.5 GHz system would result in a 50%
higher throughput than a 1.0 GHz system. This measure, however, ignores how
much work the processor achieves in each clock cycle: some complex
instructions may need many clock cycles, and I/O instructions may take
even more. More importantly, it ignores the fact that in many cases
the processor may not be the performance bottleneck; the bottleneck may be
the memory subsystem or the data bus width.
Some of the characteristics of the clock rate metric are:
• it is repeatable since it is a constant for a given system
• it is easy to determine since it is most likely stamped on the box
• it is consistent across all systems
• it is difficult for a manufacturer to misrepresent it
The above characteristics make using the clock rate as a measure of perfor-
mance appear advantageous, but since it is a nonlinear measure (doubling the clock
rate does not always double the resulting performance) it is an unreliable
metric. This point will be clarified later when considering the
execution time equation. Thus, we conclude that the processor’s clock rate is
not a good metric of performance.

1.3.2.2 MIPS (Million Instructions per Second)


The MIPS metric belongs to a group of metrics called ‘rate metrics’; it is an
execution-rate performance metric. It is regularly used to compare the speed
of different computer systems, and it measures the number of CPU instructions
performed per unit time, in millions of instructions per second:
MIPS = Instruction count / (Execution time × 10^6)
This definition shows MIPS to be an instruction execution rate metric,
specifying performance inversely to execution time. It is important to note
here that both the instruction count (IC) and the execution time are measurable,
i.e., for a given programme (normally a benchmark), it is possible to calculate
MIPS and accordingly to get an idea of the performance. It is possible to
rewrite the definition of MIPS to include other measurable parameters:

MIPS = Instruction count / (Execution time × 10^6)
     = Instruction count / (CPU clocks × Cycle time × 10^6)
     = (Instruction count × Clock rate) / (CPU clocks × 10^6)
     = (Instruction count × Clock rate) / (Instruction count × CPI × 10^6)
     = Clock rate / (CPI × 10^6)                                        (1.1)
In the above equation:
CPU clocks = the total number of clock cycles used to execute
the programme, and
CPI = the average cycles per instruction.
Knowing MIPS, it is possible to calculate the execution time:
Execution time = Instruction count / (MIPS × 10^6)                     (1.2)
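A minimal numerical sketch of equations (1.1) and (1.2), using hypothetical processor figures:

```python
# Hypothetical processor: 2 GHz clock, average CPI of 1.25, and a programme
# that executes 4 billion instructions.
clock_rate = 2e9          # clock cycles per second
cpi = 1.25                # average clock cycles per instruction
instruction_count = 4e9   # instructions executed by the programme

mips = clock_rate / (cpi * 1e6)                  # equation (1.1): 1600 MIPS
exec_time = instruction_count / (mips * 1e6)     # equation (1.2): 2.5 s

print(f"MIPS rating    = {mips:.0f}")
print(f"execution time = {exec_time:.2f} s")
# Cross-check using CPU time = IC * CPI * cycle time (equation (1.4), later):
print(f"cross-check    = {instruction_count * cpi / clock_rate:.2f} s")
```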

Problems that may arise from using MIPS as a measure of performance:


Taking the instruction as the unit of counting makes MIPS easy to measure,
repeatable, and independent; however, MIPS suffers from many problems that
make it a poor measure of performance. It is not linear, since doubling the
MIPS rate does not necessarily double the resulting performance,
and it is neither reliable nor consistent since it does not correlate well
with performance at all.
The problem with MIPS as a performance metric is that different pro-
cessors can do substantially different amounts of computation with a single
instruction. For instance, one processor may have a branch instruction that
branches after checking the state of a specified condition code bit. Another
processor may have a branch instruction that first decrements a specified count
register and then branches after comparing the resulting value in the register
with zero. In the first case the single instruction does one simple operation
whereas in the second case, one instruction performs several operations. The
failing of the MIPS metric is that each instruction counts as one unit, irrespec-
tive of the work performed per instruction. This difference is the core of the
difference between RISC and CISC processors and this renders MIPS essen-
tially useless as a performance metric. Even when comparing two processors
of the same architecture, the metric can be inconsistent as the instruction mix
changes, and it may not carry through to all aspects of a processor’s performance,
such as I/O or interrupt latency.
As a matter of fact, MIPS can sometimes vary inversely with performance.
One example that clearly shows this arises when we
calculate MIPS for two computers with the same instruction set, where one
has special hardware to execute floating-point operations and the other uses
software routines to execute them. The floating-point hardware needs more
clock cycles to execute one floating-point operation than an integer operation
needs. This increases the average CPI (cycles per instruction) of the machine
which, in turn, according to equation (1.1), results in a lower MIPS rating.
On the other hand, each software routine that was needed to execute a
floating-point operation consists of many simple instructions; it is now
replaced by a single hardware instruction that executes much faster.
Hence, the inclusion of floating-point hardware will
result in a machine that has a lower MIPS rating but can do more work, thus
highlighting the drawback of MIPS as a metric. Example 1.4 further illustrates
this effect.
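Independently of Example 1.4, a minimal sketch of this effect with hypothetical figures (the same 500 MHz clock and the same source programme; the software floating-point machine executes many more, but simpler, instructions):

```python
clock_rate = 500e6   # both hypothetical machines run at 500 MHz

# Software floating point: FP operations expanded into many simple instructions.
ic_soft, cpi_soft = 120e6, 1.0
# Hardware floating point: far fewer instructions, but a higher average CPI.
ic_hw, cpi_hw = 20e6, 4.0

def mips(cpi):
    return clock_rate / (cpi * 1e6)       # equation (1.1)

def exec_time(ic, cpi):
    return ic * cpi / clock_rate          # CPU time = IC * CPI * cycle time

print(f"software FP: {mips(cpi_soft):.0f} MIPS, {exec_time(ic_soft, cpi_soft):.2f} s")
print(f"hardware FP: {mips(cpi_hw):.0f} MIPS, {exec_time(ic_hw, cpi_hw):.2f} s")
# The hardware-FP machine reports the lower MIPS rating (125 vs 500)
# yet finishes the programme sooner (0.16 s vs 0.24 s).
```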

1.3.2.3 FLOPS (Floating-Point Operations per Second)


The primary shortcoming of the MIPS metric is that it counts instructions
irrespective of the amount of work each instruction does. The MIPS metric counts
a NOP instruction the same as a multiplication instruction that operates on data
in memory; both are counted as one instruction. Some applications involve
intensive floating-point processing, and MIPS leads to very wrong
conclusions in such cases. To account for floating-point execution, which
may be critical in some applications, the FLOPS (FLoating-point Operations
per Second) metric was introduced, which counts the number of floating-point
operations executed per second. This metric is impossible to measure directly
in isolation, since floating-point instructions are almost never used on their
own; they always need other instruction classes around them. However, the metric
is easy to determine from the machine hardware specifications and can be
measured approximately using code sequences that are predominantly floating
point.
MFLOPS (Mega FLOPS), like MIPS, can be calculated for a specific
programme running on a specific computer. MFLOPS is a measure of
millions of floating-point operations per second:

MFLOPS = Number of floating-point operations / (Execution time × 10^6)        (1.3)
The metric considers floating-point operations such as addition, subtraction,
multiplication, or division applied to numbers represented by a single or double
precision floating-point representation. Such data items are heavily used in
scientific calculations and are specified in programming languages using
keywords like float, real, double, or double precision.
MFLOPS is not a dependable measure as it depends on the set of
floating-point operations available in the processor (the availability of
floating-point instructions). For example, some computers have no sine
instruction while others do. In the first case, the calculation of a sine
function needs to call a software sine routine that performs several
floating-point operations, while in the second case it requires only one
operation. Another weakness
of MFLOPS comes from treating floating point addition, subtraction, multi-
plication and division equally. This is incorrect as floating point division is
significantly more complex and time consuming than floating point addition.
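A minimal sketch of how this can mislead, using hypothetical figures: a machine that lacks a sine instruction performs many extra floating-point operations inside its software sine routine, and so can report a higher MFLOPS rating even though it finishes the same programme later.

```python
def mflops(fp_ops, exec_time_s):
    return fp_ops / (exec_time_s * 1e6)      # equation (1.3)

# Machine X: hardware sine instruction -> few FP operations, short run time.
fp_ops_x, time_x = 5e6, 0.05
# Machine Y: software sine routine -> many extra FP operations, longer run time.
fp_ops_y, time_y = 25e6, 0.20

print(f"machine X: {mflops(fp_ops_x, time_x):.0f} MFLOPS in {time_x} s")
print(f"machine Y: {mflops(fp_ops_y, time_y):.0f} MFLOPS in {time_y} s")
# Y reports 125 MFLOPS against X's 100, yet X completes the same task 4x faster.
```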

1.3.2.4 SPEC (System Performance Evaluation Cooperative)


To avoid the shortcomings of the simple performance metrics mentioned above,
several computer manufacturers formed the System Performance Evaluation
Cooperative (SPEC) to produce a standardised performance metric. To reach
this target, they chose benchmark programmes that contain both integer and
floating-point operations and reflect common workstation usage. They
also defined, as follows, a standard methodology for measuring and reporting
the performance obtained when executing these programmes:
• Measure the time required to execute each benchmark programme on the
system being tested.
• Divide the time measured for each programme by the time required to
execute the same programme on a standard reference machine in order to
normalise the execution times.
• Average these normalised values using the geometric mean to produce a
single performance metric.
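A minimal sketch of the three steps above; the programme names, measured times and reference times are hypothetical.

```python
import math

# Step 1: measured execution times on the system under test (seconds).
measured  = {"gcc": 41.0, "fft": 12.5, "sort": 30.0}
# Execution times of the same programmes on the standard reference machine.
reference = {"gcc": 82.0, "fft": 50.0, "sort": 45.0}

# Step 2: normalise each measured time against the reference machine.
normalised = {p: measured[p] / reference[p] for p in measured}

# Step 3: average the normalised values with the geometric mean
# (with this normalisation a smaller value means a faster system overall).
rating = math.prod(normalised.values()) ** (1 / len(normalised))

print({p: round(r, 2) for p, r in normalised.items()})
print(f"geometric-mean rating: {rating:.2f}")
```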
While the SPEC methodology is certainly more rigorous than using MIPS
or MFLOPS as a measure of performance, it still gives a problematic perfor-
mance metric. One of its shortcomings is that averaging together the individual
normalised results with the geometric mean produces a metric that is not lin-
early related to a programme’s actual execution time. Thus, the SPEC metric is not
intuitive. Furthermore, and more importantly, it has been shown to be an unre-
liable metric in that a given programme may execute faster on a system that has
a lower SPEC rating than it does on a competing system with a higher rating.
Finally, although the defined methodology appears to make the metric
independent of outside influences, it is actually subject to a wide range of
tinkering. For example, many compiler developers have used these bench-
marks as practice programmes, thereby tuning their optimisations to the
characteristics of this collection of applications. As a result, the execution
times of the collection of programmes in the SPEC suite can be quite sensitive
to the particular selection of optimisation flags chosen when the programme is
compiled. Also, the selection of specific programmes that comprise the SPEC
suite is determined by a committee of representatives from the manufacturers
within the cooperative. This committee is subject to numerous outside pres-
sures since each manufacturer has a strong interest in advocating application
programmes that will perform well on their machines. Thus, while SPEC is
a significant step in the right direction towards defining a good performance
metric, it still falls short of the goal.

1.3.2.5 Comments
As mentioned before, any performance metric must be reliable. The majority
of the above-mentioned metrics are not. The main reason they are unreliable
is that they measure what was done, whether or not it was
useful. Such metrics are called means-based metrics. The use of such metrics
may lead to wrong conclusions concerning the performance of the system.
To avoid such problems, we must use metrics that are based on the def-
inition of performance, i.e. the execution time. Such metrics are ends-based
metrics and measure what is actually accomplished. The difference between
the two classes of performance metrics is highlighted in Section 1.3.2.8.

1.3.2.6 The processor performance equation


The performance of any processor is measured, as mentioned above, by the
time required to execute a given task, in our case a given programme. The
execution time of a programme is normally referred to as CPU time. It is
expressed as follows:

CPU time = CPU clock cycles for a programme × Clock cycle time

For any processor the clock cycle time (or the clock rate) is known, and it is
possible to measure the CPU clock cycles. CPU time can also be expressed
in terms of the number of instructions executed (the instruction count, IC) and
the average number of clock cycles per instruction (CPI):

CPU time = IC × CPI × Clock cycle time                                  (1.4)
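A minimal sketch of equation (1.4) with hypothetical figures, which also illustrates the earlier point about clock rate: a 1.5 GHz processor with a higher average CPI can take longer than a 1.0 GHz processor to run the same programme.

```python
def cpu_time(ic, cpi, clock_rate_hz):
    """CPU time = IC * CPI * clock cycle time (equation (1.4))."""
    return ic * cpi / clock_rate_hz      # clock cycle time = 1 / clock rate

ic = 2e9   # the programme executes 2 billion instructions on both machines

print(f"1.5 GHz, CPI 3.0: {cpu_time(ic, 3.0, 1.5e9):.1f} s")   # 4.0 s
print(f"1.0 GHz, CPI 1.5: {cpu_time(ic, 1.5, 1.0e9):.1f} s")   # 3.0 s
# The machine with the 50% higher clock rate is slower here, because it
# needs twice as many clock cycles per instruction on average.
```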
