
Lecture 3

Understanding & Measuring Performance
REFERENCE: DAVID A. PATTERSON & JOHN L. HENNESSY – COMPUTER ORGANIZATION AND DESIGN
Introduction

• Hardware performance is often key to the effectiveness of an entire system of hardware and software.
• For different types of applications, different performance metrics may be appropriate, and different aspects of a computer system may be the most significant factor in determining overall performance.
• Understanding how best to measure performance, and the limitations of performance measurements, is important when selecting a computer system.
• To understand the issues of assessing performance, we ask:
  ◦ Why does a piece of software perform as it does?
  ◦ Why can one instruction set be implemented to perform better than another?
  ◦ How does some hardware feature affect performance?
Why measure performance?

• Performance is important!
• Identify HW/SW performance problems
• Comparisons:
  ◦ Which machine is faster?
  ◦ Which ISA is better?
  ◦ Which implementation (of an ISA) is faster?
• Expose significant performance issues (which lets us ignore unimportant ones)
More than one way to measure performance

• Performance is evaluated differently by different entities.
• Better performance can mean faster processing speed (e.g. faster completion of a task/job)
• Better performance can mean higher throughput (doing more jobs in a given time)
• Better performance can mean doing more jobs at a smaller cost
Which plane has better performance?

The answer depends on the definition of performance:
• If better performance means having a longer range
• If higher throughput (transporting more passengers) is better performance
• If higher speed is better performance
Understanding terminology

• Execution time (a.k.a. response time): the total time it takes from start to completion of a task
• Throughput: the total number of tasks completed in a given time interval
• CPU execution time (a.k.a. CPU time): the actual time the CPU spends on a specific task
  ◦ User CPU time: time the CPU spends running the program itself
  ◦ System CPU time: time the CPU spends on OS overhead on behalf of the program
• Clock cycle (a.k.a. tick, cycle): one discrete time interval of the processor clock, which runs at a constant rate. Its length is usually given in nanoseconds (ns) or picoseconds (ps)
Understanding terminology

• Clock period (a.k.a. clock cycle time): the duration of one clock cycle, in seconds (typically ns or ps)
• Clock rate (or frequency): the number of clock cycles per second, i.e. the inverse of the clock period. In MHz/GHz.
  ◦ Frequency = 1 / clock period
  ◦ 1 MHz represents 1 million cycles per second (10^6)
  ◦ 1 GHz represents 1 thousand million cycles per second (10^9)
• Clock cycles per instruction (CPI): the average number of clock cycles each instruction takes to execute
[Figure 1: clock signal; 1 cycle time = the length of one clock cycle]
[Figure 2]
Common performance metrics

• MB/s, Mb/s: Megabytes, Megabits Per Second
• MIPS: Millions of Instructions Per Second
• CPI: Clock Cycles Per Instruction
• IPC: Instructions Per Clock cycle
• Hz: cycles per second (processor clock frequency)
• LIPS: Logical Inferences Per Second
• FLOPS: Floating-Point Operations Per Second

Computer performance measures

• Performance is related to execution time:
  Performance = 1 / Execution time
• To maximize performance, we want to minimize the execution time.
• If the performance of Computer A is 10 times better than that of Computer B, what is the relation between their execution times?
  Performance_A / Performance_B = Execution time_B / Execution time_A = 10
  This shows that Computer B needs 10 times as much time as Computer A to execute a given task.
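
The inverse relation between performance and execution time can be checked with a few lines of Python. This is only a sketch: the function name `relative_performance` and the 2 s / 20 s timings are made up for illustration and do not come from the lecture.

```python
def relative_performance(exec_time_a: float, exec_time_b: float) -> float:
    """Return Performance_A / Performance_B for the same task."""
    if exec_time_a <= 0 or exec_time_b <= 0:
        raise ValueError("execution times must be positive")
    # Performance = 1 / execution time, so the performance ratio is
    # the inverse of the execution-time ratio.
    return exec_time_b / exec_time_a


# Hypothetical timings: A finishes the task in 2 s, B needs 20 s.
speedup = relative_performance(2.0, 20.0)
print(f"A is {speedup:g} times faster than B")
```

With these assumed timings, B needs 10 times as long as A, so the ratio comes out as 10.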
CPU Execution Time

CPU execution time = CPU clock cycles × Clock cycle time = CPU clock cycles / Clock rate

Clock period = 1 / frequency
Clock rate = frequency (Hz): "the frequency at which a CPU is running. It is measured in Hz"

If a processor has a frequency of 320 MHz:
Clock period = 1 / 320 000 000 Hz = 3.125 ns
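
The 320 MHz conversion can be reproduced as follows. This is a minimal sketch; the helper name `clock_period_seconds` is an illustrative choice, not something defined in the lecture.

```python
def clock_period_seconds(clock_rate_hz: float) -> float:
    """Clock period = 1 / frequency."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return 1.0 / clock_rate_hz


# 320 MHz processor from the slide above.
period_s = clock_period_seconds(320e6)
print(f"{period_s * 1e9:.3f} ns")  # period converted from seconds to nanoseconds
```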
Example 1: Improving Performance

• Our favorite program runs in 10 seconds on computer A, which has a 4 GHz clock. Computer B will run this program in 6 seconds, given that computer B requires 1.2 times as many clock cycles as computer A for this program. What is computer B’s clock rate?

What do we know?

                       Computer A             Computer B
CPU execution time     10 s                   6 s
Clock cycles (CC)      CC_A                   1.2 × CC_A
Clock rate (CR)        4 GHz = 4 × 10^9 Hz    ?
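• Answer:
  Clock cycles_A = 10 s × 4 × 10^9 Hz = 40 × 10^9 cycles
  Clock cycles_B = 1.2 × 40 × 10^9 = 48 × 10^9 cycles
  Clock rate_B = 48 × 10^9 cycles / 6 s = 8 × 10^9 Hz = 8 GHz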
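
The steps behind this answer can be sketched in Python. The function and parameter names below are hypothetical; the numbers are the ones given in Example 1.

```python
def required_clock_rate(time_a_s: float, rate_a_hz: float,
                        cycle_ratio: float, time_b_s: float) -> float:
    """Clock rate B needs, given A's timing and B's relative cycle count."""
    # CPU time = clock cycles / clock rate, so cycles = time * rate.
    cycles_a = time_a_s * rate_a_hz
    # B needs `cycle_ratio` times as many cycles as A.
    cycles_b = cycle_ratio * cycles_a
    # Rearranged: clock rate = clock cycles / CPU time.
    return cycles_b / time_b_s


rate_b = required_clock_rate(time_a_s=10.0, rate_a_hz=4e9,
                             cycle_ratio=1.2, time_b_s=6.0)
print(f"Computer B clock rate: {rate_b / 1e9:.1f} GHz")
```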
Clock Cycles per Instruction (CPI)

• Previously, our calculations of execution time did not include the number of instructions needed for the program.
• Different instructions may take different amounts of time to execute, depending on what they do.
• Example: the MOV (Move) instruction, which moves data from one place to another
The MOV instruction : Analogy

• Analyze Conrad’s movement of putting the red balls into the container. Doing 5 movements takes longer than doing 3.

Balls from prime storage:      Balls from sub storage:
- walk                         - fetch ball
- fetch ball                   - walk
- walk (halfway)               - put ball in container
- walk                         ➔ Total = 3
- put ball in container
➔ Total = 5
CPU Execution Time
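
CPU execution time = Number of instructions (a.k.a. Instruction count) × CPI × Clock cycle time

where CPU clock cycles = Instruction count × CPI
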
Example 2: Using Performance Equation

• Suppose we have two implementations of the same instruction set architecture (ISA), running the same program. Which computer is faster, and by how much?
  ◦ Computer A: clock cycle time = 250 ps and CPI = 2.0
  ◦ Computer B: clock cycle time = 500 ps and CPI = 1.2
• Note: because both computers run the same program and the instruction count is not given, we can treat it as a variable I.

Remember the formula: CPU time = Instruction count × CPI × Clock cycle time

CPU time_A = I × 2.0 × 250 ps = 500 × I ps
CPU time_B = I × 1.2 × 500 ps = 600 × I ps

Remember: the lower the execution time, the better the performance. Computer A is faster.

How much faster is Computer A?

Performance_A / Performance_B = CPU time_B / CPU time_A = (600 × I ps) / (500 × I ps) = 1.2

We can conclude that A is 1.2 times faster than B for this program.
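
The comparison in Example 2 can be replayed numerically. This is a sketch under one assumption: since the instruction count I is unknown, an arbitrary value is plugged in, which is harmless because it cancels in the ratio. The name `cpu_time_ps` is illustrative.

```python
def cpu_time_ps(instruction_count: float, cpi: float,
                cycle_time_ps: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time (in ps)."""
    return instruction_count * cpi * cycle_time_ps


# Arbitrary instruction count (assumption); it cancels in the ratio below.
I = 1_000_000

time_a = cpu_time_ps(I, cpi=2.0, cycle_time_ps=250)
time_b = cpu_time_ps(I, cpi=1.2, cycle_time_ps=500)

faster = "A" if time_a < time_b else "B"
ratio = max(time_a, time_b) / min(time_a, time_b)
print(f"Computer {faster} is {ratio:.1f} times faster")
```

Changing `I` to any other positive value leaves the printed ratio unchanged, which is why the slide can leave the instruction count symbolic.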
Measuring the CPI

• Sometimes it is possible to compute the CPU clock cycles by looking at the different types of instructions and using their individual clock cycle counts:
  CPU clock cycles = Σ (CPI_i × C_i), summed over i = 1 to n
  ◦ C_i = number of instructions of class i executed
  ◦ CPI_i = average number of cycles per instruction for instruction class i
  ◦ n = number of instruction classes
• Remember that the overall CPI for a program depends on both the number of cycles for each instruction type and the frequency of each instruction type in the program execution.
Sample: Calculate CPI

• You are on the design team for a new processor. The clock of the processor runs at 200 MHz. The following table gives instruction frequencies for Benchmark B, as well as how many cycles the instructions take, for the different classes of instructions. For this problem, we assume that (unlike many of today's computers) the processor only executes one instruction at a time.

Instruction Type            Frequency    Cycles
Loads & Stores              30%          6
Arithmetic Instructions     50%          4
All Others                  20%          3

If we say that there are 100 instructions, then:
➢ 30 of them will be loads and stores.
➢ 50 of them will be arithmetic instructions.
➢ 20 of them will be all others.

Formula: (30 × 6) + (50 × 4) + (20 × 3) = 440 cycles per 100 instructions = 4.4 cycles per instruction
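
The same weighted average can be computed directly from the frequency column. The sketch below uses the table's values; the dictionary layout and the name `average_cpi` are illustrative assumptions, and the final timing line simply applies CPU time = cycles / clock rate to the 200 MHz clock mentioned in the problem.

```python
CLOCK_RATE_HZ = 200e6  # 200 MHz, from the problem statement

# class name -> (fraction of executed instructions, cycles per instruction)
INSTRUCTION_MIX = {
    "loads_stores": (0.30, 6),
    "arithmetic": (0.50, 4),
    "others": (0.20, 3),
}


def average_cpi(mix: dict) -> float:
    """Frequency-weighted average of the per-class cycle counts."""
    total_fraction = sum(fraction for fraction, _ in mix.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * cycles for fraction, cycles in mix.values())


cpi = average_cpi(INSTRUCTION_MIX)
# Time for 100 instructions: cycles / clock rate.
time_100_s = 100 * cpi / CLOCK_RATE_HZ
print(f"CPI = {cpi:.1f}")
print(f"100 instructions take {time_100_s * 1e6:.1f} microseconds")
```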
Factors Affecting the CPU Performance
Example 3 : Comparing Code Segments

• A compiler designer is trying to decide between two code sequences for a particular computer. The hardware designers have supplied the following facts:

                       CPI for each instruction class
                       A        B        C
CPI                    1        2        3

• For a particular high-level-language statement, the compiler writer is considering two code sequences that require the following instruction counts:

                       Instruction counts for each instruction class
Code sequence          A        B        C
1                      2        1        2
2                      4        1        1

a) Which code sequence executes the most instructions?
b) Which will be faster?
c) What is the CPI for each sequence?
Example 3 : Part (a)

Sequence 1 executes 2 + 1 + 2 = 5 instructions
Sequence 2 executes 4 + 1 + 1 = 6 instructions

Sequence 2 executes the most instructions.


Example 3 : Part (b)
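
Using the equation CPU clock cycles = Σ (CPI_i × C_i):

Sequence 1: (2 × 1) + (1 × 2) + (2 × 3) = 10 cycles to execute 5 instructions
Sequence 2: (4 × 1) + (1 × 2) + (1 × 3) = 9 cycles to execute 6 instructions

Sequence 2 is faster, even though it executes one more instruction.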
Example 3 : Part (c)
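
CPI = CPU clock cycles / Instruction count
CPI_1 = 10 / 5 = 2.0
CPI_2 = 9 / 6 = 1.5

Code sequence 2 uses fewer clock cycles but executes more instructions, so it must have a lower CPI.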
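
All three parts of Example 3 can be recomputed in a few lines. This sketch assumes class CPIs of 1, 2 and 3 for A, B and C, values chosen because they are consistent with the 10-cycle and 9-cycle totals stated above; the names `CPI_BY_CLASS` and `clock_cycles` are illustrative.

```python
# Assumed CPI per instruction class (consistent with the stated cycle totals).
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for each code sequence.
SEQUENCES = {
    "Sequence 1": {"A": 2, "B": 1, "C": 2},
    "Sequence 2": {"A": 4, "B": 1, "C": 1},
}


def clock_cycles(counts: dict) -> int:
    """CPU clock cycles = sum over classes of CPI_i * C_i."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


results = {}
for name, counts in SEQUENCES.items():
    instructions = sum(counts.values())
    cycles = clock_cycles(counts)
    results[name] = (instructions, cycles, cycles / instructions)
    print(f"{name}: {instructions} instructions, "
          f"{cycles} cycles, CPI = {cycles / instructions:.1f}")
```

The output illustrates the point of the example: the sequence with more instructions can still need fewer cycles, so instruction count alone does not determine speed.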
Example 4 : Comparing Code Segments

• A processor has 3 classes of instructions:

Instruction   CPI   Instruction count      Clock cycles
class               SEQ1       SEQ2        SEQ1       SEQ2
A             1     5          3           5          3
B             2     3          2           6          4
C             5     1          2           5          10
Total               9          7           16         17

• Which code sequence is faster?

Code SEQ1 takes 16 cycles to execute 9 instructions.
Code SEQ2 takes 17 cycles to execute 7 instructions.

Code SEQ1 is faster.

Example 4a: Calculating with CPI

• The ADD instruction takes 1 clock cycle to execute, while the MUL instruction takes 3 clock cycles. If a program consists of 20 ADD and 10 MUL instructions, what is the average CPI?

What do we know? There are 2 instruction types:

Instruction   Clock cycles   Instruction count
ADD           1              20
MUL           3              10
Example 4a: Calculating average CPI

CPU clock cycles = (20 × 1) + (10 × 3) = 50
Instruction count = 20 + 10 = 30
Average CPI = 50 / 30 ≈ 1.67
ACTIVITY

CPU X runs a program/code sequence Y which consists of 100 instructions. Calculate and fill in the table below:
a) The CPI for each instruction class given below.
b) The execution time for each instruction class, given a clock cycle time of 0.25 milliseconds.
c) CPU X’s execution time
d) CPU X’s clock rate

Instruction   Instruction count   Clock cycles   a) CPI   b) Execution time
A             20                  3
B             25                  1
C             10                  2
D             30                  2
E             10                  3
F             5                   4
Increasing the CPU Performance

• Decreasing the clock cycle time
• Datapath organization leading to lower CPI
• Reducing the number of executed instructions
Example 5: Improve Performance

• Our favourite program runs in 20 seconds on Computer P, which has an 8 GHz clock. We are trying to help a computer designer build Computer Q, which will run this program in 5 seconds. The designer has determined that a substantial increase in the clock rate is possible, but this will affect the rest of the CPU design, causing Computer Q to require 1.5 times as many clock cycles as Computer P for this program. What clock rate should we tell the designer to target?

What do we know?

                       Computer P             Computer Q
CPU execution time     20 s                   5 s
Clock cycles (CC)      CC_P                   1.5 × CC_P
Clock rate (CR)        8 GHz = 8 × 10^9 Hz    ?
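
One way to work Example 5 is to combine the two CPU time equations into a single expression. Since CC_P = time_P × CR_P and CC_Q = 1.5 × CC_P, the target rate is CR_Q = CR_P × 1.5 × (time_P / time_Q). The sketch below applies that closed form; the function and parameter names are hypothetical, and the result is derived from the given numbers rather than quoted from the slides.

```python
def target_clock_rate(rate_p_hz: float, time_p_s: float,
                      time_q_s: float, cycle_ratio: float) -> float:
    """CR_Q = CR_P * cycle_ratio * (time_P / time_Q)."""
    if time_q_s <= 0:
        raise ValueError("target execution time must be positive")
    return rate_p_hz * cycle_ratio * (time_p_s / time_q_s)


target = target_clock_rate(rate_p_hz=8e9, time_p_s=20.0,
                           time_q_s=5.0, cycle_ratio=1.5)
print(f"Target clock rate for Computer Q: {target / 1e9:.0f} GHz")
```

Under these inputs the sketch yields 48 GHz: running 4 times faster while executing 1.5 times as many cycles requires a clock 6 times faster than Computer P's.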
Understanding the Units

• CPU execution time for a program = seconds per program (S/P)
• Clock cycles = clock cycles per program (C/P)
• Clock cycle time = seconds per clock cycle (S/C)
• Clock rate = clock cycles per second (C/S)
• Instruction count = instructions executed for the program (I/P)
• Clock cycles per instruction (CPI) = average number of clock cycles per instruction (C/I)
Understanding the Units

The units cancel each other to give the resulting unit:
S/P = (I/P) × (C/I) × (S/C)

Example:
10 s = 20 cycles / clock rate
Clock rate = 20 cycles / 10 s = 2 cycles per second = 2 Hz

1 Hz is 1 cycle per second
