
COMPUTER ARCHITECTURE
Prof. Basel Mahafzah
Lesson 2 - The role of performance - Part 2

Outline

The Role of Performance, Part 2:
Definition of Performance
Clock Cycles per Instruction (CPI)
Instruction Count

Cycles and Instructions

Since the compiler generated instructions to execute, and the computer had to execute those instructions to run the program, the execution time must depend on the number of instructions in a program.

Could we assume that the number of cycles equals the number of instructions?


[Figure: a time line divided into clock cycles, with the 1st, 2nd, 3rd, 4th, 5th, 6th, ... instructions each occupying one cycle]

Copyright Università Telematica Internazionale UNINETTUNO

How many cycles are required for a program?


This assumption is incorrect: different instructions take different amounts of time on different machines. Why? Hint: remember that these are machine instructions, not lines of C code.

Different Numbers of Cycles for Different Instructions

Multiplication takes more time than addition.
Floating-point operations take longer than integer ones.
Accessing memory takes more time than accessing registers.

Important point: changing the cycle time often changes the number of cycles required for various instructions (more later).

Clock Cycles Per Instruction (CPI)

The number of clock cycles required for a program can be written as follows:

$$\text{CPU clock cycles} = \text{Instructions for a program} \times \text{Average clock cycles per instruction}$$


The average number of clock cycles each instruction takes to execute is called clock cycles per instruction, or CPI.

Since different instructions may take different amounts of time depending on what they do, CPI is an average over all instructions executed in the program.
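To make "CPI is an average" concrete, here is a minimal sketch that assumes a tiny hypothetical program whose per-instruction cycle counts are known. The list values and the helper names `average_cpi` and `cpu_clock_cycles` are illustrative only, not taken from any real machine. It shows that instruction count times average CPI recovers the total cycle count.

```python
# Hypothetical cycle counts for each executed instruction of a tiny program
# (illustrative values only, not measured on any real machine).
cycles_per_executed_instruction = [1, 1, 2, 3, 1, 4]

def average_cpi(cycle_counts):
    """CPI = average number of clock cycles over all instructions executed."""
    return sum(cycle_counts) / len(cycle_counts)

def cpu_clock_cycles(instruction_count, cpi):
    """CPU clock cycles = instructions for a program x average CPI."""
    return instruction_count * cpi

ic = len(cycles_per_executed_instruction)              # 6 instructions
cpi = average_cpi(cycles_per_executed_instruction)     # 12 / 6 = 2.0
print(ic, cpi, cpu_clock_cycles(ic, cpi))              # 6 2.0 12.0
```

Because CPI is an average, multiplying it back by the instruction count gives the same total as adding up every instruction's cycles individually.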

CPI Example

Suppose we have two implementations of the same Instruction Set Architecture (ISA).

For some program, Computer A has a clock cycle time of 1 ns and a CPI of 2.0, while Computer B has a clock cycle time of 2 ns and a CPI of 1.2.

Which computer is faster for this program, and by how much?


If the two computers have the same ISA, which of our quantities (e.g., clock rate, CPI, execution time, number of instructions, MIPS) will always be identical?

Answer

We know that each computer executes the same number of instructions for the same program; let's call this number I.

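First, find the number of processor clock cycles for each computer:

$$\text{CPU clock cycles}_A = I \times 2.0$$
$$\text{CPU clock cycles}_B = I \times 1.2$$

Now we can compute the CPU time for each computer:

$$\text{CPU time}_A = \text{CPU clock cycles}_A \times \text{Clock cycle time}_A = I \times 2.0 \times 1\ \text{ns} = 2 \times I\ \text{ns}$$
$$\text{CPU time}_B = \text{CPU clock cycles}_B \times \text{Clock cycle time}_B = I \times 1.2 \times 2\ \text{ns} = 2.4 \times I\ \text{ns}$$

Clearly, computer A is faster. The amount faster is given by the ratio of the execution times: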


$$\frac{\text{CPU performance}_A}{\text{CPU performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{2.4 \times I\ \text{ns}}{2 \times I\ \text{ns}} = 1.2$$

Final Answer
Computer A is 1.2 times faster than Computer B for this program.
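The same comparison can be checked numerically. This is a minimal sketch: the helper `cpu_time_ns` is a hypothetical name, and the instruction count is an arbitrary assumed value, since I cancels out of the ratio exactly as in the derivation above.

```python
def cpu_time_ns(instruction_count, cpi, cycle_time_ns):
    """CPU time = CPU clock cycles x clock cycle time (in ns)."""
    return instruction_count * cpi * cycle_time_ns

# Arbitrary assumed instruction count (the text's I); it cancels in the ratio.
n_instructions = 1_000_000

time_a = cpu_time_ns(n_instructions, cpi=2.0, cycle_time_ns=1.0)  # 2 x I ns
time_b = cpu_time_ns(n_instructions, cpi=1.2, cycle_time_ns=2.0)  # 2.4 x I ns

# Performance is the reciprocal of execution time, so
# performance_A / performance_B = time_B / time_A.
speedup = time_b / time_a
print(f"A is {speedup:.1f} times faster than B")  # A is 1.2 times faster than B
```

Note that B wins on CPI alone, but A's shorter clock cycle more than compensates, which is why both factors have to be multiplied together before comparing.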

Definition of Performance (continued)

Instruction Count

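Basic performance equation in terms of instruction count (the number of instructions executed by the program), CPI, and clock cycle time:

$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$

or, equivalently, since clock rate = 1 / clock cycle time:

$$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$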
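A minimal sketch of the two equivalent forms of the basic performance equation. The workload numbers (10^9 instructions, a CPI of 1.5, a 2 GHz clock) and both function names are assumptions chosen only for illustration; the point is that multiplying by the cycle time and dividing by the clock rate give the same CPU time.

```python
def cpu_time_from_cycle_time(instruction_count, cpi, clock_cycle_time_s):
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s

def cpu_time_from_clock_rate(instruction_count, cpi, clock_rate_hz):
    """CPU time = (instruction count x CPI) / clock rate."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical workload: 10^9 instructions, CPI of 1.5, on a 2 GHz clock.
ic = 1_000_000_000
cpi = 1.5
clock_rate_hz = 2_000_000_000
clock_cycle_time_s = 1 / clock_rate_hz  # 0.5 ns per cycle

t1 = cpu_time_from_cycle_time(ic, cpi, clock_cycle_time_s)
t2 = cpu_time_from_clock_rate(ic, cpi, clock_rate_hz)
print(f"{t1:.2f} s, {t2:.2f} s")  # 0.75 s, 0.75 s
```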


$$\text{Execution time} = \frac{I}{P} \times \frac{CC}{I} \times \frac{S}{CC} = \frac{S}{P}$$

where I = instructions, P = program, CC = clock cycles, and S = seconds.

How do we find the performance parameters (such as CC)?

$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times C_i$$

where $C_i$ is the count of the number of instructions of class $i$ executed, $\text{CPI}_i$ is the average number of cycles per instruction for that instruction class, and $n$ is the number of instruction classes.
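Dividing this sum by the instruction count shows that the overall CPI is a weighted average of the per-class CPIs, each weighted by that class's share of the executed instructions. A short derivation, using the definition CPI = CPU clock cycles / instruction count (which this lesson applies in its next example) and noting that the instruction count equals $\sum_{i=1}^{n} C_i$; the last line checks it against code sequence 1 from that example (classes needing 1, 2, and 3 cycles, with counts 2, 1, and 2 out of 5 instructions):

```latex
\text{CPI} = \frac{\text{CPU clock cycles}}{\text{Instruction count}}
           = \frac{\sum_{i=1}^{n} \text{CPI}_i \times C_i}{\sum_{i=1}^{n} C_i}
           = \sum_{i=1}^{n} \text{CPI}_i \times \frac{C_i}{\text{Instruction count}}

\text{CPI}_1 = 1 \times \tfrac{2}{5} + 2 \times \tfrac{1}{5} + 3 \times \tfrac{2}{5}
             = 0.4 + 0.4 + 1.2 = 2.0
```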

Example
A compiler designer is trying to decide between two code sequences for a particular machine.

Example (continued)

Based on the hardware implementation, there are three different classes of instructions, Class A, Class B, and Class C, which require one, two, and three cycles, respectively.


The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C.
The second code sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.

Which sequence will be faster? What is the CPI for each sequence?

Answer
Sequence 1 executes 2 + 1 + 2 = 5 instructions, and sequence 2 executes 4 + 1 + 1 = 6 instructions. So sequence 1 executes fewer instructions.

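Answer (continued)

To find the total number of clock cycles for each sequence:

$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times C_i$$
$$\text{CPU clock cycles}_1 = (2 \times 1) + (1 \times 2) + (2 \times 3) = 10\ \text{cycles}$$
$$\text{CPU clock cycles}_2 = (4 \times 1) + (1 \times 2) + (1 \times 3) = 9\ \text{cycles}$$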


So code sequence 2 is faster, even though it executes one extra instruction.

Since code sequence 2 takes fewer overall clock cycles but has more instructions, it must have a lower CPI.

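The CPI values can be computed by:

$$\text{CPI} = \frac{\text{CPU clock cycles}}{\text{Instruction count}}$$
$$\text{CPI}_1 = \frac{\text{CPU clock cycles}_1}{\text{Instruction count}_1} = \frac{10}{5} = 2$$
$$\text{CPI}_2 = \frac{\text{CPU clock cycles}_2}{\text{Instruction count}_2} = \frac{9}{6} = 1.5$$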
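The whole example can be reproduced in a few lines. This is a sketch under the example's own assumptions (classes A, B, C needing 1, 2, 3 cycles); the dictionary layout and the names `CLASS_CPI`, `clock_cycles`, `instruction_count`, and `cpi` are hypothetical choices for illustration.

```python
# Cycles per instruction for each class on this machine (from the example).
CLASS_CPI = {"A": 1, "B": 2, "C": 3}

# Instruction mix of each code sequence: class -> number executed (C_i).
sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

def clock_cycles(mix):
    """CPU clock cycles = sum over classes of CPI_i x C_i."""
    return sum(CLASS_CPI[cls] * count for cls, count in mix.items())

def instruction_count(mix):
    """Total instructions executed = sum of C_i."""
    return sum(mix.values())

def cpi(mix):
    """CPI = CPU clock cycles / instruction count."""
    return clock_cycles(mix) / instruction_count(mix)

for name, mix in [("Sequence 1", sequence_1), ("Sequence 2", sequence_2)]:
    print(name, instruction_count(mix), clock_cycles(mix), cpi(mix))
# Sequence 1 5 10 2.0
# Sequence 2 6 9 1.5
```

The output mirrors the hand calculation: sequence 2 executes more instructions but fewer cycles, so its CPI is lower.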

In Summary
Note: when comparing two machines, you must look at all three components (instruction count, CPI, and clock cycle time), which combine to form execution time.

Now that we understand cycles: a given program will require some number of instructions (machine instructions), some number of cycles, and some number of seconds.

We have a vocabulary that relates these quantities:
Instruction count (instructions per program)
Clock rate (cycles per second)
Cycle time (seconds per cycle)
CPI (cycles per instruction); a floating-point-intensive application might have a higher CPI

Performance

Performance is determined by execution time. Do any of the other variables equal performance?
Number of cycles to execute the program?
Number of instructions in the program?
Number of cycles per second?
Average number of cycles per instruction?
Average number of instructions per second?

Common pitfall: thinking that one of these variables is indicative of performance when it really isn't.
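A small counterexample illustrates the pitfall. Everything in this sketch is hypothetical: two made-up builds of the same program on the same assumed 1 GHz clock. Build Q has the lower CPI and the higher instruction rate, yet it takes longer because it executes more instructions; only execution time tells the truth.

```python
# Hypothetical 1 GHz clock, identical for both builds (illustrative only).
CLOCK_RATE_HZ = 1_000_000_000

# Two hypothetical builds of the same program (numbers are made up).
builds = {
    "build P": {"instructions": 5_000_000_000, "cpi": 2.0},
    "build Q": {"instructions": 8_000_000_000, "cpi": 1.5},
}

def execution_time_s(build):
    """CPU time = instruction count x CPI / clock rate."""
    return build["instructions"] * build["cpi"] / CLOCK_RATE_HZ

def instructions_per_second(build):
    """Average number of instructions executed per second."""
    return build["instructions"] / execution_time_s(build)

for name, build in builds.items():
    print(f"{name}: {execution_time_s(build):.1f} s, CPI {build['cpi']}, "
          f"{instructions_per_second(build) / 1e6:.0f} million instructions/s")
# build P: 10.0 s, CPI 2.0, 500 million instructions/s
# build Q: 12.0 s, CPI 1.5, 667 million instructions/s
```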

Benchmarks

Performance is best determined by running a real application. Use programs typical of the expected workload or, alternatively, typical of the expected class of applications, e.g., compilers/editors, scientific applications, graphics, etc.

Small benchmarks are nice for architects and designers and are easy to standardize, but they can be abused.

SPEC (System Performance Evaluation Cooperative): companies have agreed on a set of real programs and inputs, but the benchmarks can still be abused.


SPEC 89

Benchmarks are a valuable indicator of performance (and of compiler technology).

[Figure: Compiler enhancements and performance. Results on a 0 to 800 scale for each SPEC89 benchmark (gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv), comparing the standard compiler with an enhanced compiler]

SPEC 95 (8 integer benchmarks)

Benchmark   Description
go          Artificial intelligence; plays the game of Go
m88ksim     Motorola 88k chip simulator; runs test program
gcc         The GNU C compiler generating SPARC code
compress    Compresses and decompresses file in memory
li          Lisp interpreter
ijpeg       Graphic compression and decompression
perl        Manipulates strings and prime numbers in the special-purpose programming language Perl
vortex      A database program

SPEC 95 (10 floating-point benchmarks)

Benchmark   Description
tomcatv     A mesh generation program
swim        Shallow water model with 513 x 513 grid
su2cor      Quantum physics; Monte Carlo simulation
hydro2d     Astrophysics; hydrodynamic Navier-Stokes equations
mgrid       Multigrid solver in 3-D potential field
applu       Parabolic/elliptic partial differential equations
turb3d      Simulates isotropic, homogeneous turbulence in a cube
apsi        Solves problems regarding temperature, wind velocity, and distribution of pollutant
fpppp       Quantum chemistry
wave5       Plasma physics; electromagnetic particle simulation

[Figure: SPECint95 ratio (0 to 10) versus clock rate (50 to 250 MHz) for the Pentium and the Pentium Pro; bigger numeric results indicate faster performance]
Pentium Pro is 1.4 to 1.5 times faster than Pentium.

[Figure: SPECfp95 ratio (0 to 10) versus clock rate (50 to 250 MHz) for the Pentium and the Pentium Pro; bigger numeric results indicate faster performance]
Pentium Pro is 1.7 to 1.8 times faster than Pentium.
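Why does a bigger SPEC ratio mean faster performance? As we understand the SPEC95 methodology, each benchmark's ratio is a reference machine's execution time divided by the measured execution time, and a suite such as SPECint95 is summarized by the geometric mean of those ratios. The sketch below assumes that definition; the function names and the timings are hypothetical, not real SPEC data.

```python
import math

def spec_ratio(reference_time_s, measured_time_s):
    """Reference time / measured time: a shorter measured run gives a bigger ratio."""
    return reference_time_s / measured_time_s

def geometric_mean(values):
    """n-th root of the product, computed via logarithms to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical (reference time, measured time) pairs in seconds for two benchmarks.
timings = [(200.0, 100.0), (800.0, 100.0)]

ratios = [spec_ratio(ref, meas) for ref, meas in timings]  # [2.0, 8.0]
summary = geometric_mean(ratios)                           # about 4.0
print(ratios, round(summary, 6))
```

Because measured time sits in the denominator, halving a machine's run time doubles its ratio, which matches the charts' "bigger is faster" reading.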
