
Chapter # 1

Computer Abstractions and Technology
Lecture # 2
Course Instructor: Dr. Afshan Jamil
Outline
• Defining performance
• Response time and throughput
• Relative performance
• Measuring execution time
• CPU clocking
• CPU time
• Determining clock cycles
• Instruction count and CPI
• Power trends
• Multiprocessors
• Reducing power
• MIPS as a performance metric
Defining Performance
Understanding Performance
• Algorithm
– Determines number of operations executed.
• Programming language, compiler, architecture
– Determine number of machine instructions
executed per operation.
• Processor and memory system
– Determine how fast instructions are executed.
• I/O system (including OS)
– Determines how fast I/O operations are
executed.
Role of Computer Performance
• Designing high performance computers is one
of the major goals of any computer architect.
• Assessing the performance of computer
hardware is at the heart of computer design
and greatly affects the demand for and market
value of the computer.
• However, measuring the performance of a
computer system is not a straightforward
task.
CONTD…
• Which application to use to measure
performance? (Explore and submit as
assignment # 1)
• What component of computer to measure
(e.g., processor, I/O, cache)?
• How do other parameters affect performance
(e.g., OS, compiler)?
• How do you define performance: by response
time (how fast a job finishes) or by throughput
(how many jobs complete in a given period)?
Response Time and Throughput
• Response time
– How long it takes to do a task
• Throughput
– Total work done per unit time
• e.g., tasks/transactions/… per hour
• How are response time and throughput affected by
– Replacing the processor with a faster version?
– Adding more processors?
• We’ll focus on response time for now…
• In this class, we will be primarily interested in
execution time (response time) as a measure of
performance.
Relative Performance
• Define Performance = 1/Execution Time
• “X is n times faster than Y”
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

Example: If computer A runs a program in 10 seconds
and computer B runs the same program in 15 seconds,
how much faster is A than B?
CONTD…
• Solution:
– Execution Time of A = 10s
– Execution Time of B = 15s
– $\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{15\,\text{s}}{10\,\text{s}} = 1.5$
– So, the performance of A is 1.5 times the
performance of B
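The ratio above can be checked with a few lines of Python. This is only a minimal sketch: the function name `relative_performance` and its parameter names are illustrative assumptions, not part of the lecture.

```python
def relative_performance(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that X is n times as fast as Y.

    Performance is defined as 1 / execution time, so
    Performance_X / Performance_Y = exec_time_y / exec_time_x.
    """
    if exec_time_x <= 0 or exec_time_y <= 0:
        raise ValueError("execution times must be positive")
    return exec_time_y / exec_time_x


# Slide example: A takes 10 s, B takes 15 s.
n = relative_performance(exec_time_x=10.0, exec_time_y=15.0)
print(f"A is {n:.1f} times as fast as B")  # prints "A is 1.5 times as fast as B"
```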
Measuring Execution Time
• Elapsed time
– Total response time, including all aspects
• Processing, I/O, OS overhead, idle time
– Determines system performance
• CPU time
– Time spent processing a given job
• Discounts I/O time, other jobs’ shares
– Comprises user CPU time and system CPU time
– In our performance measurements, we use user
CPU time because it is independent of the OS
and other factors
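The difference between elapsed time and CPU time can be observed directly. The sketch below uses Python's standard `time.perf_counter` (wall-clock time) and `os.times` (user and system CPU time of the current process); the workload `busy_work` and the helper `measure` are hypothetical names chosen for illustration, and the measured numbers will vary from machine to machine.

```python
import os
import time


def busy_work(n: int) -> int:
    """Hypothetical CPU-bound workload: sum of squares below n."""
    return sum(i * i for i in range(n))


def measure(n: int = 200_000, pause: float = 0.05):
    """Return (elapsed, user_cpu, system_cpu) in seconds for one run."""
    cpu_before = os.times()
    wall_before = time.perf_counter()

    busy_work(n)       # computation: consumes CPU time
    time.sleep(pause)  # idle wait: adds to elapsed time, almost no CPU time

    wall_after = time.perf_counter()
    cpu_after = os.times()

    elapsed = wall_after - wall_before
    user_cpu = cpu_after.user - cpu_before.user
    system_cpu = cpu_after.system - cpu_before.system
    return elapsed, user_cpu, system_cpu


elapsed, user_cpu, system_cpu = measure()
print(f"elapsed={elapsed:.3f}s user={user_cpu:.3f}s system={system_cpu:.3f}s")
```

Because of the idle wait, the elapsed time should come out larger than the sum of user and system CPU time.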
CPU Clocking
• Operation of digital hardware is governed by a constant-rate clock
[Figure: clock waveform marking the clock period; within each cycle, data transfer and computation are followed by a state update]
• Clock period: duration of a clock cycle
– e.g., 250ps = 0.25ns = $250 \times 10^{-12}$ s
• Clock frequency (rate): cycles per second
– e.g., 4.0GHz = 4000MHz = $4.0 \times 10^{9}$ Hz
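Clock period and clock rate are reciprocals, so the two example values on this slide describe the same clock. A minimal sketch (the function names are illustrative assumptions):

```python
def clock_rate_hz(period_s: float) -> float:
    """Clock rate in Hz for a clock period given in seconds."""
    return 1.0 / period_s


def clock_period_s(rate_hz: float) -> float:
    """Clock period in seconds for a clock rate given in Hz."""
    return 1.0 / rate_hz


period = 250e-12              # 250 ps
rate = clock_rate_hz(period)  # approximately 4.0e9 Hz
print(f"{period * 1e12:.0f} ps period -> {rate / 1e9:.1f} GHz")
```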
CPU Time
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
• Performance improved by
– Reducing number of clock cycles
– Increasing clock rate
– Hardware designer must often trade off
clock rate against cycle count
CPU Time Example
• A given program runs in 10 sec on computer A, which
has a 2 GHz clock. We are trying to help a
computer designer build a computer B that will run
this program in 6 sec. The designer has determined
that a substantial increase in the clock rate is possible,
but this increase will affect the rest of the CPU design,
causing computer B to require 1.2 times as many
clock cycles as computer A for this program. What
clock rate should we tell the designer to target?
CONTD…
• Computer A: 2GHz clock, 10s CPU time
• Designing Computer B
– Aim for 6s CPU time
– Can do faster clock, but causes 1.2 × clock
cycles
• How fast must Computer B clock be?
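Solution
• $\text{CPU Time}_A = \frac{\text{CPU Clock Cycles}_A}{\text{Clock Rate}_A}$
• $10\,\text{s} = \frac{\text{CPU Clock Cycles}_A}{2\,\text{GHz}}$
• $\text{CPU Clock Cycles}_A = 10\,\text{s} \times 2 \times 10^{9}\,\text{Hz} = 20 \times 10^{9}$
• $\text{CPU Time}_B = \frac{\text{CPU Clock Cycles}_B}{\text{Clock Rate}_B} = \frac{1.2 \times \text{CPU Clock Cycles}_A}{\text{Clock Rate}_B}$
• $\text{Clock Rate}_B = \frac{1.2 \times \text{CPU Clock Cycles}_A}{\text{CPU Time}_B}$
• $\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$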
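The same calculation as a short Python sketch, assuming a 2 GHz clock for computer A as in the solution. The helper names `clock_cycles` and `required_clock_rate` are illustrative assumptions.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """CPU clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles: float, target_time_s: float) -> float:
    """Clock rate needed to finish `cycles` cycles in `target_time_s` seconds."""
    return cycles / target_time_s


cycles_a = clock_cycles(cpu_time_s=10.0, clock_rate_hz=2.0e9)  # 20e9 cycles
cycles_b = 1.2 * cycles_a                                      # B needs 1.2x as many
rate_b = required_clock_rate(cycles_b, target_time_s=6.0)
print(f"Computer B needs about {rate_b / 1e9:.1f} GHz")
```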
Determining Clock Cycles
• So, what determines the number of cycles
required to execute an application?
• One possibility:
– # cycles = # instructions
• However, this is NOT true because different
instructions take different amounts of time.
CONTD…

• A more realistic picture of what’s happening…
– Floating-point operations can take longer than integer operations.
– Multiplication takes longer than addition.
– Memory accesses can take many cycles to complete.
• Clock cycles = Instructions * Avg Cycles Per Instruction
Instruction Count and CPI
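$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$
$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$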
• Instruction Count for a program
– Determined by program, ISA and compiler
• Average cycles per instruction
– Determined by CPU hardware
– If different instructions have different CPI
• Average CPI affected by instruction mix
CONTD…

Factor              IC   CPI   Clock rate
Program             X    -     -
Compiler            X    X     -
ISA                 X    X     -
Microarchitecture   -    X     X
Technology          -    -     X
CPI Example
• Computer A: Cycle Time = 250ps, CPI = 2.0
• Computer B: Cycle Time = 500ps, CPI = 1.2
• Same ISA
• Which is faster, and by how much?
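$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$
$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$
• A is faster…
$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$
• …by this much: A is 1.2 times as fast as B.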
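The comparison can be reproduced numerically. Both computers implement the same ISA, so the instruction count I cancels in the ratio; the value used below is an arbitrary placeholder, and the function name `cpu_time_ps` is an assumption made for this sketch.

```python
def cpu_time_ps(instruction_count: int, cpi: float, cycle_time_ps: float) -> float:
    """CPU time in picoseconds = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ps


I = 1_000_000  # placeholder instruction count; it cancels in the ratio

time_a = cpu_time_ps(I, cpi=2.0, cycle_time_ps=250.0)  # I x 500 ps
time_b = cpu_time_ps(I, cpi=1.2, cycle_time_ps=500.0)  # I x 600 ps

ratio = time_b / time_a
faster = "A" if time_a < time_b else "B"
print(f"{faster} is faster by a factor of {ratio:.1f}")
```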
CPI in more detail
• If different instruction classes take different
numbers of cycles

• Weighted average CPI
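The two bullets above correspond to the standard relations below. The notation is an assumption made for this sketch: $\text{CPI}_i$ is the cycles per instruction of class $i$, $\text{IC}_i$ is the number of instructions executed from class $i$, and $n$ is the number of classes.

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{IC}_i \right)$$

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{IC}_i}{\text{Instruction Count}} \right)$$

The fraction $\text{IC}_i / \text{Instruction Count}$ is the relative frequency of class $i$, which is why the average CPI depends on the instruction mix.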


CPI Example
• Alternative compiled code sequences using instructions
in classes A, B, C
Class               A   B   C
CPI for class       1   2   3
IC in sequence 1    2   1   2
IC in sequence 2    4   1   1

• Which code sequence executes the most instructions? Which is
faster? What is the CPI for each sequence?
• Sequence 1: IC = 5
– Clock Cycles = 2×1 + 1×2 + 2×3 = 10
– Avg. CPI = 10/5 = 2.0
• Sequence 2: IC = 6
– Clock Cycles = 4×1 + 1×2 + 1×3 = 9
– Avg. CPI = 9/6 = 1.5
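A small Python sketch of the same computation, using the class CPIs and instruction counts from the table. The dictionary layout and function names are illustrative assumptions.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def clock_cycles(ic_by_class: dict) -> int:
    """Total cycles: sum over classes of (CPI of class x instructions in class)."""
    return sum(CPI_BY_CLASS[cls] * count for cls, count in ic_by_class.items())


def average_cpi(ic_by_class: dict) -> float:
    """Weighted average CPI = total clock cycles / total instruction count."""
    return clock_cycles(ic_by_class) / sum(ic_by_class.values())


seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", seq1), ("Sequence 2", seq2)):
    print(name, "IC =", sum(seq.values()),
          "cycles =", clock_cycles(seq),
          "avg CPI =", average_cpi(seq))
```

Sequence 2 executes more instructions (6 versus 5) but needs fewer clock cycles (9 versus 10), so it is the faster sequence on the same clock.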
Performance Summary
$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
• Performance depends on
– Algorithm: affects IC, possibly CPI
– Programming language: affects IC, CPI
– Compiler: affects IC, CPI
– Instruction set architecture: affects IC, CPI, and clock cycle time (Tc)
Performance Example

• A given application written in Java runs for 15
seconds on a desktop processor. A new Java compiler
is released that requires only 0.6 times as many
instructions as the old compiler. Unfortunately, it
increases the CPI by a factor of 1.1. How fast can we
expect the application to run using this new compiler?
Solution
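$\text{CPU Time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time}$
A (old compiler):
$15 = \text{IC}_A \times \text{CPI}_A \times \text{Clock cycle time}_A$
$\text{Clock cycle time}_A = \frac{15}{\text{IC}_A \times \text{CPI}_A}$
B (new compiler, same processor, so $\text{Clock cycle time}_B = \text{Clock cycle time}_A$):
$\text{CPU Time}_B = \text{IC}_B \times \text{CPI}_B \times \text{Clock cycle time}_B$
$\text{CPU Time}_B = (0.6 \times \text{IC}_A) \times (1.1 \times \text{CPI}_A) \times \frac{15}{\text{IC}_A \times \text{CPI}_A} = 9.9\ \text{seconds}$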
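Since the clock cycle time is unchanged, the new CPU time is the old time scaled by the instruction-count factor and the CPI factor. A minimal sketch (the function name `scaled_cpu_time` is an assumption):

```python
def scaled_cpu_time(old_time_s: float, ic_factor: float, cpi_factor: float,
                    cycle_time_factor: float = 1.0) -> float:
    """Scale CPU time = IC x CPI x cycle time by the change in each term."""
    return old_time_s * ic_factor * cpi_factor * cycle_time_factor


new_time = scaled_cpu_time(old_time_s=15.0, ic_factor=0.6, cpi_factor=1.1)
print(f"Expected run time with the new compiler: {new_time:.1f} s")
```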
Power trends
Reducing Power
• Suppose a new CPU has
– 85% of capacitive load of old CPU
– 15% voltage and 15% frequency reduction
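The effect of these reductions can be estimated with the usual dynamic power relation for CMOS logic, $\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$. That relation is not stated in this text, so treat the following as a sketch under that assumption, with $C$, $V$ and $F$ as assumed symbols for the old CPU's capacitive load, voltage and frequency:

$$\frac{P_{\text{new}}}{P_{\text{old}}} = \frac{(0.85\,C) \times (0.85\,V)^2 \times (0.85\,F)}{C \times V^2 \times F} = 0.85^4 \approx 0.52$$

Under this model the new CPU would draw roughly half the power of the old one.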

• The power wall


– We can’t reduce voltage further
– We can’t remove more heat
• How else can we improve performance?
Multiprocessors
• Multicore microprocessors
– More than one processor per chip
• Requires explicitly parallel programming
– Compare with instruction level parallelism
• Hardware executes multiple instructions at once
• Hidden from the programmer
• Hard to do
– Programming for performance
– Load balancing
– Optimizing communication and synchronization
MIPS as a performance metric
• MIPS: Millions of Instructions Per Second
• MIPS is an alternative metric for performance.
• CPI varies between programs on a given CPU
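In formula form, using the CPU time relation from earlier in the lecture, the standard definition of MIPS is:

$$\text{MIPS} = \frac{\text{Instruction Count}}{\text{Execution Time} \times 10^{6}} = \frac{\text{Instruction Count}}{\dfrac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}} \times 10^{6}} = \frac{\text{Clock Rate}}{\text{CPI} \times 10^{6}}$$

Because CPI appears in the denominator and varies between programs, the MIPS rating of a given CPU varies between programs too.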
CONTD…

• MIPS is inversely proportional to execution time
– Bigger numbers indicate better performance
– Intuitive representation
• 3 significant problems with MIPS usage
– Doesn’t consider what the instructions actually do
– Varies by program; no single number for a machine
– Can vary inversely with performance
Example: MIPS Performance
• Given the following tables of instruction
counts (in billions) and CPI for each
instruction class, find the MIPS and
execution times on a 4GHz machine.
Class   A   B   C
CPI     1   2   3

Instruction counts (in billions) per class:
Sequence   A    B   C
1          5    1   1
2          10   1   1
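One way to work this example is sketched below. It assumes the standard definitions Execution time = Clock cycles / Clock rate and MIPS = Instruction count / (Execution time × 10^6); the results are computed by the sketch and are not given in the slides. The names `analyze`, `SEQUENCES` and `CPI_BY_CLASS` are illustrative.

```python
CLOCK_RATE_HZ = 4_000_000_000          # 4 GHz machine
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class, converted from billions.
SEQUENCES = {
    1: {"A": 5_000_000_000, "B": 1_000_000_000, "C": 1_000_000_000},
    2: {"A": 10_000_000_000, "B": 1_000_000_000, "C": 1_000_000_000},
}


def analyze(ic_by_class: dict):
    """Return (execution time in seconds, MIPS rating) for one sequence."""
    instruction_count = sum(ic_by_class.values())
    cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in ic_by_class.items())
    exec_time_s = cycles / CLOCK_RATE_HZ
    mips = instruction_count / (exec_time_s * 1e6)
    return exec_time_s, mips


results = {seq: analyze(counts) for seq, counts in SEQUENCES.items()}
for seq, (exec_time_s, mips) in results.items():
    print(f"Sequence {seq}: time = {exec_time_s:.2f} s, MIPS = {mips:.0f}")
```

Under these definitions, sequence 2 ends up with the higher MIPS rating (3200 versus 2800) but the longer execution time (3.75 s versus 2.5 s), which is an instance of MIPS varying inversely with performance.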
