
12Z602 - Computer Architecture II

• Dependability
• Performance
• Quantitative principles of computer design
Define and quantify dependability
• Infrastructure providers now offer Service Level
Agreements (SLAs) to guarantee that their service will
be dependable
• Systems alternate between 2 states of service with
respect to an SLA:
1. Service accomplishment, where the service is delivered
as specified in SLA
2. Service interruption, where the delivered service is
different from the SLA
• Failure = transition from state 1 to state 2
• Restoration = transition from state 2 to state 1

7/23/2018 2
Define and quantify dependability
• Module reliability = a measure of continuous service
accomplishment
• 2 reliability metrics:
1. Mean Time To Failure (MTTF) measures reliability
2. Failures In Time (FIT) = 1/MTTF, the rate of failures
• Traditionally reported as failures per billion (10^9) hours of operation
• Mean Time To Repair (MTTR) measures service
interruption
– Mean Time Between Failures (MTBF) = MTTF + MTTR
• Module availability measures service as the module alternates
between the 2 states of accomplishment and
interruption (a number between 0 and 1, e.g. 0.9)
• Module availability = MTTF / (MTTF + MTTR)
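
The definitions above can be checked numerically with a short sketch. The function names and the sample MTTF and MTTR values below are illustrative assumptions, not figures from the slides.

```python
def fit_rate(mttf_hours: float) -> float:
    """Failures In Time: failures per billion (1e9) hours of operation."""
    return 1e9 / mttf_hours


def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean Time Between Failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time spent in the service-accomplishment state."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical module (assumed values): MTTF = 1,000,000 hours, MTTR = 24 hours.
example_mttf, example_mttr = 1_000_000.0, 24.0
print(f"FIT          = {fit_rate(example_mttf):,.0f}")
print(f"MTBF         = {mtbf(example_mttf, example_mttr):,.0f} hours")
print(f"Availability = {availability(example_mttf, example_mttr):.6f}")
```

Because MTTR is usually tiny compared with MTTF, availability is typically very close to 1, which is why it is often quoted as a number of "nines".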

Example calculating reliability
• If modules have exponentially distributed
lifetimes (age of module does not affect
probability of failure), overall failure rate is the
sum of failure rates of the modules
• Calculate FIT and MTTF for disk subsystem
with 10 disks (1M hour MTTF per disk), 1 disk
controller (0.5M hour MTTF), and 1 power
supply (0.2M hour MTTF):
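\[
\begin{aligned}
\text{FailureRate} &= 10 \times \frac{1}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} \\
&= \frac{10 + 2 + 5}{1{,}000{,}000} = \frac{17}{1{,}000{,}000} \text{ failures per hour} \\
&= 17{,}000 \text{ FIT} \\
\text{MTTF} &= \frac{1{,}000{,}000{,}000}{17{,}000} \approx 59{,}000 \text{ hours}
\end{aligned}
\]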
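
The disk-subsystem calculation can be reproduced with a few lines of Python. This is a minimal sketch that assumes exponentially distributed lifetimes, as the example does, so that component failure rates simply add; the helper name is an illustrative choice.

```python
def system_failure_rate(component_mttfs_hours):
    """Sum of per-component failure rates (failures per hour).

    Valid only under the exponential-lifetime assumption stated in the example.
    """
    return sum(1.0 / mttf for mttf in component_mttfs_hours)


# 10 disks (1M hours each), 1 disk controller (0.5M hours), 1 power supply (0.2M hours)
components = [1_000_000] * 10 + [500_000] + [200_000]

rate_per_hour = system_failure_rate(components)
fit = rate_per_hour * 1e9          # failures per billion hours
system_mttf = 1.0 / rate_per_hour  # hours

print(f"FIT  = {fit:,.0f}")
print(f"MTTF = {system_mttf:,.0f} hours")
```

The system MTTF comes out far lower than that of any single component, because every added component contributes its own failure rate.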
How to measure the performance of a computer?

Performance: What to measure?
• Typical performance metrics:
– Response time
– Throughput
• "X is n times faster than Y" means
– n = Execution time_Y / Execution time_X

• Execution time
– Wall clock time: includes all system overheads
– CPU time: only computation time

• Benchmarks
– Kernels (e.g. matrix multiply)
– Toy programs (e.g. sorting)
– Synthetic benchmarks (e.g. Dhrystone)
– Benchmark suites (e.g. SPEC06fp, TPC-C)
Performance: How to measure?
• SPEC CPU: popular desktop benchmark suite
– CPU only, split between integer and floating point programs
– SPECSFS (NFS file server) and SPECWeb (web server) added as
server benchmarks

• Transaction Processing Performance Council (TPC) measures server
performance and cost-performance for databases
– TPC-C: complex queries for Online Transaction Processing
– TPC-H: models ad hoc decision support
– TPC-W: a transactional web benchmark
– TPC-App: application server and web services benchmark

How to Summarize Suite Performance?
• Arithmetic average of the execution times of all programs?
– But they vary by 4X in speed, so some would be more important
than others in an arithmetic average
• Could add a weight per program, but how to pick the
weights?
– Different companies want different weights for their products
• SPECRatio: normalize execution times to a reference
computer, yielding a ratio proportional to
performance:
\[
\text{SPECRatio} = \frac{\text{time on reference computer}}{\text{time on computer being rated}}
\]

How to Summarize Suite Performance
• If a program's SPECRatio on Computer A is 1.25
times bigger than on Computer B, then
\[
1.25 = \frac{\text{SPECRatio}_A}{\text{SPECRatio}_B}
= \frac{\text{ExecutionTime}_{\text{reference}} / \text{ExecutionTime}_A}{\text{ExecutionTime}_{\text{reference}} / \text{ExecutionTime}_B}
= \frac{\text{ExecutionTime}_B}{\text{ExecutionTime}_A}
= \frac{\text{Performance}_A}{\text{Performance}_B}
\]
• Note that when comparing 2 computers as a ratio,
execution times on the reference computer drop
out, so the choice of reference computer is irrelevant
How to Summarize Suite Performance
• Since SPECRatios are ratios, the proper mean is the geometric mean
(SPECRatio is unitless, so the arithmetic mean is meaningless)
\[
\text{GeometricMean} = \sqrt[n]{\prod_{i=1}^{n} \text{SPECRatio}_i}
\]
1. Geometric mean of the ratios is the same as the
ratio of the geometric means
2. Ratio of geometric means
= geometric mean of performance ratios
⇒ choice of reference computer is irrelevant!
• These two points make the geometric mean of ratios
attractive for summarizing performance
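
A small sketch can illustrate why the reference computer drops out. All execution times below are made-up values for illustration only, and the helper names are assumptions; the point is that two very different reference machines give the same A-versus-B comparison.

```python
import math


def geometric_mean(values):
    """n-th root of the product, computed via logs for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def spec_ratios(reference_times, machine_times):
    """SPECRatio per program = time on reference / time on machine being rated."""
    return [ref / t for ref, t in zip(reference_times, machine_times)]


# Hypothetical execution times in seconds for three programs.
times_a = [10.0, 40.0, 5.0]
times_b = [20.0, 30.0, 10.0]
reference_1 = [100.0, 200.0, 50.0]
reference_2 = [7.0, 900.0, 33.0]


def a_over_b(reference_times):
    """Ratio of the geometric-mean SPECRatios of computer A and computer B."""
    gm_a = geometric_mean(spec_ratios(reference_times, times_a))
    gm_b = geometric_mean(spec_ratios(reference_times, times_b))
    return gm_a / gm_b


print(f"A vs B using reference 1: {a_over_b(reference_1):.4f}")
print(f"A vs B using reference 2: {a_over_b(reference_2):.4f}")
```

Both lines print the same value, which also equals the geometric mean of the per-program ratios ExecutionTime_B / ExecutionTime_A.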
Quantitative Principles of Computer
Design
Outline
• Quantitative Principles of Computer Design
1. Take Advantage of Parallelism
2. Principle of Locality
3. Focus on the Common Case
4. The Processor Performance Equation

1) Taking Advantage of Parallelism

• Increasing throughput of server computer via
multiple processors or multiple disks

• Pipelining: overlap instruction execution to reduce
the total time to complete an instruction sequence.
– Classic 5-stage pipeline:
1) Instruction Fetch (Ifetch),
2) Register Read (Reg),
3) Execute (ALU),
4) Data Memory Access (Dmem),
5) Register Write (Reg)
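
To make the overlap concrete, the sketch below counts cycles for an idealized version of this 5-stage pipeline. It assumes one cycle per stage, one instruction issued per cycle, and no stalls or hazards, which is a simplification real pipelines do not always satisfy; the function names are illustrative.

```python
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]


def pipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """Ideal pipeline: fill once, then complete one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def unpipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """No overlap: each instruction uses all stages before the next one starts."""
    return n_instructions * n_stages


def timeline(n_instructions):
    """One row per instruction; column c holds the stage occupied in cycle c + 1."""
    total = pipelined_cycles(n_instructions)
    rows = []
    for i in range(n_instructions):
        row = ["."] * total
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i enters the pipeline in cycle i + 1
        rows.append(row)
    return rows


for row in timeline(4):
    print(" ".join(f"{cell:>6}" for cell in row))
print("pipelined:  ", pipelined_cycles(4), "cycles")    # 8
print("unpipelined:", unpipelined_cycles(4), "cycles")  # 20
```

Each individual instruction still takes 5 cycles; the gain comes from having several instructions in flight at once, so throughput improves rather than single-instruction latency.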

Pipelined Instruction Execution
[Figure: pipeline diagram. Horizontal axis: time in clock cycles (Cycle 1 to Cycle 7). Vertical axis: instruction order. Each of the four instructions shown passes through the stages Ifetch, Reg, ALU, DMem, Reg.]

2) The Principle of Locality

• Two Different Types of Locality:
– Temporal Locality (Locality in Time): If an item is referenced, it will
tend to be referenced again soon (e.g., loops, reuse)
– Spatial Locality (Locality in Space): If an item is referenced, items
whose addresses are close by tend to be referenced soon
(e.g., array access)

3) Focus on the Common Case
• In making a design trade-off, favor the frequent
case over the infrequent case
– E.g., the instruction fetch and decode unit is used more frequently
than the multiplier, so optimize it first
– E.g., if a database server has 50 disks per processor, storage
dependability dominates system dependability, so optimize it first
• The frequent case is often simpler and can be done
faster than the infrequent case
• What is the frequent case, and how much is performance
improved by making that case faster? => Amdahl’s Law

Amdahl’s Law
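• It states that the performance improvement gained from
using an enhancement is limited by the fraction of the
time the enhancement can be used.
• Speedup:
\[
\text{Speedup} = \frac{\text{Execution time without the enhancement}}{\text{Execution time with the enhancement}}
\]
• Depends on two factors:
1. Fraction_enhanced = fraction of the original execution time that can make use of the enhancement
2. Speedup_enhanced = speedup achieved on that fraction when the enhancement is used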

Amdahl’s Law
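• Execution time using the computer with the enhancement
= time spent in the unenhanced portion + time spent using the
enhancement
\[
\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right]
\]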

Problem
• Suppose FP square root (FPSQR) is responsible for 20% of the
execution time of a graphics application. One proposal is to enhance the FPSQR
hardware and speed up this operation by a factor of 10. The other
alternative is to make all FP instructions in the graphics
processor run faster by a factor of 1.6; FP instructions are responsible
for half of the execution time of the application. The design team
believes that they can make all FP instructions run 1.6 times faster
with the same effort as required for the fast square root. Compare
these two design alternatives.
Design 1: FPSQR enhancement
Fraction enhanced = 20%
Speedup enhanced = 10
Design 2: FP enhancement
Fraction enhanced = 50%
Speedup enhanced = 1.6
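
The two designs can be compared by plugging their parameters into Amdahl's Law, Speedup_overall = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced). The following is a minimal sketch; the function name is an illustrative choice.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only `fraction_enhanced` of the time is sped up."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


design_1 = amdahl_speedup(0.20, 10.0)  # FPSQR only, 10x faster
design_2 = amdahl_speedup(0.50, 1.6)   # all FP instructions, 1.6x faster

print(f"Design 1 (FPSQR):  {design_1:.2f}")  # 1.22
print(f"Design 2 (all FP): {design_2:.2f}")  # 1.23
```

Improving all FP instructions is slightly better overall, because the more modest speedup applies to a much larger fraction of the execution time.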

Amdahl’s Law

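\[
\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}}
= \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}
\]

It expresses the law of diminishing returns: the improvement
gained by an enhancement diminishes as additional enhancements are
made.
Best you could ever hope to do:
\[
\text{Speedup}_{\text{maximum}} = \frac{1}{1 - \text{Fraction}_{\text{enhanced}}}
\]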
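
The diminishing returns can be seen by holding Fraction_enhanced fixed and letting Speedup_enhanced grow. This is a sketch only: the fraction 0.4 and the list of speedup factors are values chosen for illustration, and the function names are assumptions.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup from Amdahl's Law."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def max_speedup(fraction_enhanced: float) -> float:
    """Limit of the overall speedup as Speedup_enhanced grows without bound."""
    return 1.0 / (1.0 - fraction_enhanced)


fraction = 0.4  # illustrative: 40% of the execution time can use the enhancement
for s in (2, 10, 100, 1000):
    print(f"Speedup_enhanced = {s:>4}: overall = {amdahl_speedup(fraction, s):.4f}")
print(f"Upper bound:             overall = {max_speedup(fraction):.4f}")
```

Going from 10x to 1000x on the enhanced portion barely changes the overall result, because the unenhanced 60% of the time dominates.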

Amdahl’s Law example
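• Suppose a new CPU for a web server is 10X faster
• Assume that it’s an I/O bound server, so 60% of the time is spent
waiting for I/O and only 40% of the time benefits from the faster CPU
\[
\text{Speedup}_{\text{overall}}
= \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}
= \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}}
= \frac{1}{0.64} \approx 1.56
\]
• Apparently, it’s human nature to be attracted by 10X
faster, vs. keeping in perspective that it’s just 1.6X faster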

Processor performance equation
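• Microprocessors are based on a clock running at
a constant rate

\[
\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time} \qquad (1)
\]
\[
\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{Instruction count}} \qquad (2)
\]
\[
\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}
\]
\[
\text{CPU time} = \frac{\text{Seconds}}{\text{Program}}
= \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}
\]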
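
As a quick check of the equation, the sketch below computes CPU time from instruction count, CPI, and clock rate (clock cycle time = 1 / clock rate). The function name and the sample numbers are illustrative assumptions, not values from the slides.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    clock_cycle_time = 1.0 / clock_rate_hz  # seconds per cycle
    return instruction_count * cpi * clock_cycle_time


# Hypothetical program: 2 billion instructions, average CPI 1.5, 1 GHz clock.
baseline = cpu_time_seconds(2e9, 1.5, 1e9)
faster_clock = cpu_time_seconds(2e9, 1.5, 2e9)  # same program, doubled clock rate

print(f"baseline:     {baseline:.2f} s")
print(f"faster clock: {faster_clock:.2f} s")
```

The three factors are interdependent in practice: a change that lowers one of them (for example, a simpler instruction set with lower CPI) can raise another (instruction count).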

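When instructions fall into n classes, where IC_i is the number of times instructions of class i are executed and CPI_i is the average number of cycles per instruction for class i:
\[
\text{CPU clock cycles} = \sum_{i=1}^{n} IC_i \times CPI_i
\]
\[
\text{CPU time} = \left( \sum_{i=1}^{n} IC_i \times CPI_i \right) \times \text{Clock cycle time}
\]
\[
CPI = \frac{\sum_{i=1}^{n} IC_i \times CPI_i}{\text{Instruction count}}
= \sum_{i=1}^{n} \frac{IC_i}{\text{Instruction count}} \times CPI_i
\]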

Problem
• Consider a graphics card, with
– FP operations (excluding FPSQR): frequency 25%,
average CPI 4.0
– FPSQR operations only: frequency 2%, average CPI
20
– All other instructions: frequency 73%, average CPI 1.3333333
(taken as 1.33 in the calculations)

• Design option 1: decrease CPI of FPSQR to 2
• Design option 2: decrease CPI of all FP operations (excluding FPSQR) to
2.5

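\[
CPI_{\text{original}} = \sum_{i=1}^{n} \frac{IC_i}{\text{Instruction count}} \times CPI_i
= 0.25 \times 4 + 0.02 \times 20 + 0.73 \times 1.33 = 2.3709
\]
\[
CPI_1 = \sum_{i=1}^{n} \frac{IC_i}{\text{Instruction count}} \times CPI_i
= 0.25 \times 4 + 0.02 \times 2 + 0.73 \times 1.33 = 2.0109
\]
\[
CPI_2 = \sum_{i=1}^{n} \frac{IC_i}{\text{Instruction count}} \times CPI_i
= 0.25 \times 2.5 + 0.02 \times 20 + 0.73 \times 1.33 = 1.9959
\]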

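\[
\text{Speedup}_1 = \frac{\text{CPU time}_{\text{original}}}{\text{CPU time}_1}
= \frac{IC \times \text{Clock cycle time} \times CPI_{\text{original}}}{IC \times \text{Clock cycle time} \times CPI_1}
= \frac{2.3709}{2.0109} \approx 1.179
\]
\[
\text{Speedup}_2 = \frac{\text{CPU time}_{\text{original}}}{\text{CPU time}_2}
= \frac{IC \times \text{Clock cycle time} \times CPI_{\text{original}}}{IC \times \text{Clock cycle time} \times CPI_2}
= \frac{2.3709}{1.9959} \approx 1.188
\]
Instruction count and clock cycle time are the same in all three cases, so each speedup reduces to a ratio of CPIs. Design option 2 is slightly better.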
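
The graphics-card comparison can be recomputed with a weighted-CPI helper. This is a sketch under the same assumptions as the worked solution: the "other" class is 73% of instructions with CPI 1.33, and option 2 changes only the non-FPSQR FP operations. The function and variable names are illustrative.

```python
def weighted_cpi(instruction_mix):
    """Overall CPI = sum over classes of (frequency x CPI of that class)."""
    return sum(freq * cpi for freq, cpi in instruction_mix)


# (frequency, CPI) per instruction class: FP (excluding FPSQR), FPSQR, all others
original = [(0.25, 4.0), (0.02, 20.0), (0.73, 1.33)]
option_1 = [(0.25, 4.0), (0.02, 2.0), (0.73, 1.33)]   # FPSQR CPI reduced to 2
option_2 = [(0.25, 2.5), (0.02, 20.0), (0.73, 1.33)]  # FP CPI reduced to 2.5

cpi_original = weighted_cpi(original)
cpi_1 = weighted_cpi(option_1)
cpi_2 = weighted_cpi(option_2)

# With instruction count and clock cycle time unchanged, speedup is a CPI ratio.
speedup_1 = cpi_original / cpi_1
speedup_2 = cpi_original / cpi_2

print(f"CPI: original={cpi_original:.4f}, option 1={cpi_1:.4f}, option 2={cpi_2:.4f}")
print(f"Speedup: option 1={speedup_1:.3f}, option 2={speedup_2:.3f}")
```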

Problem
• Suppose a program (or a program task) takes 1 billion
instructions to execute on a processor running at 2 GHz. 50% of
the instructions execute in 3 clock cycles, 30% execute in 4
clock cycles, and 20% execute in 5 clock cycles. What is the
execution time for the program or task? If the processor is
redesigned so that all instructions that initially executed in 5
cycles now execute in 4 cycles, what is the overall percentage
improvement?
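
One way to work this problem is to apply the processor performance equation directly. The sketch below assumes "percentage improvement" means the percentage reduction in execution time; other readings (such as speedup expressed as a percentage) would give a slightly different figure. The helper names are illustrative.

```python
def weighted_cpi(instruction_mix):
    """Overall CPI = sum over classes of (fraction of instructions x cycles)."""
    return sum(fraction * cycles for fraction, cycles in instruction_mix)


def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


INSTRUCTION_COUNT = 1e9  # 1 billion instructions
CLOCK_RATE_HZ = 2e9      # 2 GHz

mix_before = [(0.5, 3), (0.3, 4), (0.2, 5)]
mix_after = [(0.5, 3), (0.3, 4), (0.2, 4)]  # 5-cycle instructions now take 4 cycles

time_before = cpu_time_seconds(INSTRUCTION_COUNT, weighted_cpi(mix_before), CLOCK_RATE_HZ)
time_after = cpu_time_seconds(INSTRUCTION_COUNT, weighted_cpi(mix_after), CLOCK_RATE_HZ)
improvement_pct = (time_before - time_after) / time_before * 100.0

print(f"CPI before = {weighted_cpi(mix_before):.2f}, time = {time_before:.2f} s")  # 3.70, 1.85 s
print(f"CPI after  = {weighted_cpi(mix_after):.2f}, time = {time_after:.2f} s")    # 3.50, 1.75 s
print(f"Execution time reduced by {improvement_pct:.2f}%")                         # 5.41%
```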
