Professional Documents
Culture Documents
Performance Evaluation
1
Basic concepts and performance
metrics
Performance
• Purchasing perspective
• given a collection of machines, which has the
• best performance?
• least cost?
• best performance / cost?
• Design perspective
• faced with design options, which has the
• best performance improvement?
• least cost?
• best performance / cost?
• Both require
• basis for comparison
• metrics for evaluation
• Our goal is to understand cost & performance implications of architectural
choices
BAD/Sud
3 hours 1350 mph 132 178,200
Concorde
MIPS = fCLK
CPI x 106
ExTimeold 1
Speedupoverall = -------------------- = ----------------------------------------
ExTimenew (1 – FractionE) + FractionE
SpeedupE
1 1
Speedupoverall = ---------------------------------------- = ---------------- = 1.56
(1 – FractionE) + FractionE 0.6 + 0.04
SpeedupE
Datapath
Control Megabytes per second
Function Units
TransistorsWires Pins Cycles per second (clock rate)
Each metric has a place and a purpose, and each can be misused
Compiler X X
Instr. Set X X X
Organization X X
Technology X
- 23 -
Cristina Silvano, 21/03/2021
Performance Metrics
IC = Instruction Count
- 24 -
Cristina Silvano, 21/03/2021
Example
IC = Instruction Count = 5
# Clock Cycles = IC + # Stall Cycles + 4 = 5 + 3 + 4 = 12
CPI = Clock Per Instruction = # Clock Cycles / IC = 12 / 5 = 2.4
MIPS = fclock / (CPI * 10 6) = 500 MHz / 2.4 * 10 6 = 208.3
- 25 -
Cristina Silvano, 21/03/2021
Performance Metrics (2)
• Let us consider n iterations of a loop composed of m instructions per
iteration requiring k stalls per iteration
IC per_iter = m
# Clock Cycles per iter = IC per_iter + # Stall Cycles per_iter +4
CPI per_iter = (IC per iter + # Stall Cycles per_iter +4) /IC per_iter
= (m + k + 4) / m
MIPS per_iter = fclock / (CPI per_iter * 10 6)
- 26 -
Cristina Silvano, 21/03/2021
Asymptotic Performance Metrics
• Let us consider n iterations of a loop composed of m instructions
per iteration requiring k stalls per iteration
= lim n -> ( m *n + k * n + 4 ) / m * n
= (m + k) / m
MIPS AS = fclock / (CPIAS* 10 6)
- 27 -
Cristina Silvano, 21/03/2021
Performance Issues in Pipelining
• The ideal CPI on a pipelined processor would be 1, but stalls
cause the pipeline performance to degrade form the ideal
performance, so we have:
Ave. CPI Pipe = Ideal CPI + Pipe Stall Cycles per Instruction
= 1 + Pipe Stall Cycles per Instruction
- 28 -
Cristina Silvano, 21/03/2021
Performance Issues in Pipelining
Pipeline Speedup = Ave. Exec. Time Unpipelined =
Ave. Exec. Time Pipelined
= Ave. CPI Unp. x Clock Cycle Unp. =
Ave. CPI Pipe Clock Cycle Pipe
- 29 -
Cristina Silvano, 21/03/2021
Performance Issues in Pipelining
• If we ignore the cycle time overhead of pipelining and we assume the
stages are perfectly balanced, the clock cycle time of two processors
can be equal, so:
Pipeline Speedup = __________Ave. CPI Unp._________
1 + Pipe Stall Cycles per Instruction
• Simple case: All instructions take the same number of cycles, which
must also equal to the number of pipeline stages (called pipeline
depth):
Pipeline Speedup = __________Pipeline Depth________
1 + Pipe Stall Cycles per Instruction
• If there are no pipeline stalls (ideal case), this leads to the intuitive
result that pipelining can improve performance by the depth of the
pipeline.
- 30 -
Cristina Silvano, 21/03/2021
Performance of Branch Schemes
• What is the performance impact of conditional branches?
= _______________Pipeline Depth_________________
1 + Branch Frequency x Branch Penalty
- 31 -
Cristina Silvano, 21/03/2021
Performance evaluation of the
memory hierarchy
Memory Hierarchy: Definitions
• Hit: data found in a block of the upper level
• Hit Rate: Number of memory accesses that find the data in the
upper level with respect to the total number of memory accesses
Hit Rate = # hits ______
# memory accesses
• Hit Time: time to access the data in the upper level of the
hierarchy, including the time needed to decide if the attempt of
access will result in a hit or miss
=> AMAT = Hit Rate * Hit Time + Miss Rate * (Hit Time + Miss Penalty)
AMAT = (Hit Rate + Miss Rate) * Hit Time + Miss Rate * Miss Penalty
Putting all together: Let us also consider the stalls due to pipeline hazards:
AMATHarvard=75%x(1+0.64%x50)+25%x(1+6.47%x50) = 2.05
AMATUnified=75%x(1+1.99%x50)+25%x(1+1+1.99%x50)= 2.24
Processor
L1 cache
L2 cache
Main Memory
AMAT = Hit TimeL1 + Miss RateL1 x (Hit TimeL2 + Miss RateL2 x Miss PenaltyL2)
AMAT = Hit TimeL1 + Miss RateL1 x Hit TimeL2 + Miss RateL1 L2 x Miss PenaltyL2
where:
Memory stall cycles per instr =
MissesL1 per instr X Hit TimeL2 + MissesL2 per instr X Miss PenaltyL2
MissesL1 per instr = Memory Accesses Per Instr x Miss RateL1
MissesL2 per instr = Memory Accesses Per Instr x Miss RateL1 L2
- 48 -