Chap4 Compressed
The Processor
§4.1 Introduction
Introduction
◼ CPU performance factors
    ◼ Instruction count
        ◼ Determined by ISA and compiler
    ◼ CPI and cycle time
        ◼ Determined by CPU hardware
◼ We will examine two MIPS implementations
    ◼ A simplified version
    ◼ A more realistic pipelined version
◼ Simple subset, shows most aspects
    ◼ Memory reference: lw, sw
    ◼ Arithmetic/logical: add, sub, and, or, slt
    ◼ Control transfer: beq, j
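The three performance factors listed above combine in the classic performance equation: CPU time = instruction count × CPI × clock cycle time. The following is a minimal Python sketch of that relationship; the function name `cpu_time` and the sample numbers are illustrative assumptions, not values from the slides.

```python
def cpu_time(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """CPU time in seconds = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical workload: 1 million instructions, CPI of 2.0, 1 ns cycle time.
baseline = cpu_time(1_000_000, 2.0, 1e-9)

# Same program (same ISA and compiler) and same clock, but hardware
# that achieves a CPI of 1.0.
improved = cpu_time(1_000_000, 1.0, 1e-9)

speedup = baseline / improved
```

Halving CPI while holding the other two factors fixed halves the execution time, which is why the hardware-determined factors are the focus of this chapter.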
◼ Arithmetic/Logic Unit
    ◼ Y = F(A, B)
◼ Multiplexer
    ◼ Y = S ? I1 : I0
[Figure: ALU with inputs A and B, function select F, and output Y; multiplexer with inputs I0 and I1, select S, and output Y]
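The two combinational elements above can be modeled as pure functions of their inputs. This is a sketch only: the operation names passed as `f` are assumed labels taken from the arithmetic/logical subset in the introduction (add, sub, and, or, slt), not the actual ALU control encoding, and 32-bit two's-complement operands are assumed.

```python
MASK32 = 0xFFFFFFFF  # results are truncated to 32 bits


def mux(s: int, i0: int, i1: int) -> int:
    """Multiplexer: Y = S ? I1 : I0."""
    return i1 if s else i0


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed value."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(f: str, a: int, b: int) -> int:
    """ALU: Y = F(A, B), with F chosen by an assumed operation name."""
    if f == "and":
        return a & b
    if f == "or":
        return a | b
    if f == "add":
        return (a + b) & MASK32
    if f == "sub":
        return (a - b) & MASK32
    if f == "slt":
        # Set on less than: 1 if A < B as signed values, else 0.
        return 1 if to_signed(a) < to_signed(b) else 0
    raise ValueError(f"unsupported ALU function: {f}")
```

For example, `mux(1, 10, 20)` selects `I1` and returns 20, and `alu("sub", 5, 7)` wraps around to the 32-bit pattern for -2.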
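[Figure: register with data input D, clock Clk, and output Q, with timing diagram of Clk, D, and Q; register with write control, adding a Write input, with timing diagram of Clk, Write, D, and Q]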
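The difference between the two registers in the figure can be sketched as a small state-holding class. This assumes the usual edge-triggered behavior: the stored value changes only on an active clock edge, and the register with write control additionally requires Write to be asserted. The class and method names are hypothetical.

```python
class Register:
    """State element: output Q holds the last value captured from D."""

    def __init__(self, value: int = 0):
        self.q = value  # stored value, visible on output Q

    def clock_edge(self, d: int, write: bool = True) -> int:
        """Model one active clock edge.

        With write=True (or for a register without write control),
        Q takes the value of D. With write=False, Q keeps its old value.
        """
        if write:
            self.q = d
        return self.q


reg = Register()
reg.clock_edge(5)               # Q becomes 5
reg.clock_edge(9, write=False)  # Write deasserted: Q stays 5
held = reg.q
reg.clock_edge(9, write=True)   # Write asserted: Q becomes 9
```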
[Figure labels: 32-bit register; increment by 4 for next instruction; sign-bit wire replicated]
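Two of the labels above describe simple arithmetic that can be checked directly: stepping to the next 32-bit instruction by adding 4, and sign extension, where the sign bit of a 16-bit field is replicated into the upper 16 bits. The helper names below are hypothetical, and a 32-bit address width is assumed.

```python
MASK32 = 0xFFFFFFFF


def next_pc(pc: int) -> int:
    """Address of the next sequential instruction (instructions are 4 bytes)."""
    return (pc + 4) & MASK32


def sign_extend16(imm16: int) -> int:
    """Extend a 16-bit value to 32 bits by replicating bit 15."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:
        # Sign bit is 1: fill the upper 16 bits with ones.
        return imm16 | 0xFFFF0000
    return imm16
```

For instance, `sign_extend16(0xFFFC)` yields `0xFFFFFFFC`, the 32-bit pattern for -4, while a positive value such as `0x0020` is unchanged.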
Format        31:26      25:21   20:16   15:0
Load/Store    35 or 43   rs      rt      address
Branch        4          rs      rt      address
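The field boundaries in the formats above translate into shifts and masks. The sketch below extracts the four fields from a 32-bit instruction word; the function name, the dictionary keys, and the sample words are illustrative assumptions. Opcode 35 is lw, 43 is sw, and 4 is beq.

```python
OPCODES = {35: "lw", 43: "sw", 4: "beq"}


def decode_fields(word: int) -> dict:
    """Split a 32-bit load/store/branch instruction into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31:26
        "rs": (word >> 21) & 0x1F,       # bits 25:21
        "rt": (word >> 16) & 0x1F,       # bits 20:16
        "address": word & 0xFFFF,        # bits 15:0
    }


# Hypothetical load: opcode 35, rs = 19, rt = 8, address field = 32.
load_word = (35 << 26) | (19 << 21) | (8 << 16) | 32
load_fields = decode_fields(load_word)

# Hypothetical branch: opcode 4, rs = 1, rt = 2, address field = 0xFFFF.
branch_word = (4 << 26) | (1 << 21) | (2 << 16) | 0xFFFF
branch_fields = decode_fields(branch_word)
```

The 16-bit address field is returned as a raw bit pattern; it would still need sign extension before being used as an offset.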
[Figure labels: prediction correct; prediction incorrect]
Right-to-left flow (MEM, WB) leads to hazards
Wrong register number
Need to stall for one cycle
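Assuming this label refers to the load-use case, where an instruction in ID needs a register that a load in EX has not yet read from memory, the stall condition can be sketched as a single predicate. The parameter names follow the pipeline-register naming style (ID/EX, IF/ID) but are hypothetical here.

```python
def load_use_stall(id_ex_mem_read: bool, id_ex_rt: int,
                   if_id_rs: int, if_id_rt: int) -> bool:
    """Return True if the instruction in ID must stall for one cycle.

    The instruction in EX is a load (MemRead asserted) and its
    destination register rt matches a source register of the
    instruction currently being decoded.
    """
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


# A load into register 2 followed by an instruction that reads register 2.
must_stall = load_use_stall(True, 2, 2, 5)

# The same pair of register numbers, but the instruction in EX is not a load.
no_stall = load_use_stall(False, 2, 2, 5)
```

One stall cycle is enough in this case because, after the bubble, the loaded value can be forwarded from the MEM/WB pipeline register.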
Stall inserted here
Or, more accurately…
Datapath with Hazard Detection
Flush these instructions (set control values to 0)
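Flushing by zeroing control values can be sketched as follows: with every control signal at 0, the instruction writes no register and no memory, so it passes through the remaining stages as a no-op. The signal names below are assumptions in the style of the textbook's control unit, and the example values are illustrative.

```python
def flush(control: dict) -> dict:
    """Return a copy of the control signals with every value set to 0."""
    return {name: 0 for name in control}


# Hypothetical control values for a load instruction in flight.
load_control = {
    "RegWrite": 1,
    "MemRead": 1,
    "MemWrite": 0,
    "MemtoReg": 1,
    "ALUSrc": 1,
    "Branch": 0,
}

flushed = flush(load_control)
has_side_effects = any(flushed.values())
```

Returning a new dictionary rather than modifying the input keeps the sketch free of hidden state; real hardware simply muxes zeros into the pipeline register.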
[Figure labels: PC; pipeline stages IF, ID, EX, MEM, WB; beq stalled in ID]
Hold pending operands
72 physical registers
◼ FP is 5 stages longer
◼ Up to 106 RISC-ops in progress
◼ Bottlenecks
    ◼ Complex instructions with long dependencies
    ◼ Branch mispredictions
    ◼ Memory access delays