Professional Documents
Culture Documents
Computer Architecture ELE 475 / COS 475 Slide Deck 2: Microcode and Pipelining Review
Computer Architecture ELE 475 / COS 475 Slide Deck 2: Microcode and Pipelining Review
Computer Architecture ELE 475 / COS 475 Slide Deck 2: Microcode and Pipelining Review
1
Agenda
Microcoded Microarchitectures
Pipeline Review
Pipelining Basics
Structural Hazards
Data Hazards
Control Hazards
2
Agenda
Microcoded Microarchitectures
Pipeline Review
Pipelining Basics
Structural Hazards
Data Hazards
Control Hazards
3
What Happens When the Processor is
Too Large?
Time Multiplex Resources!
4
What Happens When the Processor is
Too Large?
Time Multiplex Resources!
5
Microcontrol Unit Maurice Wilkes, 1954
op conditional First used in EDSAC-2,
code flip-flop completed 1958
Next state
address
Matrix A Matrix B
Datapath
Data Addr
RegSel MA
rd 3
IR rs2 A B addr addr
rs1
32 GPRs
ImmSel + PC ... Memory MemWrt
Imm ALU RegWrt
2 Ext control ALU
32-bit Reg enReg
enImm enALU data data enMem
Bus 32
Microinstruction: register to register transfer (17 control signals)
8
Agenda
Microcoded Microarchitectures
Pipeline Review
Pipelining Basics
Structural Hazards
Data Hazards
Control Hazards
9
An Ideal Pipeline
stage stage stage stage
1 2 3 4
0x4
Add
Add
clk
we
clk
rs1
rs2
PC addr 31 rd1 we
inst ws addr
wd rd2 ALU
clk Inst. GPRs z rdata
Memory Data
Imm Memory
Ext wdata
ALU
Control
13
Pipelined Datapath
0x4
Add
we
rs1
rs2
PC addr rd1 we
rdata IR ws addr
wd rd2 ALU
GPRs rdata
Inst. Data
Memory Imm Memory
Ext wdata
write
fetch decode & register- execute memory
-back
phase fetch phase phase phase
phase
Clock period can be reduced by dividing the execution of an
instruction into multiple cycles
tC > max {tIM, tRF, tALU, tDM, tRW} ( = tDM probably)
14
Pipelined Control
0x4
Add
we
rs1
rs2
PC addr rd1 we
rdata IR ws addr
wd rd2 ALU
GPRs rdata
Inst. Data
Memory Imm Memory
Ext wdata
write
fetch decode & register- execute memory
-back
phase fetch phase phase phase
phase
Clock period can be reduced by dividing the execution of an
instruction into multiple cycles
tC > max {tIM, tRF, tALU, tDM, tRW} ( = tDM probably)
However, CPI will increase unless instructions are pipelined 15
Pipelined Control
Hardwired
0x4 Controller
Add
we
rs1
rs2
PC addr rd1 we
rdata IR ws addr
wd rd2 ALU
GPRs rdata
Inst. Data
Memory Imm Memory
Ext wdata
write
fetch decode & register- execute memory
-back
phase fetch phase phase phase
phase
Clock period can be reduced by dividing the execution of an
instruction into multiple cycles
tC > max {tIM, tRF, tALU, tDM, tRW} ( = tDM probably)
However, CPI will increase unless instructions are pipelined 16
Pipelined Control
Hardwired
0x4 Controller
Add
we
rs1
rs2
PC addr rd1 we
rdata IR ws addr
wd rd2 ALU
GPRs rdata
Inst. Data
Memory Imm Memory
Ext wdata
write
fetch decode & register- execute memory
-back
phase fetch phase phase phase
phase
Clock period can be reduced by dividing the execution of an
instruction into multiple cycles
tC > max {tIM, tRF, tALU, tDM, tRW} ( = tDM probably)
However, CPI will increase unless instructions are pipelined 17
Pipelined Control
Hardwired
0x4 Controller
Add
we
rs1
rs2
PC addr rd1 we
rdata IR ws addr
wd rd2 ALU
GPRs rdata
Inst. Data
Memory Imm Memory
Ext wdata
write
fetch decode & register- execute memory
-back
phase fetch phase phase phase
phase
Clock period can be reduced by dividing the execution of an
instruction into multiple cycles
tC > max {tIM, tRF, tALU, tDM, tRW} ( = tDM probably)
However, CPI will increase unless instructions are pipelined 18
Iron Law of Processor Performance
Time = Instructions Cycles Time
Program Program * Instruction * Cycle
Instructions per program depends on source code,
compiler technology, and ISA
Cycles per instructions (CPI) depends upon the ISA
and the microarchitecture
Time per cycle depends upon the microarchitecture
and the base technology
Microarchitecture CPI cycle time
Microcoded >1 short
Single-cycle unpipelined 1 long
Pipelined 1 short
19
Iron Law of Processor Performance
Time = Instructions Cycles Time
Program Program * Instruction * Cycle
Instructions per program depends on source code,
compiler technology, and ISA
Cycles per instructions (CPI) depends upon the ISA
and the microarchitecture
Time per cycle depends upon the microarchitecture
and the base technology
Microarchitecture CPI cycle time
Microcoded >1 short
Single-cycle unpipelined 1 long
Pipelined 1 short
Multi-cycle, unpipelined control 20
Iron Law of Processor Performance
Time = Instructions Cycles Time
Program Program * Instruction * Cycle
Instructions per program depends on source code,
compiler technology, and ISA
Cycles per instructions (CPI) depends upon the ISA
and the microarchitecture
Time per cycle depends upon the microarchitecture
and the base technology
Microarchitecture CPI cycle time
Microcoded >1 short
Single-cycle unpipelined 1 long
Pipelined 1 short
Multi-cycle, unpipelined control >1 short 21
CPI Examples
Microcoded machine Time
7 cycles 5 cycles 10 cycles
Inst 1 Inst 2 Inst 3
write
fetch decode & register- execute memory -back
phase fetch phase phase phase phase
write
fetch decode & register- execute memory -back
phase fetch phase phase phase phase
time t0 t1 t2 t3 t4 t5 t6 t7 ....
instruction1 IF1 ID1 EX1 MA1 WB1
instruction2 IF2 ID2 EX2 MA2 WB2
instruction3 IF3 ID3 EX3 MA3 WB3
instruction4 IF4 ID4 EX4 MA4 WB4
instruction5 IF5 ID5 EX5 MA5 WB5 25
Pipeline Diagrams: Space vs. Time
Hardwired
0x4 Controller
Add
we
rs1
rs2
PC addr rd1 we
rdata IR ws addr
wd rd2 ALU
GPRs rdata
Inst. Data
Memory Imm Memory
Ext wdata
write
fetch decode & register- execute memory -back
phase fetch phase phase phase phase
time t0 t1 t2 t3 t4 t5 t6 t7 ....
Resources
IF I1 I2 I3 I4 I5
ID I1 I2 I3 I4 I5
EX I1 I2 I3 I4 I5
MA I1 I2 I3 I4 I5
WB I1 I2 I3 I4 I5 26
Instructions Interact With Each Other
in Pipeline
Structural Hazard: An instruction in the
pipeline needs a resource being used by
another instruction in the pipeline
Data Hazard: An instruction depends on a
data value produced by an earlier instruction
Control Hazard: Whether or not an instruction
should be executed depends on a control
decision made by an earlier instruction
27
Agenda
Microcoded Microarchitectures
Pipeline Review
Pipelining Basics
Structural Hazards
Data Hazards
Control Hazards
28
Overview of Structural Hazards
Structural hazards occur when two instructions need
the same hardware resource at the same time
Approaches to resolving structural hazards
Schedule: Programmer explicitly avoids scheduling
instructions that would create structural hazards
Stall: Hardware includes control logic that stalls until
earlier instruction is no longer using contended resource
Duplicate: Add more hardware to design so that each
instruction can access independent resources at the same
time
Simple 5-stage MIPS pipeline has no structural hazards
specically because ISA was designed that way
29
Example Structural Hazard:
0x4
Unied Memory
Add
we
rs1
rs2
PC addr rd1 we
rdata IR ws addr
wd rd2 ALU
GPRs rdata
Inst. Data
Memory Imm Memory
Ext wdata
IF ID EX MEM WB
30
Example Structural Hazard:
0x4
Unied Memory
Add
we
rs1
rs2
PC addr rd1 we
rdata IR ws addr
wd rd2 ALU
GPRs rdata
Inst. Data
Memory Imm Memory
Ext wdata
IF ID EX MEM WB
addr
rdata
wdata
Unified Memory 31
Example Structural Hazard:
0x4
2-Cycle Memory
Add M0 M1
we Stage Stage
rs1
rs2
PC addr rd1 we
rdata IR ws addr
wd rd2 ALU
GPRs rdata
Inst. Data
Memory Imm Memory
Ext wdata
IF ID EX MEM WB
32
Agenda
Microcoded Microarchitectures
Pipeline Review
Pipelining Basics
Structural Hazards
Data Hazards
Control Hazards
33
Overview of Data Hazards
Data hazards occur when one instruction depends on a
data value produced by a preceding instruction still in
the pipeline
Approaches to resolving data hazards
Schedule: Programmer explicitly avoids scheduling
instructions that would create data hazards
Stall: Hardware includes control logic that freezes earlier
stages until preceding instruction has nished producing
data value
Bypass: Hardware datapath allows values to be sent to an
earlier stage before preceding instruction has left the
pipeline
Speculate: Guess that there is not a problem, if incorrect
kill speculative instruction and restart
34
Example Data Hazard
r4 r1 r1
0x4
Add IR IR IR
31
we
rs1
rs2
addr rd1 A
PC we
inst IR ws ALU Y addr
wd rd2
Inst GPRs B rdata
Memory Data
Imm Memory R
wdata
Ext wdata
MD1 MD2
...
r1 r0 + 10 (ADDI R1, R0, #10)
r4 r1 + 17 (ADDI R4, R1, #17) r1 is stale. Oops!
...
35
Feedback to Resolve Hazards
0x4 nop IR IR IR
Add
31
we
rs1
rs2
addr rd1 A
PC we
inst IR ws ALU Y addr
wd rd2
Inst GPRs B rdata
Memory Data
Imm Memory R
wdata
Ext wdata
...
MD1 MD2
r1 r0 + 10
r4 r1 + 17
...
37
Stalled Stages and Pipeline Bubbles
time
t0 t1 t2 t3 t4 t5 t6 t7 ....
(I1) r1 (r0) + 10 IF1 ID1 EX1 MA1 WB1
(I2) r4 (r1) + 17 IF2 ID2 ID2 ID2 ID2 EX2 MA2 WB2
(I3) IF3 IF3 IF3 IF3 ID3 EX3 MA3 WB3
(I4) stalled stages IF4 ID4 EX4 MA4 WB4
(I5) IF5 ID5 EX5 MA5 WB5
time
t0 t1 t2 t3 t4 t5 t6 t7 ....
IF I1 I2 I3 I3 I3 I3 I4 I5
ID I1 I2 I2 I2 I2 I3 I4 I5
Resource
EX I1 nop nop nop I2 I3 I4 I5
Usage
MA I1 nop nop nop I2 I3 I4 I5
WB I1 nop nop nop I2 I3 I4 I5
0x4 nop
Add IR IR IR
31
we
rs1
rs2
addr rd1 A
PC we
inst IR ws ALU Y addr
wd rd2
Inst GPRs B rdata
Memory Data
Imm Memory R
wdata
Ext wdata
MD1 MD2
Cdest
we
rs1
rs2
addr rd1 A
PC we
inst IR ws ALU Y addr
wd rd2
Inst GPRs B rdata
Memory Data
Imm Memory R
wdata
Ext wdata
MD1 MD2
I-type: op rs rt immediate16
J-type: op immediate26
source(s) destination
ALU rd (rs) func (rt) rs, rt rd
ALUI rt (rs) op immediate rs rt
LW rt M [(rs) + immediate] rs rt
SW M [(rs) + immediate] (rt) rs, rt
BZ cond (rs)
true: PC (PC) + immediate rs
false: PC (PC) + 4 rs
J PC (PC) + immediate
JAL r31 (PC), PC (PC) + immediate 31
JR PC (rs) rs
JALR r31 (PC), PC (rs) rs 31
41
Deriving the Stall Signal
Cdest Cre
ws = Case opcode re1 = Case opcode
ALU rd ALU, ALUi,
ALUi, LW rt LW, SW, BZ,
JAL, JALR R31 JR, JALR on
J, JAL off
we = Case opcode
ALU, ALUi, LW (ws 0) re2 = Case opcode
JAL, JALR on ALU, SW on
... off ... off
Cstall
stall = ((rsD =wsE).weE +
(rsD =wsM).weM +
(rsD =wsW).weW) . re1D +
((rtD =wsE).weE +
(rtD =wsM).weM +
(rtD =wsW).weW) . re2D
42
Hazards due to Loads & Stores
Stall Condition
What if
(r1)+7 = (r3)+5 ?
0x4 nop IR IR IR
Add
31
we
rs1
rs2
addr rd1 A
PC we
inst IR ws ALU Y addr
wd rd2
Inst GPRs B rdata
Memory Data
Imm Memory R
wdata
Ext wdata
MD1 MD2
...
M[(r1)+7] (r2) Is there any possible data hazard
r4 M[(r3)+5] in this instruction sequence?
... 43
Data Hazards Due to Loads and Store
Example instruction sequence
Mem[ Regs[r1] + 7 ] <- Regs[r2]
Regs[r4] <- Mem[ Regs[r3] + 5 ]
44
Overview of Data Hazards
Data hazards occur when one instruction depends on a
data value produced by a preceding instruction still in
the pipeline
Approaches to resolving data hazards
Schedule: Programmer explicitly avoids scheduling
instructions that would create data hazards
Stall: Hardware includes control logic that freezes earlier
stages until preceding instruction has nished producing
data value
Bypass: Hardware datapath allows values to be sent to an
earlier stage before preceding instruction has left the
pipeline
Speculate: Guess that there is not a problem, if incorrect
kill speculative instruction and restart
45
Adding Bypassing to the Datapath
stall
r4 r1... r1 ...
0x4 nop
E M W
IR IR IR
Add
31
ASrc
we
rs1
rs2
PC addr D rd1 A
we
inst IR ws ALU Y addr
wd rd2
Inst GPRs B rdata
Memory Data
Imm Memory R
wdata
Ext wdata
MD1 MD2
48
Bypass and Stall Signals
Split weE into two components: we-bypass, we-stall
we-bypassE = Case opcodeE we-stallE = Case opcodeE
ALU, ALUi (ws 0) LW (ws 0)
... off JAL, JALR on
... off
49
Fully Bypassed Datapath
stall PC for JAL, ...
0x4 nop
E M W
IR IR IR
Add
ASrc 31
we
rs1
rs2
A
PC addr D rd1 we
inst IR ws ALU Y addr
wd rd2
Inst GPRs B rdata
Memory Data
Imm Memory R
wdata
Ext wdata
BSrc
MD1 MD2
Is there still
a need for the
stall signal ? stall = (rsD=wsE). (opcodeE=LWE).(wsE0 ).re1D
+ (rtD=wsE). (opcodeE=LWE).(wsE0 ).re2D
50
Overview of Data Hazards
Data hazards occur when one instruction depends on a
data value produced by a preceding instruction still in
the pipeline
Approaches to resolving data hazards
Schedule: Programmer explicitly avoids scheduling
instructions that would create data hazards
Stall: Hardware includes control logic that freezes earlier
stages until preceding instruction has nished producing
data value
Bypass: Hardware datapath allows values to be sent to an
earlier stage before preceding instruction has left the
pipeline
Speculate: Guess that there is not a problem, if incorrect
kill speculative instruction and restart
51
Agenda
Microcoded Microarchitectures
Pipeline Review
Pipelining Basics
Structural Hazards
Data Hazards
Control Hazards
52
Control Hazards
What do we need to calculate next PC?
For Jumps
Opcode, offset and PC
For Jump Register
Opcode and Register value
For Conditional Branches
Opcode, PC, Register (for condition), and offset
For all other instructions
Opcode and PC
have to know its not one of above!
53
Opcode Decoding Bubble
(assuming no branch delay slots for now)
time
t0 t1 t2 t3 t4 t5 t6 t7 ....
(I1) r1 (r0) + 10 IF1 ID1 EX1 MA1 WB1
(I2) r3 (r2) + 17 IF2 IF2 ID2 EX2 MA2 WB2
(I3) IF3 IF3 ID3 EX3 MA3 WB3
(I4) IF4 IF4 ID4 EX4 MA4 WB4
time
t0 t1 t2 t3 t4 t5 t6 t7 ....
IF I1 nop I2 nop I3 nop I4
ID I1 nop I2 nop I3 nop I4
Resource
Usage EX I1 nop I2 nop I3 nop I4
MA I1 nop I2 nop I3 nop I4
WB I1 nop I2 nop I3 nop I4
Add
E M
0x4 nop
Add IR IR
Jump? I1
PC addr
inst IR
104 Inst
Memory I2
Jump? II21 I1
IRSrcD
Any
PC addr nop interaction
inst IR
304
104 Inst between
Memory nop
I2
stall and
IRSrcD = Case opcodeD jump?
I1 096 ADD
J, JAL nop
I2 100 J 304
... IM
I3 104 ADD kill
I4 304 ADD
56
Jump Pipeline Diagrams
time
t0 t1 t2 t3 t4 t5 t6 t7 ....
(I1) 096: ADD IF1 ID1 EX1 MA1 WB1
(I2) 100: J 304 IF2 ID2 EX2 MA2 WB2
(I3) 104: ADD IF3 nop nop nop nop
(I4) 304: ADD IF4 ID4 EX4 MA4 WB4
time
t0 t1 t2 t3 t4 t5 t6 t7 ....
IF I1 I2 I3 I4 I5
ID I1 I2 nop I4 I5
Resource
EX I1 I2 nop I4 I5
Usage
MA I1 I2 nop I4 I5
WB I1 I2 nop I4 I5
Add
E M
0x4 nop
IR IR
Add
BEQZ? I1
zero?
IRSrcD
PC addr nop A
inst IR
ALU Y
104 Inst
Memory I2
Add
E BEQZ? M
0x4 nop
IR IR
Add
I2 I1
zero?
IRSrcD
PC addr nop A
inst IR
ALU Y
108 Inst
Memory I3
Add
E BEQZ? M
IRSrcE
0x4 nop
IR IR
Add
Jump? I2 I1
zero?
PC
IRSrcD
PC addr nop A
inst IR
ALU Y
108 Inst
Memory I3
61
Control Equations for PC and IR Muxes
PCSrc = Case opcodeE
BEQZ.z, BNEZ.!z br Give priority
...
Case opcodeD
to the older
J, JAL jabs instruction,
JR, JALR rind
... pc+4 i.e., execute-stage
IRSrcD = Case opcodeE instruction
BEQZ.z, BNEZ.!z nop
... over decode-stage
Case opcodeD instruction
J, JAL, JR, JALR nop
... IM
time
t0 t1 t2 t3 t4 t5 t6 t7 ....
IF I1 I2 I3 I4 I5
ID I1 I2 I3 nop I5
Resource
EX I1 I2 nop nop I5
Usage
MA I1 I2 nop nop I5
WB I1 I2 nop nop I5
Add E
nop
0x4 IR
Add
Zero detect on
register file output
we
rs1
rs2
addr nop rd1
PC ws
inst IR
wd rd2
Inst GPRs
D
Memory
Pipeline diagram now same as for jumps64
Branch Delay Slots
(expose control hazard to software)
I1 096 ADD
I2 100 BEQZ r1 +200
Delay slot instruction executed
I3 104 ADD
I4 304 ADD
regardless of branch outcome
time
t0 t1 t2 t3 t4 t5 t6 t7 ....
IF I1 I2 I3 I4
ID I1 I2 I3 I4
Resource
EX I1 I2 I3 I4
Usage
MA I1 I2 I3 I4
WB I1 I2 I3 I4
66
Why an Instruction may not be
dispatched every cycle (CPI>1)
Full bypassing may be too expensive to implement
typically all frequently used paths are provided
some infrequently used bypass paths may increase cycle time and
counteract the benefit of reducing CPI
Loads have two-cycle latency
Instruction after load cannot use load result
MIPS-I ISA defined load delay slots, a software-visible pipeline hazard
(compiler schedules independent instruction or inserts NOP to avoid
hazard). Removed in MIPS-II (pipeline interlocks added in hardware)
MIPS:Microprocessor without Interlocked Pipeline Stages
Conditional branches may cause bubbles
kill following instruction(s) if no delay slots
68
Agenda
Microcoded Microarchitectures
Pipeline Review
Pipelining Basics
Structural Hazards
Data Hazards
Control Hazards
69
Acknowledgements
These slides contain material developed and copyright by:
Arvind (MIT)
Krste Asanovic (MIT/UCB)
Joel Emer (Intel/MIT)
James Hoe (CMU)
John Kubiatowicz (UCB)
David Patterson (UCB)
Christopher Batten (Cornell)
70
Copyright 2013 David Wentzlaff
71