Lecture 18: 1
Announcements
• No instructor office hour today
– Rescheduled to Monday April 16, 4:00-5:30pm
Lecture 18: 2
Pipelined Microprocessor
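[Figure: pipelined microprocessor datapath. Labels: control unit (CU) and control signals, PC with +2 adder, instruction RAM, decoder, register file (RF) with SA, SB, DR and LD, sign extension (SE), multiplexers, ALU with Fm … F0 and V, C, Z, N flags, data RAM with D_IN, and the PCJ, PCL, MW, MD and D_in signals.]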
Lecture 18: 3
Data Hazard Problem
Instruction      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
ADD R1,R2,R3     IM   Reg  ALU  DM   Reg
OR R4,R1,R3           IM   Reg  ALU  DM   Reg
SUB R5,R2,R1               IM   Reg  ALU  DM   Reg
AND R6,R1,R2                    IM   Reg  ALU  DM   Reg
ADDI R7,R7,3                         IM   Reg  ALU  DM   Reg
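The dependences in this diagram can be found mechanically. The following is a minimal Python sketch, not part of the lecture: `parse` and `raw_hazards` are hypothetical helper names, and the three-instruction window is an assumption taken from the three NOPs used in the compiler solution (a value written in the final Reg stage is assumed to be unavailable to the three instructions that follow).

```python
import re

def parse(instr):
    """Return (dest, sources) for an instruction such as 'ADD R1,R2,R3'."""
    text = instr.split(":")[-1].strip()   # drop an optional label like "X:"
    parts = text.split(None, 1)
    op = parts[0].upper()
    regs = re.findall(r"R\d+", parts[1]) if len(parts) > 1 else []
    if op == "NOP" or not regs:
        return None, []
    if op == "BEQ":
        return None, regs                 # a branch reads registers, writes none
    return regs[0], regs[1:]              # first register is the destination


def raw_hazards(program, window=3):
    """List (consumer index, register, producer index) for each stale read."""
    hazards = []
    for i, instr in enumerate(program):
        _, sources = parse(instr)
        for reg in sources:
            # Search backwards, nearest first, for a writer of reg
            # among the previous `window` instructions (assumed window).
            for j in range(i - 1, max(i - window, 0) - 1, -1):
                if parse(program[j])[0] == reg:
                    hazards.append((i, reg, j))
                    break
    return hazards


program = [
    "ADD R1,R2,R3",
    "OR R4,R1,R3",
    "SUB R5,R2,R1",
    "AND R6,R1,R2",
    "ADDI R7,R7,3",
]
for consumer, reg, producer in raw_hazards(program):
    print(f"{program[consumer]} reads {reg} too early (written by {program[producer]})")
```

Under these assumptions the OR, SUB and AND all read R1 too early, while the ADDI is unaffected because no earlier instruction writes R7.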
Lecture 18: 5
Review: Compiler Inserts NOPs (Solution 1)
Instruction      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
ADD R1,R2,R3     IM   Reg  ALU  DM   Reg
NOP                   IM   Reg  ALU  DM   Reg
NOP                        IM   Reg  ALU  DM   Reg
NOP                             IM   Reg  ALU  DM   Reg
OR R4,R1,R3                          IM   Reg  ALU  DM   Reg
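A compiler pass for this solution can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the course's tool: `parse` and `insert_nops` are hypothetical names, only register-to-register and immediate instructions are handled, and the required gap of three instructions is read off the diagram above.

```python
import re

def parse(instr):
    """Return (dest, sources) for an instruction such as 'ADD R1,R2,R3'."""
    parts = instr.split(None, 1)
    regs = re.findall(r"R\d+", parts[1]) if len(parts) > 1 else []
    if not regs:
        return None, []          # NOP: no registers read or written
    return regs[0], regs[1:]     # first register is the destination


def insert_nops(program, gap=3):
    """Pad the program so `gap` instructions separate producer and consumer."""
    out = []
    for instr in program:
        _, sources = parse(instr)
        needed = 0
        # Check the last `gap` emitted instructions, nearest first.
        for back, prev in enumerate(reversed(out[-gap:]), start=1):
            dest, _ = parse(prev)
            if dest in sources:
                needed = max(needed, gap - back + 1)
        out.extend(["NOP"] * needed)
        out.append(instr)
    return out


padded = insert_nops([
    "ADD R1,R2,R3",
    "OR R4,R1,R3",
    "SUB R5,R2,R1",
    "AND R6,R1,R2",
    "ADDI R7,R7,3",
])
print("\n".join(padded))
```

Only the OR needs padding here: once three NOPs separate it from the ADD, the SUB and AND are already far enough from the ADD.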
Lecture 18: 6
Review: HW Stalls the Pipeline (Solution 2)
Instruction      CC1     CC2     CC3     CC4     CC5     CC6     CC7     CC8     CC9
ADD R1,R2,R3     IM      Reg     ALU     DM      Reg
OR R4,R1,R3              IM      bubble  bubble  bubble  Reg     ALU     DM      Reg
AND R6,R1,R2     (row cut off in the figure: IM Reg ALU)
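The bubble count in the stall diagram follows from a simple rule: without forwarding, an instruction may read its registers only in a cycle after the producer's final Reg (write-back) stage. The sketch below is a hypothetical model of that rule, not the lecture's hardware; `reg_read_cycles` is an illustrative name, and each instruction is given as a `(dest, sources)` pair.

```python
def reg_read_cycles(program):
    """Return the clock cycle (1-based) of each instruction's Reg read stage.

    program: list of (dest, sources) tuples in program order.
    Assumes in-order issue and no forwarding.
    """
    cycles = []
    write_back = {}  # register -> cycle in which its producer writes it
    for dest, sources in program:
        # First instruction is fetched in CC1 and reads registers in CC2;
        # later ones read at least one cycle after their predecessor.
        earliest = 2 if not cycles else cycles[-1] + 1
        for reg in sources:
            if reg in write_back:
                earliest = max(earliest, write_back[reg] + 1)
        cycles.append(earliest)
        if dest is not None:
            write_back[dest] = earliest + 3  # Reg -> ALU -> DM -> Reg
    return cycles


# ADD R1,R2,R3 followed by OR R4,R1,R3
cycles = reg_read_cycles([("R1", ["R2", "R3"]), ("R4", ["R1", "R3"])])
print(cycles)                     # read cycles for ADD and OR
print(cycles[1] - 3, "bubbles")   # OR would otherwise read in CC3
```

For the ADD and OR pair this gives a register read in CC6 for the OR, which corresponds to the three bubbles shown in the diagram.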
Lecture 18: 7
Solution 3: HW Forwarding (Bypassing)
Instruction      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
ADD R1,R2,R3     IM   Reg  ALU  DM   Reg
OR R4,R1,R3           IM   Reg  ALU  DM   Reg
SUB R5,R2,R1               IM   Reg  ALU  DM   Reg
AND R6,R1,R2                    IM   Reg  ALU  DM   Reg
ADDI R7,R7,3                         IM   Reg  ALU  DM   Reg
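The selection logic behind forwarding can be sketched as a small function. This follows the common textbook formulation rather than the exact design of the lecture's datapath, which is not recoverable from the figure: `forward_select` and its parameters are hypothetical names, and the result is simply the name of the place an ALU operand would be taken from.

```python
def forward_select(src, ex_mem_dest, mem_wb_dest):
    """Choose where an ALU source operand should come from.

    src:          register the instruction in its ALU stage wants to read
    ex_mem_dest:  destination of the instruction one ahead (None if none)
    mem_wb_dest:  destination of the instruction two ahead (None if none)
    """
    if src is not None and src == ex_mem_dest:
        return "EX/MEM"   # newest value wins when both match
    if src is not None and src == mem_wb_dest:
        return "MEM/WB"
    return "RF"           # no pending write: use the register file value


# OR R4,R1,R3 in its ALU stage, directly behind ADD R1,R2,R3
print(forward_select("R1", ex_mem_dest="R1", mem_wb_dest=None))
print(forward_select("R3", ex_mem_dest="R1", mem_wb_dest=None))
# SUB R5,R2,R1 in its ALU stage: OR (writes R4) is one ahead, ADD (writes R1) two ahead
print(forward_select("R1", ex_mem_dest="R4", mem_wb_dest="R1"))
```

Checking the EX/MEM match first matters when two earlier instructions write the same register, because the more recent result is the one the program expects.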
Lecture 18: 8
Pipeline Modifications for Forwarding?
[Figure: pipeline stages IM, Reg, ALU, DM]
Lecture 18: 9
Pipelined Microprocessor w/o Forwarding
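[Figure: pipelined microprocessor datapath without forwarding. Labels: control unit (CU) and control signals, PC with +2 adder, instruction RAM, decoder, register file (RF) with SA, SB, DR and LD, sign extension (SE), multiplexers, ALU with Fm … F0 and V, C, Z, N flags, data RAM with D_IN, and the PCJ, PCL, MW, MD and D_in signals.]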
Lecture 18: 10
Pipelined Processor with Forwarding
[Figure: pipelined datapath with forwarding. In addition to the blocks of the datapath without forwarding, it shows extra multiplexers, the MB signal, and the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB.]
Lecture 18: 11
Forwarding in Action
[Figure: datapath with forwarding, showing the decoder, register file, ALU, data RAM, multiplexers, and the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers.]
Lecture 18: 12
Forwarding in Action
[Figure: datapath with forwarding, showing the decoder, register file, ALU, data RAM, multiplexers, and the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers.]
Lecture 18: 13
Forwarding in Action
[Figure: datapath with forwarding, showing the decoder, register file, ALU, data RAM, multiplexers, and the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers.]
Lecture 18: 14
HW Forwarding
[Figure: complete pipelined datapath with forwarding, including the PC with +2 adder, instruction RAM, PCJ and PCL, and the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers.]
Lecture 18: 15
Example: Data Hazards w/o Forwarding
• Identify all data hazards in the following instruction
sequence by circling each source register that is read
before the updated value is written back
ADD R1, R2, R3
OR R4, R1, R3
SUB R5, R2, R1
SUB R6, R1, R2
Lecture 18: 16
Example: Data Hazards w/ Forwarding
• Identify all data hazards in the following instruction
sequence by circling each source register that is read
before the updated value is written back
ADD R1, R2, R3
OR R4, R1, R3
SUB R5, R2, R1
SUB R6, R1, R2
Lecture 18: 17
Another Example: Data Hazards w/ Forwarding
• Identify all data hazards in the following instruction
sequence by circling each source register that is read
before the updated value is written back
LW R1, 0(R2)
OR R4, R1, R3
SUB R5, R2, R1
Lecture 18: 18
Data Hazards Caused by Load
Instruction      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
LW R1,0(R2)      IM   Reg  ALU  DM   Reg
OR R4,R1,R3           IM   Reg  ALU  DM   Reg
SUB R5,R2,R1               IM   Reg  ALU  DM   Reg
AND R6,R1,R2                    IM   Reg  ALU  DM   Reg
ADDI R7,R7,3                         IM   Reg  ALU  DM   Reg
Lecture 18: 19
Load Instructions and Forwarding
Instruction      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
LW R1,0(R2)      IM   Reg  ALU  DM   Reg
OR R4,R1,R3           IM   Reg  ALU  DM   Reg
SUB R5,R2,R1               IM   Reg  ALU  DM   Reg
AND R6,R1,R2                    IM   Reg  ALU  DM   Reg
ADDI R7,R7,3                         IM   Reg  ALU  DM   Reg
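With forwarding in place, the remaining problem is the instruction directly after a load: the loaded value leaves the DM stage in CC4, the same cycle in which the next instruction is already in its ALU stage. A minimal detection sketch, assuming that only this adjacent case needs attention (which matches the single NOP and single bubble in the solutions that follow); `load_use_hazards` is a hypothetical name and instructions are given as `(op, dest, sources)` tuples:

```python
def load_use_hazards(program):
    """Return indices of instructions that use a load result one cycle too early.

    program: list of (op, dest, sources) tuples in program order.
    Assumes forwarding covers every other producer/consumer distance.
    """
    hazards = []
    for i in range(1, len(program)):
        prev_op, prev_dest, _ = program[i - 1]
        _, _, sources = program[i]
        if prev_op == "LW" and prev_dest in sources:
            hazards.append(i)
    return hazards


program = [
    ("LW",   "R1", ["R2"]),        # LW   R1,0(R2)
    ("OR",   "R4", ["R1", "R3"]),  # OR   R4,R1,R3
    ("SUB",  "R5", ["R2", "R1"]),  # SUB  R5,R2,R1
    ("AND",  "R6", ["R1", "R2"]),  # AND  R6,R1,R2
    ("ADDI", "R7", ["R7"]),        # ADDI R7,R7,3
]
print(load_use_hazards(program))
```

Only the OR is flagged: the SUB and AND also read R1, but they reach their ALU stage late enough for the loaded value to be forwarded.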
Lecture 18: 20
Solution 1: Compiler Inserts NOP Instruction
Instruction      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
LW R1,0(R2)      IM   Reg  ALU  DM   Reg
NOP                   IM   Reg  ALU  DM   Reg
OR R4,R1,R3                IM   Reg  ALU  DM   Reg
SUB R5,R2,R1                    IM   Reg  ALU  DM   Reg
AND R6,R1,R2                         IM   Reg  ALU  DM   Reg
Lecture 18: 21
Solution 2: HW Stalls the Pipeline
Instruction      CC1     CC2     CC3     CC4     CC5     CC6     CC7     CC8     CC9
LW R1,0(R2)      IM      Reg     ALU     DM      Reg
SUB R5,R2,R1             IM      bubble  Reg     ALU     DM      Reg
Lecture 18: 22
Solution 3: Delay Slots
• A delay slot is a location in the program where
the compiler is required to insert an instruction
between dependent instructions
Lecture 18: 23
Filling the Load Delay Slot
Instruction      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
LW R1,0(R2)      IM   Reg  ALU  DM   Reg
ADDI R7,R7,3          IM   Reg  ALU  DM   Reg
OR R4,R1,R3                IM   Reg  ALU  DM   Reg
SUB R5,R2,R1                    IM   Reg  ALU  DM   Reg
AND R6,R1,R2                         IM   Reg  ALU  DM   Reg
The ADDI R7,R7,3 that originally followed the AND replaces the NOP in the load delay slot.
Lecture 18: 24
Filling the Load Delay Slot ?
Instruction      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
LW R1,0(R2)      IM   Reg  ALU  DM   Reg
NOP                   IM   Reg  ALU  DM   Reg
OR R4,R1,R3                IM   Reg  ALU  DM   Reg
SUB R5,R2,R1                    IM   Reg  ALU  DM   Reg
AND R6,R1,R2                         IM   Reg  ALU  DM   Reg
ADDI R7,R6,3
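Whether a later instruction may be moved up into the load delay slot comes down to a dependence check against the load and against every instruction it is moved across. The sketch below is one way to express that check and is not taken from the lecture: `parse`, `can_move_up` and `can_fill_load_slot` are hypothetical names, and memory dependences and branches are ignored for simplicity.

```python
import re

def parse(instr):
    """Return (dest, sources) for an instruction such as 'LW R1,0(R2)'."""
    parts = instr.split(None, 1)
    regs = re.findall(r"R\d+", parts[1]) if len(parts) > 1 else []
    if not regs:
        return None, []
    return regs[0], regs[1:]     # first register is the destination


def can_move_up(candidate, crossed):
    """True if candidate has no register dependence on the crossed instructions."""
    c_dest, c_src = parse(candidate)
    for instr in crossed:
        dest, src = parse(instr)
        if dest is not None and dest in c_src:
            return False         # candidate needs a value computed in between
        if c_dest is not None and c_dest in src:
            return False         # candidate would overwrite a value still needed
        if c_dest is not None and c_dest == dest:
            return False         # final value of the register would change
    return True


def can_fill_load_slot(load, candidate, crossed):
    """True if candidate may sit directly after the load."""
    load_dest, _ = parse(load)
    _, c_src = parse(candidate)
    return load_dest not in c_src and can_move_up(candidate, crossed)


load = "LW R1,0(R2)"
crossed = ["OR R4,R1,R3", "SUB R5,R2,R1", "AND R6,R1,R2"]
print(can_fill_load_slot(load, "ADDI R7,R7,3", crossed))
print(can_fill_load_slot(load, "ADDI R7,R6,3", crossed))
```

ADDI R7,R7,3 touches only R7 and passes the check. ADDI R7,R6,3 reads R6, which the AND writes, so moving it ahead of the AND would make it read the old value of R6.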
Lecture 18: 25
The Problem with Branches
[Figure: complete pipelined datapath with forwarding, including the PC with +2 adder, instruction RAM, PCJ and PCL, and the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers.]
Lecture 18: 26
The Problem with Branches
Instruction      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
BEQ R2,R3,X      IM   Reg  ALU  DM   Reg
OR R4,R1,R3           IM   Reg  ALU  DM   Reg
SUB R5,R2,R1               IM   Reg  ALU  DM   Reg
X: AND R6,R1,R2                 IM   Reg  ALU  DM   Reg
ADDI R7,R7,3                         IM   Reg  ALU  DM   Reg
...
Lecture 18: 28
Branch Delay Slot
• If the ISA defines a branch delay slot, the
instruction immediately following a branch is
always executed after the branch, whether or not
the branch is taken
Lecture 18: 29
Reducing the Branch Delay
• We already calculate the branch target in ID
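Why resolving the branch earlier helps can be shown with a one-line calculation. This is a simplified model under an explicit assumption: the correct next PC can be selected in the cycle right after the stage in which the branch is resolved. The stage names follow the pipeline register labels (IF/ID, ID/EX, EX/MEM, MEM/WB), and `branch_delay` is a hypothetical helper.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def branch_delay(resolve_stage):
    """Instructions fetched after a branch before its outcome is known."""
    # One new instruction is fetched per cycle while the branch moves
    # from IF to the stage that resolves it.
    return STAGES.index(resolve_stage)


for stage in ("MEM", "EX", "ID"):
    print(stage, branch_delay(stage))
```

Under this model, a branch resolved in ID leaves exactly one instruction behind it in the pipeline, which is the single slot a branch delay slot accounts for.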
Lecture 18: 30
Evaluating Branch Condition in ID
[Figure: pipelined datapath with forwarding, modified so that the branch condition is evaluated in ID. Added labels: "=?" and "sign bit". Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB.]
Lecture 18: 31
Filling the Branch Delay Slot with a NOP
Instruction      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
BEQ R2,R3,X      IM   Reg  ALU  DM   Reg
NOP                   IM   Reg  ALU  DM   Reg
OR R4,R1,R3                IM   Reg  ALU  DM   Reg
SUB R5,R2,R1                    IM   Reg  ALU  DM   Reg
X: AND R6,R1,R2                      IM   Reg  ALU  DM   Reg
ADDI R7,R7,3
...
Lecture 18: 32
Filling the Branch Delay Slot
Instruction      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
BEQ R2,R3,X      IM   Reg  ALU  DM   Reg
ADDI R7,R7,3          IM   Reg  ALU  DM   Reg
OR R4,R1,R3                IM   Reg  ALU  DM   Reg
SUB R5,R2,R1                    IM   Reg  ALU  DM   Reg
X: AND R6,R1,R2                      IM   Reg  ALU  DM   Reg
...
The ADDI R7,R7,3 that originally followed the AND replaces the NOP in the branch delay slot.
The ADDI is always executed after the BEQ.
If the BEQ is taken, executing the ADDI
must not cause incorrect behavior
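The effect of the delay slot on execution order can be traced with a short sketch. It is an illustration only: `execution_order` is a hypothetical helper, the branch outcome is passed in rather than computed from register values, and only forward branches are assumed (a taken backward branch would loop forever in this model).

```python
def execution_order(program, taken):
    """Return the instructions in the order they execute with a branch delay slot."""
    # Map each label (text before ':') to its instruction index.
    labels = {}
    for i, instr in enumerate(program):
        if ":" in instr:
            labels[instr.split(":")[0].strip()] = i

    order = []
    pc = 0
    while pc < len(program):
        instr = program[pc]
        order.append(instr)
        body = instr.split(":")[-1].strip()
        if body.upper().startswith("BEQ") and pc + 1 < len(program):
            order.append(program[pc + 1])          # delay slot always executes
            target = body.split(",")[-1].strip()   # last operand is the label
            pc = labels[target] if taken else pc + 2
        else:
            pc += 1
    return order


program = [
    "BEQ R2,R3,X",
    "ADDI R7,R7,3",     # fills the branch delay slot
    "OR R4,R1,R3",
    "SUB R5,R2,R1",
    "X: AND R6,R1,R2",
]
print(execution_order(program, taken=True))
print(execution_order(program, taken=False))
```

In both traces the ADDI runs right after the BEQ, which is why it has to be an instruction that is safe on the taken path as well as on the fall-through path.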
Lecture 18: 33
Branch Target Address (PC+2+OFF)
[Figure: pipelined microprocessor datapath showing how the branch target address is formed. Labels: PC with +2 adder, a second adder, sign extension (SE), PCJ and PCL, plus the control unit, decoder, register file, ALU and data RAM.]
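The target computation named in the slide title, PC+2+OFF, can be written out as a short worked example. The field widths here are assumptions for illustration, since the slide does not state them: an 8-bit offset field and a 16-bit address. `sign_extend` and `branch_target` are hypothetical helper names, and the offset is assumed to be added as-is, without scaling.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def branch_target(pc, off_field, off_bits=8, addr_bits=16):
    """Compute PC + 2 + OFF, wrapping to the address width.

    off_bits and addr_bits are assumed widths, not taken from the lecture.
    """
    off = sign_extend(off_field, off_bits)
    return (pc + 2 + off) & ((1 << addr_bits) - 1)


print(hex(branch_target(0x0100, 0x04)))   # forward branch, OFF = +4
print(hex(branch_target(0x0100, 0xFC)))   # backward branch, OFF = -4
```

The sign extension is what lets the same adder handle backward branches: the field value 0xFC is treated as -4 rather than 252.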
Lecture 18: 34
Before Next Class
Next Time
Lecture 18: 35