Professional Documents
Culture Documents
Pipeline
Pipeline
Datapath
IF
instruction fetch
ID
instruction decode/ register fetch
EX
execute/ address calculation
MEM
memory access
WB
write back
MUX
Add
Instruction memory PC
Read address Instruction [31-0]
Instr [20-16]
ALU
zero ALU result
Registers
Instr [15-11]
MUX
Data memory
write addr
MUX
MUX
Instr [15-0]
16
\
Sign extend
Instr [5-0]
32
\
write data
ALU control
CS 740 F97
R-type instructions
IF: Instruction fetch
IR <-- IMemory[PC] PC <-- PC + 4
Ex: Execute
ALUOutput <-- A op B
MEM: Memory
nop
CS 740 F97
Load instruction
IF: Instruction fetch
IR <-- IMemory[PC] PC <-- PC + 4
Ex: Execute
ALUOutput <-- A + SignExtend(IR[15:0])
MEM: Memory
Mem-Data <-- DMemory[ALUOutput]
CS 740 F97
Store instruction
IF: Instruction fetch
IR <-- IMemory[PC] PC <-- PC + 4
Ex: Execute
ALUOutput <-- A + SignExtend(IR[15:0])
MEM: Memory
DMemory[ALUOutput] <-- B
CS 740 F97
Branch on equal
IF: Instruction fetch
IR <-- IMemory[PC] PC <-- PC + 4
Ex: Execute
Target <-- PC + SignExtend(IR[15:0]) << 2 Z <-- (A - B == 0)
MEM: Memory
If (Z) PC <-- target
Pipelining Basics
Unpipelined System
30ns Comb. Logic 3ns R E G Delay = 33ns Throughput = 30MHz
Clock Op1 Time One operation must complete before next can begin Operations spaced 33ns apart
8 CS 740 F97
Op2
Op3
3 Stage Pipelining
10ns Comb. Logic 3ns R E G 10ns Comb. Logic 3ns R E G 10ns Comb. Logic 3ns R E Delay = 39ns G Throughput = 77MHz
10
CS 740 F97
Clock
Diminishing returns as add more pipeline stages Register delays become limiting factor Increased latency Small througput gains
11
CS 740 F97
Comb. Logic
R E G
Comb. Logic
R E G
Comb. Logic
R E G
Pipelined datapath
MUX
IF/ID
Add
ID/EX
EX/MEM
MEM/WB
shift left 2
Add
PC
Instr [25-21] Instruction mem Read address Instruction [31-0] Instr [20-16] read reg 1 read data 1 read reg 2 Registers write reg read data 2 write data 16 \ ALU zero ALU result read addr read data Data mem write addr MUX Instr [15-0] Sign ext Instr [5-0] Instr [20-16] Instr [15-11] MUX 32 \ ALU ctl write data
MUX
13
CS 740 F97
Pipeline Structure
Branch Flag & Target PC IF/ID ID/EX EX/MEM MEM/WB
IF
Instr. Mem.
ID
Reg. File
EX
MEM
Data Mem.
Next PC
Notes
Each stage consists of operate logic connecting pipe registers WB logic merged into ID Additional paths required for forwarding
14
CS 740 F97
Pipe Register
Operation
Current State stays constant while Next State being updated Update involves transferring Next State to Current
15
CS 740 F97
Pipeline Stage
ID
Reg. File Current State Next State
Operation
Computes next state based on current From/to one or more pipe registers May have embedded memory elements Low level timing signals control their operation during clock cycle Writes based on current pipe register state Reads supply values for Next state
16 CS 740 F97
MIPS Simulator
Features
Based on MIPS subset Code generated by dis Hexadecimal instruction code Executable available HOME/public/sim/solve_tk
Run Controls Speed Control Mode Selection Current State Pipe Register Next State
Demo Programs
HOME/public/sim/solve_tk/demos
Register Values
17
CS 740 F97
Reg-Reg Instructions
R-type instructions (add, sub, and, or, slt): rd <-- rs funct rt 0 rs rt rd shamt funct
31-26 25-21 20-16 15-11 10-6 5-0
+4
Add
IF/ID
ALU Op rs instr A
ID/EX
A
EX/MEM
MEM/WB
Instr. Mem. PC
rd
rt
Reg.
Wdata Waddr
data B B dest
data dest
data dest
ALU
18
CS 740 F97
r2,3 r3,4
IF
Fetch instruction
demo1.O
ID
Fetch operands
r4,r2,r3
# --> 7
EX
Compute ALU result
MEM
Nothing
demo1.s
WB
Store result in destination reg.
19
CS 740 F97
ID
Get addr reg Store: Get data
0x8: 00000000 0xc: 00000000 0x10:ac430005 0x14:00000000 0x18:00000000 0x1c:8c640004 0x20:00000000 0x24:0000000d 0x28:00000000 0x2c:00000000
EX
Compute EA
MEM
Load: Read Store: Write
WB
Load: Update reg.
20
CS 740 F97
ID
Fetch operands
0xc: 00000000 0x10:10430008 0x14:00000000 0x18:00000000 0x1c:00000000 0x20:14430004 0x24:00000000 0x28:00000000 0x2c:00000000 0x30:00432021 0x34:00632021 ...
EX
Subtract operands test if 0 Compute target
MEM
Taken: Update PC to target
WB
Nothing
21
CS 740 F97
addiu $2, $0, 63 addiu $3, $2, 0 addiu $4, $2, 0 addiu $5, $2, 0 addiu $6, $2, 0
$2 $3 $4 $5 $6
ID EX IF
ID EX IF
ID EX IF
ID EX
22
CS 740 F97
23
CS 740 F97
Bubble Stall
24
CS 740 F97
Stall Control
Stall Control
IF
Instr. Mem.
ID
Reg. File
EX
MEM
Data Mem.
Stall Logic
Determines which stages to stall or bubble on next update
25
CS 740 F97
Observations on Stalling
Good
Relatively simple hardware Only penalizes performance when hazard exists
Bad
As if placed NOPs in code Except that does not waste instruction memory
Reality
Some problems can only be dealt with by stalling E.g., instruction cache miss Stall PC, bubble IF/ID Data cache miss Stall PC, IF/ID, ED/EX, EX/MEM, bubble MEM/WB
26
CS 740 F97
Forwarding (Bypassing)
Observation
ALU data generated at end of EX Steps through pipe until WB ALU data consumed at beginning of EX
Idea
Expedite passing of previous instruction result to ALU By adding extra data pathways and control
27
CS 740 F97
IF/ID
ALU Op
rs instr
ID/EX + $1 $2 A 5
B
Bypass Control
EX/MEM MEM/WB
data data data
Instr. Mem. PC
rd
rt
$1 5 $2
12
7 $1 ALU
dest
$2 dest
dest
Program:
0x18: 0x1c: add $2, $1, $2 add $1, $1, $2
28
CS 740 F97
Data Destinations
ALU A input Need in EX Reg-Reg Operand Load/Store Base ALU B input Need in EX Reg-Reg Operand Write Data Need in MEM Store Data
30 CS 740 F97
Complete Bypassing
MUX
IF/ID
Add
ID/EX
EX/MEM
MEM/WB
shift left 2 Instr [25-21] Instruction mem Read address Instruction [31-0] Instr [20-16]
Add
read reg 1 read data 1 read reg 2 Registers write reg read data 2 write data 16 \
MUX
read addr read data Data mem write addr MUX write data
PC
Instr [15-0]
32 \ ALU ctl
EX/EX MEM/EX
MEM/MEM
31
CS 740 F97
demo5.O
0x10: 2402000f li 0x14: 00000000 nop 0x18: 00401821 move 0x1c: 00000000 nop 0x20: 2402000c li 0x24: 24020010 li 0x28: ac430000 sw 0x2c: 00000000 nop 0x30: 8c440000 lw 0x34: 00822821 addu 0x38: 00000000 nop 0x3c: 0000000d break 0x40: 00000000 nop 0x44: 00000000 nop 32
CS 740 F97
Impact of Forwarding
Single Remaining Hazard Class
Load followed by ALU operation Including address calculation Load-ALU lw $1, 8($2) add $2, $1, $2
MIPS I Architecture
Programmer may not put instruction after load that uses loaded register Not even allowed to assume it will be old value Load delay slot Can force with .set noreorder
MIPS II Architecture
No restriction on programs Stall following instruction one cycle Then pick up with MEM/EX bypass
33
34
CS 740 F97
distance = 1
LW.rs SW.rs SW.rt Bx.rs Bx.rt JR.rs JALR.rs
reads writes
R.rd RI.rt LW.rt JAL.$31 JALR.rd 35 R.rs R.rt RI.rs
distance = 2
LW.rs SW.rs SW.rt Bx.rs Bx.rt JR.rs JALR.rs
CS 740 F97
demo7.O
0x48: 00000000 0x4c: 00000000 0x50: 00000000 0x54: 00000000 0x58: 0001000d 0x5c: 00000000 0x60: 00000000 0x64: 00000000 0x68: 00000000 0x6c: 00000000
nop nop nop nop break nop nop nop nop nop 1
0x10: 00a00821 0x14: 00000000 0x18: 00000000 0x1c: 24040010 0x20: 00000000 0x24: 00000000 0x28: 1024000b 0x2c: 00000000 0x30: 00000000 0x34: 00000000 0x38: 00000000 0x3c: 00000000 0x40: 0000000d 0x44: 00000000
Tests for single failure mode ALU Rd --> ALUI Rs, dist 1 Hits break 0 when error Jumps to break 1 when OK Grep for ERROR or break 0
CS 740 F97
36
Pipelined datapath
IF
instruction fetch
ID
instruction decode/ register fetch
EX
execute/ address calc
MEM
memory access
WB
write back
MUX
Add shift left 2 Instr [25-21] Instruction mem Read address Instruction [31-0] Instr [20-16] Add
read reg 1 read data 1 read reg 2 Registers write reg read data 2 write data 16 \
MUX
read addr read data Data mem write addr MUX write data
PC
Instr [15-0]
32 \ ALU ctl
37
CS 740 F97
addiu $2, $0, 63 addiu $3, $0, 63 addiu $4, $0, 63 addiu $5, $0, 63
ID EX IF
ID EX IF
$2 $3 $4 $5 $6
ID EX
38
CS 740 F97
Branch Example
Assume
All registers initially 0 Branch Code (demo8.O) 0x00: BEQ $0, $0, 5 # Slot # Xtra1 # Xtra2 # Xtra3
Desired Behavior
Take branch at 0x00 Execute delay slot 0x04 Execute target 0x18 PC + 4 + I16<<2 PC = 0x00 I16 = 5
39
CS 740 F97
IF/ID PC
instr
ID/EX
EX/MEM
MEM/WB
Instr. Mem.
nxt cur nxt cur
Reg.
Data Mem.
nxt cur
nxt cur
nxt cur
40
CS 740 F97
Desired Behavior
Execute entire sequence
Effect of Stalling
Wastes 2 cycles waiting for branch decision
41
CS 740 F97
Strategy
Continue fetching under assumption that branch not taken If decide to take branch, cancel undesired ones
MUX Add +4 Branch? Bubble Bubble Target
IF/ID PC
X T t g r t a 2
nxt cur nxt cur instr
ID/EX
X t r a 1
EX/MEM
MEM/WB
Instr. Mem.
Reg.
s l o t
nxt cur
b e q
nxt cur
Data Mem.
nxt cur
42
CS 740 F97
Alternative Schemes
Predict taken Would be hard to squeeze into our pipeline Cant compute target until ID In principle, would give CPI 1.11 Backwards taken, forwards not taken Predict based on sign of I16 Exploits fact that loops usually closed with backward branches
43
CS 740 F97
+4
instr M U X
rs rt
Reg.
B
=
ID/EX
branch?
44
CS 740 F97
Exceptions
An exception is a transfer of control to the OS in response to some event (i.e. change in processor state)
Operating System
45
CS 740 F97
46
CS 740 F97
47
CS 740 F97
A C
48
CS 740 F97
Disable interrupts and switch to kernel mode Jump to common exception handler location
49
CS 740 F97
50
CS 740 F97
user code sub $11, $2, $4 and $12, $2, $5 or $13, $2, $6 overflow add $1, $2, $1 slt $15, $6, $7 lw $16, 50($7) handler code sw $25, 1000($0) ...
IF
ID IF
EX ID IF
MEM EX ID IF
51
CS 740 F97
IF
Instr. Mem.
ID
Reg. File
EX
Data Mem.
Next PC
52
CS 740 F97
Illegal Instruction followed by store (exc2.O) 0x0: li r2,15 # --> 0xf # Unimplemented
53
CS 740 F97
Good instruction in delay slot (exc3.O) 0x0: li 0x4: jr 0x8: li r3,3 r3 r3,5 # Bad address # Should execute Which is excepting instruction?
Bad instruction in delay slot (exc4.O) 0x0: li 0x4: jr 0x8: sw r3,3 r3 # Bad address
54
CS 740 F97
0x14: break
55
CS 740 F97
Implementation Features
Correct
Detects excepting instruction (exc4.O) Furthest one down pipeline = Earliest one in program order Completes all preceding instructions (exc3.O) Usually aborts excepting instruction & beyond (exc1.O) Prioritizes exception conditions Earliest stage where instruction ran into problems Avoids false alarms (exc5.O) Problematic instructions that get canceled anyhow
Shortcomings
Store following excepting instruction (exc2.O) EPC SPC for delay slot instruction
56
CS 740 F97