ECE468 pipeline.1
[Figure: single cycle datapath. Recoverable labels: Rs, Rt, Rd, Imm16, Instr<15:0>, ALU, ALU Control, Zero, Data Memory (WrEn, Adr, Data In); control signals RegDst, ALUctr, ALUSrc, ExtOp, MemWr, MemtoReg.]
ECE468 pipeline.2
Long cycle time:
- Cycle time must be long enough for the load instruction:
  PC's Clock-to-Q
  + Instruction Memory Access Time
  + Register File Access Time
  + ALU Delay (address calculation)
  + Data Memory Access Time
  + Register File Setup Time
  + Clock Skew
- Cycle time is much longer than needed for all other instructions. Examples:
  - R-type instructions do not require data memory access
  - Jump requires neither an ALU operation nor a data memory access
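As a rough illustration of the sum above, the sketch below adds up per-component delays along the load path and along an R-type path. Every delay value is a made-up placeholder (an assumption, not a figure from the lecture); only the list of components comes from the text.

```python
# Hypothetical component delays in nanoseconds (assumed values, not from the lecture).
DELAYS_NS = {
    "pc_clk_to_q": 1,
    "instr_mem_access": 10,
    "reg_file_access": 5,
    "alu_delay": 8,
    "data_mem_access": 10,
    "reg_file_setup": 1,
    "clock_skew": 1,
}

# Components on the load's path, in the order listed in the text.
LOAD_PATH = [
    "pc_clk_to_q", "instr_mem_access", "reg_file_access", "alu_delay",
    "data_mem_access", "reg_file_setup", "clock_skew",
]
# An R-type instruction skips the data memory access.
RTYPE_PATH = [name for name in LOAD_PATH if name != "data_mem_access"]


def path_delay(path, delays=DELAYS_NS):
    """Sum the delays of the components along one instruction's path."""
    return sum(delays[name] for name in path)


load_ns = path_delay(LOAD_PATH)
rtype_ns = path_delay(RTYPE_PATH)
# The single cycle clock must accommodate the slowest instruction.
cycle_time_ns = max(load_ns, rtype_ns)
print(f"load: {load_ns} ns, R-type: {rtype_ns} ns, cycle time: {cycle_time_ns} ns")
```

With these placeholder numbers the load sets the cycle time, and the R-type instruction leaves part of every cycle unused.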
ECE468 pipeline.3
The root of the single cycle processor's problems:
- The cycle time has to be long enough for the slowest instruction
Solution:
- Break the instruction into smaller steps
- Execute each step (instead of the entire instruction) in one cycle
  - Cycle time: time it takes to execute the longest step
  - Keep all the steps to have similar length
- This is the essence of the multiple cycle processor
The advantages of the multiple cycle processor:
- Cycle time is much shorter
- Different instructions take different numbers of cycles to complete
  - Load takes five cycles
  - Jump only takes three cycles
- Allows a functional unit to be used more than once per instruction
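Because each instruction class takes its own number of cycles, the average cost per instruction depends on the instruction mix. The sketch below computes a weighted average. The cycle counts for Load and Jump are the ones stated above; the counts for R-type and Store and the whole instruction mix are illustrative assumptions.

```python
# Cycles per instruction class on the multiple cycle processor.
# Load = 5 and Jump = 3 come from the text; R-type = 4 and Store = 4 are
# assumptions based on the stage lists used elsewhere in these notes.
CYCLES = {"load": 5, "store": 4, "rtype": 4, "jump": 3}

# Hypothetical instruction mix in percent (must total 100).
MIX_PERCENT = {"load": 25, "store": 10, "rtype": 55, "jump": 10}


def average_cpi(cycles, mix_percent):
    """Weighted average number of cycles per instruction."""
    assert sum(mix_percent.values()) == 100
    total = sum(cycles[kind] * share for kind, share in mix_percent.items())
    return total / 100


cpi = average_cpi(CYCLES, MIX_PERCENT)
print(f"average CPI = {cpi}")
```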
ECE468 pipeline.4 Adapted from VC and UCB
[Figure: multiple cycle datapath. Recoverable labels: PC, Ideal Memory (RAdr, WrAdr, Din, Dout), Instruction Reg, Reg File, Imm 16, Extend, << 2, ALU, ALU Control, Zero, Target; control signals BrWr, ALUOp, ALUSelB, ExtOp, MemtoReg.]
ECE468 pipeline.5
ECE468 pipeline.6
[Figure: timing diagram of a load. Recoverable labels: Rs, Rt, Rd, Op, Func; ALUctr, ExtOp, ALUSrc, RegWr; busA, busB; Address; Data Memory; Reg Wr. Each signal changes from Old Value to New Value after, in turn, the Instruction Memory Access Time, the Delay through Control Logic, the Register File Access Time, and the ALU Delay.]
Load: Ifetch | Reg/Dec | Exec | Mem | Wr
- Ifetch: Instruction Fetch
  - Fetch the instruction from the Instruction Memory
- Reg/Dec: Registers Fetch and Instruction Decode
- Exec: Calculate the memory address
- Mem: Read the data from the Data Memory
- Wr: Write the data back to the register file
ECE468 pipeline.8
Grading the mid term exams:
- 5 problems, five people grading the exam
- Each person ONLY grades one problem
- Pass the exam to the next person as soon as one finishes his part
- Assume each problem takes 0.5 hour to grade
  - Each individual exam still takes 2.5 hours to grade
  - But with 5 people, all exams can be graded much more quickly
The load instruction has 5 stages:
- Five independent functional units to work on each stage
  - Each functional unit is used only once
- The 2nd load can start as soon as the 1st finishes its Ifetch stage
- Each load still takes five cycles to complete
- The throughput, however, is much higher
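The throughput claim can be checked with a small cycle count. The sketch below assumes an ideal pipeline with no stalls, where a new instruction starts every cycle; the function names are illustrative.

```python
def unpipelined_cycles(num_instructions, num_stages=5):
    """Each instruction runs all of its stages before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages=5):
    """The first instruction needs num_stages cycles; every later one
    finishes one cycle after its predecessor (assumes no stalls)."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


for n in (1, 5, 100):
    print(n, unpipelined_cycles(n), pipelined_cycles(n))
```

A single load still takes five cycles either way, but 100 back-to-back loads take 104 cycles instead of 500 under these assumptions.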
ECE468 pipeline.9
The five independent functional units in the pipeline datapath are:
- Instruction Memory for the Ifetch stage
- Register File's Read ports (bus A and bus B) for the Reg/Dec stage
- ALU for the Exec stage
- Data Memory for the Mem stage
- Register File's Write port (bus W) for the Wr stage
One instruction enters the pipeline every cycle
- One instruction comes out of the pipeline (complete) every cycle
- The Effective Cycles per Instruction (CPI) is 1
ECE468 pipeline.10
R-type: Ifetch | Reg/Dec | Exec | Wr
- Ifetch: Instruction Fetch
  - Fetch the instruction from the Instruction Memory
- Reg/Dec: Registers Fetch and Instruction Decode
- Exec: ALU operates on the two register operands
- Wr: Write the ALU output back to the register file
ECE468 pipeline.11
[Figure: pipeline diagram with R-type and Load instructions in flight; only the "R-type Ifetch" labels survive.]
We have a problem: Two instructions try to write to the register file at the same time!
ECE468 pipeline.12
Important Observation
- Each functional unit can only be used once per instruction
- Each functional unit must be used at the same stage for all instructions:
  - Load uses Register File's Write Port during its 5th stage
    Load: 1 Ifetch | 2 Reg/Dec | 3 Exec | 4 Mem | 5 Wr
  - R-type uses Register File's Write Port during its 4th stage
    R-type: 1 Ifetch | 2 Reg/Dec | 3 Exec | 4 Wr
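The write port clash follows directly from the two stage lists above. The sketch below assumes one new instruction is fetched every cycle with no stalls, and reports each cycle in which more than one instruction reaches Wr. The function name and the sample program are illustrative.

```python
from collections import defaultdict

# Stage sequences as listed in the text.
STAGES = {
    "load": ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"],
    "rtype": ["Ifetch", "Reg/Dec", "Exec", "Wr"],
}


def write_port_conflicts(program):
    """Return {cycle: [instruction indices]} for cycles with more than one Wr.

    `program` is a list of instruction kinds; instruction i is fetched in
    cycle i + 1 (one new instruction per cycle, no stalls).
    """
    writers = defaultdict(list)
    for index, kind in enumerate(program):
        wr_offset = STAGES[kind].index("Wr")
        writers[index + 1 + wr_offset].append(index)
    return {cycle: idxs for cycle, idxs in writers.items() if len(idxs) > 1}


print(write_port_conflicts(["rtype", "load", "rtype", "rtype"]))
```

For this sample program the load (index 1) and the R-type fetched right after it (index 2) both reach Wr in cycle 6, while a program of loads alone has no conflict.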
ECE468 pipeline.13
[Figure: pipeline diagram of R-type instructions with a Pipeline Bubble inserted (recoverable labels: R-type, Ifetch, Reg/Dec, Exec, Pipeline Bubble).]
- Insert a bubble into the pipeline to prevent 2 writes in the same cycle
  - The control logic can be complex
- No instruction is completed during Cycle 5:
  - The Effective CPI for load is 2
ECE468 pipeline.14
ECE468 pipeline.15
Store: Ifetch | Reg/Dec | Exec | Mem | Wr
- Ifetch: Instruction Fetch
  - Fetch the instruction from the Instruction Memory
- Reg/Dec: Registers Fetch and Instruction Decode
- Exec: Calculate the memory address
- Mem: Write the data into the Data Memory
ECE468 pipeline.16
Beq: Ifetch | Reg/Dec | Exec | Mem | Wr
- Ifetch: Instruction Fetch
  - Fetch the instruction from the Instruction Memory
- Reg/Dec: Registers Fetch and Instruction Decode
- Exec: ALU compares the two register operands
  - Adder calculates the branch target address
- Mem: If the registers we compared in the Exec stage are the same,
  - Write the branch target address into the PC
ECE468 pipeline.17
A Pipelined Datapath
[Figure: pipelined datapath. Stages: Ifetch, Reg/Dec, Exec, Mem, Wr. Pipeline registers: IF/ID, ID/Ex, Ex/Mem, Mem/Wr. Blocks: PC, IUnit, RFile (Ra, Rb, Rw, Di), Exec Unit, Data Mem (RA, WA, Di, Do), Mux. Values carried forward: PC+4, Imm16, Rs, Rt, Rd, Zero. Control signals: ExtOp, ALUOp, ALUSrc, RegDst, Branch, MemWr, MemtoReg, RegWr.]
ECE468 pipeline.18
ECE468 pipeline.19
ECE468 pipeline.20
[Figure: pipelined datapath snapshot. Recoverable annotations: PC = 14; Reg/Dec; IF/ID: PC+4.]
ECE468 pipeline.21
[Figure: pipelined datapath snapshot. Recoverable annotations: Ifetch, Reg/Dec, RegWr; IF/ID: PC+4.]
ECE468 pipeline.22
[Figure: pipelined datapath snapshot. Recoverable annotations: RegDst=0, ALUSrc=1; ID/Ex Register.]
ECE468 pipeline.23
[Figure: datapath detail. Recoverable annotations: ALU Control, ExtOp=1, ALUSrc=1, ALUOp=Add, Branch=0; IF/ID: PC+4.]
ECE468 pipeline.24
[Figure: pipelined datapath snapshot. Recoverable annotations: MemWr=0; Mem, Branch, Wr; IF/ID: PC+4.]
ECE468 pipeline.25
Key Observation:
- Control Signals at Stage N = Func (Instr. at Stage N)
  - N = Exec, Mem, or Wr
- Example: Control Signals at Exec Stage = Func(Load's Exec)
[Figure: Load passing through Ifetch, Reg/Dec, Exec (ExtOp=1, ALUOp=Add), Mem (Branch), and Wr (RegWr).]
ECE468 pipeline.26
[Figure: pipelined datapath snapshots. Recoverable annotations: MemtoReg=1; RegDst=0, ALUSrc=1; IF/ID: PC+4.]
Pipeline Control
- The Main Control generates the control signals during Reg/Dec
  - Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
  - Control signals for Mem (MemWr, Branch) are used 2 cycles later
  - Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: the Main Control in Reg/Dec produces ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, and RegWr; the signals are carried through the ID/Ex, Ex/Mem, and Mem/Wr registers to the Exec, Mem, and Wr stages.]
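The delayed use of control signals can be mimicked by shifting a bundle of signals through the three pipeline registers. This is a minimal sketch: the control values for the load are the ones annotated on the datapath figures in these notes, RegWr = 1 is assumed because a load writes the register file, and the grouping into per-stage dictionaries and the `clock` helper are illustrative assumptions.

```python
# Control bundle the Main Control would produce for a load in Reg/Dec.
# Values follow the figure annotations; RegWr = 1 is an assumption.
LOAD_CONTROL = {
    "Exec": {"ExtOp": 1, "ALUSrc": 1, "ALUOp": "Add", "RegDst": 0},
    "Mem": {"MemWr": 0, "Branch": 0},
    "Wr": {"MemtoReg": 1, "RegWr": 1},
}


def clock(regs, decoded):
    """Advance one cycle: each pipeline register passes on only the
    control signals that later stages still need."""
    return {
        "ID/Ex": decoded,  # everything generated in Reg/Dec
        "Ex/Mem": {k: v for k, v in regs["ID/Ex"].items() if k != "Exec"},
        "Mem/Wr": {k: v for k, v in regs["Ex/Mem"].items() if k == "Wr"},
    }


regs = {"ID/Ex": {}, "Ex/Mem": {}, "Mem/Wr": {}}
regs = clock(regs, LOAD_CONTROL)       # the load leaves Reg/Dec
exec_signals = regs["ID/Ex"]["Exec"]   # used 1 cycle after decode
regs = clock(regs, {})                 # hypothetical empty slot follows
mem_signals = regs["Ex/Mem"]["Mem"]    # used 2 cycles after decode
regs = clock(regs, {})
wr_signals = regs["Mem/Wr"]["Wr"]      # used 3 cycles after decode
print(exec_signals, mem_signals, wr_signals)
```

Each register holds fewer signals than the one before it, since a signal is dropped once its stage has consumed it.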
ECE468 pipeline.27
ECE468 pipeline.28
[Figure: Reg File write port with RegAdr, Data, MemtoReg, and RegWr inputs.]
- At the beginning of the Wr stage, we have a problem if:
  - RegAdr's (Rd or Rt) Clk-to-Q > RegWr's Clk-to-Q
- Similarly, at the beginning of the Mem stage, we have a problem if:
  - WrAdr's Clk-to-Q > MemWr's Clk-to-Q
- We have a race condition between Address and Write Enable!
[Figure: pipeline diagram against the Clock. Recoverable rows: Store and two R-type instructions, each passing through Ifetch, Reg/Dec, Exec, Mem, Wr one cycle behind its predecessor.]
ECE468 pipeline.29
ECE468 pipeline.30
- End of Cycle 4: Load's Mem, R-type's Exec, Store's Reg, Beq's Ifetch
- End of Cycle 5: Load's Wr, R-type's Mem, Store's Exec, Beq's Reg
- End of Cycle 6: R-type's Wr, Store's Mem, Beq's Exec
- End of Cycle 7: Store's Wr, Beq's Mem
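The per-cycle occupancy listed above can be reproduced with a few lines. The sketch assumes the program order Load, R-type, Store, Beq with one instruction fetched per cycle starting in Cycle 1, and that every instruction passes through all five stages; the function name is illustrative.

```python
STAGE_NAMES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]
# Program order; instruction i is fetched in cycle i + 1.
PROGRAM = ["Load", "R-type", "Store", "Beq"]


def stages_in_cycle(cycle, program=PROGRAM):
    """Map each in-flight instruction to the stage it occupies in `cycle`."""
    occupancy = {}
    for index, name in enumerate(program):
        stage_index = cycle - (index + 1)  # 0 in the cycle it is fetched
        if 0 <= stage_index < len(STAGE_NAMES):
            occupancy[name] = STAGE_NAMES[stage_index]
    return occupancy


for cycle in range(4, 8):
    print(f"End of Cycle {cycle}: {stages_in_cycle(cycle)}")
```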
ECE468 pipeline.31
[Figure: pipelined datapath snapshot. Recoverable annotations: 8: Store's Reg; 4: R-type's Exec; ALUOp=R-type, ExtOp=x; Branch=0.]
ECE468 pipeline.32
[Figure: pipelined datapath snapshot. Recoverable annotations: PC = 16; IF/ID: Instruction @ 16; 4: R-type's Mem; RegDst=1, ALUSrc=0, MemWr=0, MemtoReg=x, Branch=0.]
ECE468 pipeline.33
ECE468 pipeline.34
[Figure: pipelined datapath snapshot. Recoverable annotations: PC = 20; IF/ID: Instruction @ 20; 8: Store's Mem; RegDst=x, ALUSrc=1, MemWr=0, MemtoReg=1, Branch=0.]
[Figure: pipelined datapath snapshot. Recoverable annotations: PC = 24; RegDst=x, ALUSrc=0, MemWr=1, MemtoReg=0, Branch=1.]
ECE468 pipeline.35
[Figure: pipeline diagram against Clk. 12: Beq (target is 1000), followed by the R-type instructions at 16 and 20, each one stage behind its predecessor.]
- Although Beq is fetched during Cycle 4:
  - Target address is NOT written into the PC until the end of Cycle 7
  - Branch's target is NOT fetched until Cycle 8
  - 3-instruction delay before the branch takes effect
- This is referred to as Branch Hazard:
  - Clever design techniques can reduce the delay to ONE instruction
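The cycle numbers above follow from the stage in which the PC is updated. The sketch below computes them; `resolve_stage` is a hypothetical parameter naming the 1-based stage at whose end the PC is written (4 = Mem in this datapath). Resolving the branch at the end of stage 2 is shown only as an assumed example of how a one-instruction delay could arise, not as the technique the lecture has in mind.

```python
def branch_delay(fetch_cycle, resolve_stage=4):
    """Return (cycle in which the target is fetched, number of instructions
    fetched between the branch and its target)."""
    pc_written_end_of = fetch_cycle + resolve_stage - 1
    target_fetch_cycle = pc_written_end_of + 1
    delay_slots = target_fetch_cycle - fetch_cycle - 1
    return target_fetch_cycle, delay_slots


print(branch_delay(4))                   # Beq fetched in Cycle 4, PC written in Mem
print(branch_delay(4, resolve_stage=2))  # assumed earlier resolution
```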
ECE468 pipeline.36
[Figure: pipelined datapath snapshot. Recoverable annotations: PC = 1000; IF/ID: Instruction @ 24; 24: R-type; 1000: Target of Br; RegDst=1, ALUSrc=0, MemWr=0, MemtoReg=x.]
- Although Load is fetched during Cycle 1:
  - The data is NOT written into the Reg File until the end of Cycle 5
  - We cannot read this value from the Reg File until Cycle 6
  - 3-instruction delay before the load takes effect
- This is referred to as Data Hazard:
  - Clever design techniques can reduce the delay to ONE instruction
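The same arithmetic gives the three-instruction load delay. The sketch assumes the register file is written at the end of the load's 5th stage and read in a later instruction's 2nd stage (Reg/Dec), with one instruction fetched per cycle; the function and parameter names are illustrative.

```python
def load_use_delay(load_fetch_cycle, write_stage=5, read_stage=2):
    """Return (first cycle in which the loaded value can be read, number of
    following instructions that would read the register file too early)."""
    written_end_of = load_fetch_cycle + write_stage - 1
    first_readable_cycle = written_end_of + 1
    # An instruction fetched in cycle f reads registers in cycle f + read_stage - 1.
    first_safe_fetch = first_readable_cycle - (read_stage - 1)
    too_early = first_safe_fetch - load_fetch_cycle - 1
    return first_readable_cycle, too_early


print(load_use_delay(1))  # Load fetched in Cycle 1
```

The instructions fetched in Cycles 2, 3, and 4 reach Reg/Dec before Cycle 6, so they are the three that cannot see the loaded value.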
ECE468 pipeline.37
Summary
- Disadvantages of the Single Cycle Processor
  - Long cycle time
  - Cycle time is too long for all instructions except the Load
- Multiple Clock Cycle Processor:
  - Divide the instructions into smaller steps
  - Execute each step (instead of the entire instruction) in one cycle
- Pipeline Processor:
  - Natural enhancement of the multiple clock cycle processor
  - Each functional unit can only be used once per instruction
  - If an instruction is going to use a functional unit:
    - it must use it at the same stage as all other instructions
  - Pipeline Control:
    - Each stage's control signal depends ONLY on the instruction that is currently in that stage
ECE468 pipeline.38
[Figure: timeline from Cycle 1 to Cycle 10 against Clk.
Multiple Cycle Implementation: Load (Ifetch, Reg, Exec, Mem, Wr), then Store (Ifetch, Reg, Exec, Mem), then R-type (Ifetch, ...), one after another.
Pipeline Implementation: Load, Store, and R-type overlap, each starting one cycle after the previous instruction.]
ECE468 pipeline.39