ECE468 Computer Organization and Architecture Designing A Pipeline Processor

ECE468 Computer Organization and Architecture Designing a Pipeline Processor
ECE468 pipeline.1
Adapted from VC and UCB
A Single Cycle Processor

Branch Jump Zero Clk Instruction<31:0> <16:20> <21:25> <11:15> Instruction Fetch Unit <0:15> <31:26> op Main Control 3 RegDst ALUSrc
ALUop <5:0> func
Rt RegDst
Rd 1 Mux 0 Rs 5 5
Rd Imm16 Rt
ALU Control
RegWr 5 busW 32 Clk
ALUctr 3 Zero ALU 32 32 WrEn Adr Data In 32 Clk ALUSrc Data Memory MemWr 0 Mux MemtoReg
busA Rw Ra Rb 32 32 32-bit Registers busB 0 32 Extender 1 32
Mux
imm16 Instr<15:0>
16
ExtOp
ECE468 pipeline.2
Drawbacks of this Single Cycle Processor
Long cycle time: Cycle time must be long enough for the load instruction: - PCs Clock -to-Q + Instruction Memory Access Time + Register File Access Time + ALU Delay (address calculation) + Data Memory Access Time + Register File Setup Time + Clock Skew
Cycle time is much longer than needed for all other instructions. Examples: R-type instructions do not require data memory access Jump does not require ALU operation nor data memory access
ECE468 pipeline.3
Overview of a Multiple Cycle Implementation
The root of the single cycle processors problems: The cycle time has to be long enough for the slowest instruction Solution: Break the instruction into smaller steps Execute each step (instead of the entire instruction) in one cycle Cycle time: time it takes to execute the longest step Keep all the steps to have similar length
This is the essence of the multiple cycle processor The advantages of the multiple cycle processor: Cycle time is much shorter Different instructions take different number of cycles to complete - Load takes five cycles - Jump only takes three cycles Allows a functional unit to be used more than once per instruction
ECE468 pipeline.4 Adapted from VC and UCB
Multiple Cycle Processor

MCP: A functional unit to be used more than once per instruction
PCWr PCWrCond Zero IorD MemWr IRWr
32 32 32 32 0 Rs 32 Rt Rt 0 Rd 1 1 Mux 0 5 5 0
PCSrc RegDst RegWr ALUSelA

1
BrWr Target
Mux
32
PC Mux
1 32 32
0 32 0 1 2 3 32
Zero ALU
Mux
Instruction Reg
RAdr
Ra Rb Rw busW busB 32 busA 32
Ideal Memory
WrAdr 32 Din Dout
Reg File
Mux
32
<< 2
ALU Control
Imm 16
Extend
32
ALUOp ALUSelB
ExtOp
ECE468 pipeline.5
MemtoReg
Outline of Todays Lecture--Pipelining

Introduction to the Concept of Pipelined Processor Pipelined Datapath and Pipelined Control How to Avoid Race Condition in a Pipeline Design? Pipeline Example: Instructions Interaction
ECE468 pipeline.6
Timing Diagram of a Load Instruction

Instruction Fetch
Clk PC Old Value Clk-to-Q New Value Old Value Old Value Old Value Old Value Old Value Old Value Delay through Extender & Mux Old Value Old Value Data Memory Access Time busW
ECE468 pipeline.7
Instr Decode / Reg. Fetch
Address
Data Memory
Reg Wr
Rs, Rt, Rd, Op, Func ALUctr ExtOp ALUSrc RegWr busA busB Address
Instruction Memory Access Time New Value Delay through Control Logic New Value New Value New Value
Register File Write Time
New Value Register File Access Time New Value New Value ALU Delay New Value New
Old Value
The Five Stages of Load

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Load Ifetch
Reg/Dec
Exec
Mem
Wr
Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory Reg/Dec: Registers Fetch and Instruction Decode Exec: Calculate the memory address Mem: Read the data from the Data Memory Wr: Write the data back to the register file
ECE468 pipeline.8
Key Ideas Behind Pipelining
Grading the mid term exams: 5 problems, five people grading the exam Each person ONLY grades one problem Pass the exam to the next person as soon as one finishes his part Assume each problem takes 0.5 hour to grade Each individual exam still takes 2.5 hours to grade But with 5 people, all exams can be graded much quicker
The load instruction has 5 stages: Five independent functional units to work on each stage - Each functional unit is used only once The 2nd load can start as soon as the 1st finishes its Ifet stage Each load still takes five cycles to complete The throughput, however, is much higher
ECE468 pipeline.9
Pipelining the Load Instruction

Cycle 1 Cycle 2 Clock 1st lw Ifetch Reg/Dec Exec Reg/Dec Ifetch Mem Exec Reg/Dec Wr Mem Exec Wr Mem Wr Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7
2nd lw Ifetch 3rd lw
The five independent functional units in the pipeline datapath are: Instruction Memory for the Ifetch stage Register Files Read ports (bus A and busB) for the Reg/Dec stage ALU for the Exec stage Data Memory for the Mem stage Register Files Write port (bus W) for the Wr stage One instruction enters the pipeline every cycle One instruction comes out of the pipeline (complete) every cycle The Effective Cycles per Instruction (CPI) is 1
The Four Stages of R-type

Cycle 1 Cycle 2 Cycle 3 Cycle 4
R-type Ifetch
Reg/Dec
Exec
Wr
Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory Reg/Dec: Registers Fetch and Instruction Decode Exec: ALU operates on the two register operands Wr: Write the ALU output back to the register file
ECE468 pipeline.11
Pipelining the R-type and Load Instruction

Cycle 1 Cycle 2 Clock R-type Ifetch R-type Reg/Dec Ifetch Load Exec Reg/Dec Ifetch Wr Exec Reg/Dec Wr Exec Reg/Dec Mem Exec Reg/Dec Wr Wr Exec Wr Ops! We have a problem! Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
R-type Ifetch
R-type Ifetch
We have a problem: Two instructions try to write to the register file at the same time!
ECE468 pipeline.12
Important Observation
Each functional unit can only be used once per instruction Each functional unit must be used at the same stage for all instructions: Load uses Register Files Write Port during its 5th stage
Load 1 Ifetch 2 Reg/Dec 3 Exec 4 Mem 5 Wr
R-type uses Register Files Write Port during its 4th stage
1 R-type Ifetch 2 Reg/Dec 3 Exec 4 Wr
ECE468 pipeline.13
Solution 1: Insert Bubble into the Pipeline

Cycle 1 Cycle 2 Clock Ifetch Load Reg/Dec Ifetch Exec Reg/Dec Wr Exec Reg/Dec Mem Exec Wr Wr Wr Exec Reg/Dec Wr Exec Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
R-type Ifetch
R-type Ifetch Reg/Dec Pipeline Exec R-type Ifetch Bubble Reg/Dec Ifetch
Insert a bubble into the pipeline to prevent 2 writes at the same cycle The control logic can be complex No instruction is completed during Cycle 5: The Effective CPI for load is 2
ECE468 pipeline.14
Solution 2: Delay R-types Write by One Cycle

Delay R-types register write by one cycle: Now R-type instructions also use Reg Files write port at Stage 5 Mem stage is a NOOP stage: nothing is being done
1 R-type Ifetch 2 Reg/Dec 3 Exec 4 Mem 5 Wr
Cycle 1 Cycle 2 Clock R-type Ifetch R-type Reg/Dec Ifetch Load
Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
Exec Reg/Dec Ifetch
Mem Exec Reg/Dec
Wr Mem Exec Reg/Dec Wr Mem Exec Reg/Dec Wr Mem Exec Wr Mem Wr
R-type Ifetch
R-type Ifetch
ECE468 pipeline.15
The Four Stages of Store

Store
Ifetch
Reg/Dec
Exec
Mem
Wr
Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory Reg/Dec: Registers Fetch and Instruction Decode Exec: Calculate the memory address Mem: Write the data into the Data Memory
ECE468 pipeline.16
The Four Stages of Beq

Beq
Ifetch
Reg/Dec
Exec
Mem
Wr
Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory Reg/Dec: Registers Fetch and Instruction Decode Exec: ALU compares the two register operands Adder calculates the branch target address Mem: If the registers we compared in the Exec stage are the same, Write the branch target address into the PC
ECE468 pipeline.17
A Pipelined Datapath
Clk Ifetch Reg/Dec ExtOp Exec ALUOp Mem Branch Wr
RegWr
1 0 PC+4 Imm16 Rs Ra Rt Rt Rd Rb
PC+4 Imm16 busA busB
PC+4
ECE468 pipeline.18
PC
Mem/Wr Register
Ex/Mem Register
Zero
RFile
Rw Di
Exec Unit
0 1
Data Me m RA Do
WA Di
ID/Ex Register
IF/ID Register
IUnit
I
Mux
0
RegDst
ALUSrc
MemW r
MemtoReg
The Instruction Fetch Stage

Location 10: lw $1, 0x100($2)
You are here! Clk Ifetch Reg/Dec RegWr
1 0 PC+4 PC+4 Imm16 busA busB PC+4
$1 <- Mem[($2) + 0x100]
Exec ExtOp ALUOp
Mem Branch
ECE468 pipeline.19
Location 10: lw $1, 0x100($2)
ECE468 pipeline.20
PC = 14
IF/ID: lw $1, 100 ($2)
Imm16 Rs Ra Rt Rt Rd Rb
Mem/Wr Register
Ex/Mem Register
Zero
RFile
Rw Di
Exec Unit
0 1
Data Me m RA Do
WA Di
ID/Ex Register
A Detail View of the Instruction Unit
IUnit
I
Mux
0
RegDst
ALUSrc
MemW r
MemtoReg
You are here! Clk Ifetch

1 0
Reg/Dec
4 Adder PC = 14 IF/ID: lw $1, 100 ($2) 10
Address Instruction Memory Instruction
The Decode / Register Fetch Stage

Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]
1 0 PC+4
Exec ExtOp ALUOp
Mem Branch
IF/ID:
PC+4
ID/Ex: Reg. 2 & 0x100
ECE468 pipeline.21
Location 10: lw $1, 0x100($2)
PC Clk
1 0
Mem/Wr Register
Ex/Mem Register
Zero
RFile
Rw Di
Exec Unit
0 1
Data Me m RA Do
WA Di
Loads Address Calculation Stage

$1 <- Mem[($2) + 0x100]
You are here!
Ifetch
IUnit
I
Mux
0
RegDst
ALUSrc
MemW r
MemtoReg
Reg/Dec RegWr
Exec ALUOp=Add ExtOp=1
Mem Branch
PC+4 Imm16 Rs Ra Rt Rt Rd Rb
IF/ID:
Ex/Mem: Loads Address
PC+4
ECE468 pipeline.22
PC
Mem/Wr Register
Zero
RFile
Rw Di
Exec Unit
0 1
Data Me m RA Do
WA Di
ID/Ex Register
IUnit
I
Mux
0
RegDst=0
ALUSrc=1 MemW r
MemtoReg
A Detail View of the Execution Unit

You are here! Clk Exec Mem
Ex/Mem: Loads Memory Address
<< 2 Adder PC+4 busA 32 32 busB 32 Extender imm16 16 0 Mux
Target 32 Zero ALU ALUout 32 3 ALUctr
ID/Ex Register
ECE468 pipeline.23
32
ALU Control 3
ExtOp=1
ALUSrc=1
ALUOp=Add
Loads Memory Access Stage

1 0 PC+4 Imm16 Rs Ra Rt Rt Rd Rb PC+4 Imm16 busA busB
Exec ExtOp ALUOp
Mem Branch=0
IF/ID:
PC+4
ECE468 pipeline.24
PC
Mem/Wr: Loads Data
Ex/Mem Register
Zero
RFile
Rw Di
Exec Unit
0 1
Data Me m RA Do
WA Di
ID/Ex Register
IUnit
I
Mux
0
RegDst
ALUSrc
MemWr=0
MemtoReg
Loads Write Back Stage

You are somewhere out there! Clk Ifetch Reg/Dec RegWr=1
1 0 PC+4 Imm16 Rs Ra Rt Rt Rd Rb
Exec ExtOp ALUOp
Mem Branch
Wr
IF/ID:
PC+4
ECE468 pipeline.25
Key Observation: Control Signals at Stage N = Func (Instr. at Stage N) N = Exec, Mem, or Wr Example: Controls Signals at Exec Stage = Func(Loads Exec)
Ifetch Reg/Dec Wr RegWr Exec ALUOp=Add ExtOp=1 Mem Branch
ECE468 pipeline.26
PC
1 0
Mem/Wr Register
Ex/Mem Register
Zero
RFile
Rw Di
Exec Unit
0 1
Data Me m RA Do
WA Di
ID/Ex Register
How About Control Signals?
IUnit
I
Mux
0
RegDst
ALUSrc
MemW r
MemtoReg=1
PC+4 Imm16 Rs Ra Rt Rt Rd Rb
IF/ID:
Ex/Mem: Loads Address
PC+4
PC
Mem/Wr Register
Zero
RFile
Rw Di
Exec Unit
0 1
Data Me m RA Do
WA Di
ID/Ex Register
IUnit
I
Mux
0
RegDst=0
ALUSrc=1 MemW r
MemtoReg
Pipeline Control
The Main Control generates the control signals during Reg/Dec Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later Control signals for Mem (MemWr Branch) are used 2 cycles later Control signals for Wr (MemtoReg MemWr) are used 3 cycles later
Reg/Dec ExtOp ALUSrc ALUOp Main Control RegDst MemW r Branch MemtoReg RegWr Exec ExtOp ALUSrc ALUOp RegDst MemW r Branch MemtoReg RegWr Mem Wr
Ex/Mem Register
Mem/Wr Register
ID/Ex Register
ECE468 pipeline.27
RegAdr RegWr RegWrs Clk-to-Q RegAdrs Clk-to-Q RegWr
ECE468 pipeline.28
IF/ID Register Clk Mem/Wr
MemW rBranch MemtoReg RegWr
MemtoReg RegWr
Beginning of the Wrs Stage: A Real World Problem

Clk WrAdr MemWr MemWrs Clk-to-Q WrAdrs Clk-to-Q MemW r WrAdr Data Data Memory Ex/Mem
RegAdr Data
Reg File
At the beginning of the Wr stage, we have a problem if: RegAdrs (Rd or Rt) Clk-to-Q > RegWrs Clk-to-Q Similarly, at the beginning of the Mem stage, we have a problem if: WrAdrs Clk-to-Q > MemWrs Clk-to-Q We have a race condition between Address and Write Enable! from VC and UCB Adapted
The Pipeline Problem

Multiple Cycle design prevents race condition between Addr and WrEn: Make sure Address is stable by the end of Cycle N Asserts WrEn during Cycle N + 1 This approach can NOT be used in the pipeline design because: Must be able to write the register file every cycle Must be able write the data memory every cycle
Clock Store Ifetch Reg/Dec Exec Reg/Dec Mem Exec Reg/Dec Wr Mem Exec Reg/Dec Wr Mem Exec Wr Mem Wr
Store Ifetch
R-type Ifetch
R-type Ifetch
ECE468 pipeline.29
Synchronize Register File & Synchronize Memory

Solution: And the Write Enable signal with the Clock This is the ONLY place where gating the clock is used MUST consult circuit expert to ensure no timing violation: Example: Clock High Time > Write Access Delay
Synchronize Memory and Register File Address, Data, and WrEn must be stable at least 1 set-up time before the Clk edge Write occurs at the cycle following the clock edge that captures the signals WrEn WrEn I_WrEn Address Data I_Addr I_Data Reg File or Memory C_WrEn Address Data Clk Reg File or Memory
Clk I_Addr I_WrEn C_WrEn
ECE468 pipeline.30
A More Extensive Pipelining Example

Cycle 1 Cycle 2 Clock 0: Load Ifetch Reg/Dec Exec Reg/Dec Mem Exec Reg/Dec Ifetch Wr Mem Exec Reg/Dec Wr Mem Exec Wr Mem Wr Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8
4: R-type Ifetch
8: Store Ifetch 12: Beq (target is 1000)
End of Cycle 4
End of Cycle 5
End of End of Cycle 6 Cycle 7
End of Cycle 4: Loads Mem, R-types Exec, Stores Reg, Beqs Ifetch End of Cycle 5: Loads Wr, R-types Mem, Stores Exec, Beqs Reg End of Cycle 6: R-types Wr, Stores Mem, Beqs Exec End of Cycle 7: Stores Wr, Beqs Mem
Pipelining Example: End of Cycle 4

0: Loads Mem 4: R-types Exec
8: Stores Reg 12: Beqs Ifet RegWr=0
1 0 PC+4
8: Stores Reg
4: R-types Exec
12: Beqs Ifetch

0: Loads Mem
ALUOp=R-type ExtOp=x
Branch=0
Clk
PC+4 Imm16 Rs Ra Rt Rt Rd Rb PC+4 Imm16 busA busB
Ex/Mem: R-types Result
ID/Ex: Stores busA & B
IF/ID: Beq Instruction
Mem/Wr: Loads Dout
ECE468 pipeline.32
PC = 16
Zero
RFile
Rw Di
Exec Unit
0 1
Data Me m RA Do
WA Di
IUnit
I
Mux
0
RegDst=1 ALUSrc=0
Clk MemWr=0
MemtoReg=x

0: Lws Wr 4: Rs Mem 8: Stores Exec 12: Beqs Reg 16: Rs Ifetch
12: Beqs Reg 16: Rs Ifet 0: Loads Wr RegWr=1
1 0 PC+4
8: Stores Exec ALUOp=Add ExtOp=1
4: R-types Mem
Branch=0
Clk
PC+4
Mem/Wr: R-types Result
Ex/Mem: Stores Address
IF/ID: Instruction @ 16
ID/Ex: Beqs busA & B
ECE468 pipeline.33
4: Rs Wr 8: Stores Mem 12: Beqs Exec 16: Rs Reg 20: Rs Ifet

16: R-types Reg 20: R-types Ifet 4: R-types Wr RegWr=1 Clk
PC+4
ECE468 pipeline.34
PC = 20
1 0
Zero
RFile
Rw Di
Exec Unit
0 1
Data Me m RA Do
WA Di
IUnit
I PC+4
Mux
0
RegDst=x ALUSrc=1
Clk MemWr=0
MemtoReg=1
12: Beqs Exec ALUOp=Sub ExtOp=1
8: Stores Mem
Branch=0
ID/Ex:R-types busA & B
Mem/Wr: Nothing for St
Ex/Mem: Beqs Results
PC = 24
Zero
RFile
Rw Di
Exec Unit
0 1
Data Me m RA Do
WA Di
IUnit
I
Mux
0
RegDst=x ALUSrc=0
Clk MemWr=1
MemtoReg=0

8: Stores Wr 12: Beqs Mem 16: Rs Exec 20: Rs Reg 24: Rs Ifet
20: R-types Reg 24: R-types Ifet 8: Stores Wr RegWr=0 Clk
PC+4
16: R-types Exec ALUOp=R-type ExtOp=x
12: Beqs Mem
Branch=1
1 0
Mem/Wr:Nothing for Beq
ID/Ex:R-types busA & B
Ex/Mem: Rtypes Results
ECE468 pipeline.35
Clk 12: Beq Ifetch Reg/Dec Exec (target is 1000) 16: R-type Ifetch Reg/Dec 20: R-type Ifetch Mem Exec Reg/Dec Ifetch Wr Mem Exec Reg/Dec Ifetch Wr Mem Exec Reg/Dec Wr Mem Exec Wr Mem Wr
Although Beq is fetched during Cycle 4: Target address is NOT written into the PC until the end of Cycle 7 Branchs target is NOT fetched until Cycle 8 3-instruction delay before the branch take effect This is referred to as Branch Hazard: Clever design techniques can reduce the delay to ONE instruction
PC = 1000
The Delay Branch Phenomenon

Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Cycle 11
PC+4
Zero
RFile
Rw Di
Exec Unit
0 1
Data Me m RA Do
WA Di
IUnit
I
Mux
0
RegDst=1 ALUSrc=0
Clk MemWr=0
MemtoReg=x
24: R-type
1000: Target of Br
The Delay Load Phenomenon

Cycle 1 Cycle 2 Clock I0: Load Ifetch Plus 1 Reg/Dec Ifetch Plus 2 Exec Reg/Dec Ifetch Plus 3 Mem Exec Reg/Dec Ifetch Plus 4 Wr Mem Exec Reg/Dec Ifetch Wr Mem Exec Reg/Dec Wr Mem Exec Wr Mem Wr Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8
Although Load is fetched during Cycle 1: The data is NOT written into the Reg File until the end of Cycle 5 We cannot read this value from the Reg File until Cycle 6 3-instruction delay before the load take effect This is referred to as Data Hazard: Clever design techniques can reduce the delay to ONE instruction
Summary
Disadvantages of the Single Cycle Processor Long cycle time Cycle time is too long for all instructions except the Load Multiple Clock Cycle Processor: Divide the instructions into smaller steps Execute each step (instead of the entire instruction) in one cycle Pipeline Processor: Natural enhancement of the multiple clock cycle processor Each functional unit can only be used once per instruction If a instruction is going to use a functional unit: - it must use it at the same stage as all other instructions Pipeline Control: Each stages control signal depends ONLY on the instruction that is currently in that stage
ECE468 pipeline.38
Single Cycle, Multiple Cycle, vs. Pipeline

Cycle 1 Clk Single Cycle Implementation: Load Store Waste Cycle 2
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Clk Multiple Cycle Implementation: Load Ifetch Reg Exec Mem
Wr
Store Ifetch
Reg
Exec
Mem
R-type Ifetch
Pipeline Implementation: Load Ifetch Reg Exec Reg Mem Exec Reg Wr Mem Exec Wr Mem Wr
Store Ifetch
R-type Ifetch
ECE468 pipeline.39

ECE468 Computer Organization and Architecture Designing A Pipeline Processor

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

ECE468 Computer Organization and Architecture Designing A Pipeline Processor

Uploaded by

Copyright:

Available Formats

ECE468 Computer Organization and Architecture Designing a Pipeline Processor

Adapted from VC and UCB

A Single Cycle Processor

ALUop <5:0> func

RegWr 5 busW 32 Clk

busA Rw Ra Rb 32 32 32-bit Registers busB 0 32 Extender 1 32

Adapted from VC and UCB

Drawbacks of this Single Cycle Processor

Adapted from VC and UCB

Overview of a Multiple Cycle Implementation

Multiple Cycle Processor

PCSrc RegDst RegWr ALUSelA

Ra Rb Rw busW busB 32 busA 32

Outline of Todays Lecture--Pipelining

Adapted from VC and UCB

Timing Diagram of a Load Instruction

Instr Decode / Reg. Fetch

Register File Write Time

Adapted from VC and UCB

The Five Stages of Load

Adapted from VC and UCB

Key Ideas Behind Pipelining

Adapted from VC and UCB

Pipelining the Load Instruction

2nd lw Ifetch 3rd lw

The Four Stages of R-type

Adapted from VC and UCB

Pipelining the R-type and Load Instruction

Adapted from VC and UCB

Adapted from VC and UCB

Solution 1: Insert Bubble into the Pipeline

Adapted from VC and UCB

Solution 2: Delay R-types Write by One Cycle

Cycle 1 Cycle 2 Clock R-type Ifetch R-type Reg/Dec Ifetch Load

Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Exec Reg/Dec Ifetch

Mem Exec Reg/Dec

Wr Mem Exec Reg/Dec Wr Mem Exec Reg/Dec Wr Mem Exec Wr Mem Wr

Adapted from VC and UCB

The Four Stages of Store

Adapted from VC and UCB

The Four Stages of Beq

Adapted from VC and UCB

PC+4 Imm16 busA busB

The Instruction Fetch Stage

$1 <- Mem[($2) + 0x100]

Exec ExtOp ALUOp

Location 10: lw $1, 0x100($2)

IF/ID: lw $1, 100 ($2)

A Detail View of the Instruction Unit

You are here! Clk Ifetch

4 Adder PC = 14 IF/ID: lw $1, 100 ($2) 10

Address Instruction Memory Instruction

Adapted from VC and UCB

The Decode / Register Fetch Stage

Exec ExtOp ALUOp

PC+4 Imm16 busA busB

ID/Ex: Reg. 2 & 0x100

Location 10: lw $1, 0x100($2)

Loads Address Calculation Stage

Exec ALUOp=Add ExtOp=1

Ex/Mem: Loads Address