
COE 485 Sem 1, 2024

Computer Architecture & Organization

Single-Cycle Datapath
Review
• Construction of the Datapath
– Instruction-specific building blocks (R, I, J formats)
– Modular design
• ALU, Register File, Data Memory
• ALU or adder for computing branch target address (BTA)
– Instruction-specific connection of datapath components
• Instruction Formats and the Datapath
– R: ALU operation,
– I: Load/store - Data I/O from register file/memory
– I: Conditional branch – Eval. condition, Compute BTA
– J: Jump (unconditional branch) – Compute JTA
Overview of Today’s Lecture
• Can we make a datapath operate in one cycle?
– All instructions executed in CPI = 1
– Increases efficiency of software
• Composition of simple datapath components
• Build up the datapath iteratively
– R-format instruction
– I-format
– J-format
• Problems with the single-cycle assumption
Processor Performance
CPU time = IC * CPI * Cycle time
Program

Compiler

ISA

Microarchitecture

Hardware
Implementation Review

[Figure: datapath overview showing the PC, +4 adder, instruction memory, register file (rs, rt, rd), ALU, data memory, the imm field, and a controller driven by the opcode and funct fields]
• Datapath is based on register transfers required to execute instructions
• Control causes the right transfers to happen at the right time
Component: R-format Datapath
• Format: opcode r1, r2, r3
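[Figure: R-format datapath. The instruction supplies Read Reg 1, Read Reg 2, and Write Register to the register file; Read Data 1 and Read Data 2 feed the ALU (3-bit ALU op; Zero and Result outputs); the ALU result returns to Write Data, enabled by Register Write]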
Component: Load/Store Datapath

Fetch Decode Execute


Component: Branch Datapath

Fetch Decode Execute


R-format Datapath Actions
Instruction: add $t0, $t1, $t2
1. Fetch instruction and increment PC
2. Input $t1 and $t2 (the source registers rs and rt) from Register File
3. ALU operates on $t1 and $t2, per the funct field of the MIPS instruction (Bits 5-0)
4. Result from ALU written to Register File using bits 15-11 of instruction to select destination register (e.g., $t0).
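The four steps above can be sketched in a few lines of Python. This is only an illustrative model, not part of the lecture's hardware design: the `regs` list standing in for the register file and the function name `execute_r_format_add` are hypothetical, register numbers follow the usual MIPS convention ($t0 = 8, $t1 = 9, $t2 = 10), and the overflow trap of the real `add` is ignored by wrapping to 32 bits.

```python
MASK32 = 0xFFFFFFFF

def execute_r_format_add(instr, regs):
    """Model steps 2-4 for an R-format add; `regs` is a list of 32 ints."""
    rs = (instr >> 21) & 0x1F   # bits 25-21: first source register
    rt = (instr >> 16) & 0x1F   # bits 20-16: second source register
    rd = (instr >> 11) & 0x1F   # bits 15-11: destination register
    funct = instr & 0x3F        # bits 5-0: selects the ALU operation
    if funct != 0b100000:
        raise ValueError("only add (funct 100000) is modelled in this sketch")
    # ALU add, wrapped to 32 bits (assumption: overflow trap not modelled)
    regs[rd] = (regs[rs] + regs[rt]) & MASK32

# add $t0, $t1, $t2: opcode 0, rs = 9, rt = 10, rd = 8, shamt = 0, funct = 0x20
add_instr = (0 << 26) | (9 << 21) | (10 << 16) | (8 << 11) | (0 << 6) | 0x20
regs = [0] * 32
regs[9], regs[10] = 5, 7
execute_r_format_add(add_instr, regs)
print(hex(add_instr), regs[8])  # 0x12a4020 12
```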
Load/Store Datapath Actions
Instruction: lw $t1, offset($t2)
1. Fetch instruction and increment PC
2. Read register value (e.g., base address in $t2)
from Register File
3. ALU adds value from $t2 to sign-extended
lower 16 bits of the instruction (i.e., offset)
4. Result from ALU = address to Data Memory
5. Retrieve data from memory, write to Register
File, per register number in $t1 (Bits 20-16)
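As a rough illustration of steps 2 to 5, the sketch below models the load path in Python. The names `sign_extend16`, `execute_lw`, `regs`, and `data_memory` are hypothetical, and the data memory is assumed to be a dictionary keyed by byte address; real hardware does all of this within one clock cycle.

```python
def sign_extend16(value):
    """Sign-extend the low 16 bits of `value` to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def execute_lw(instr, regs, data_memory):
    """Model lw rt, offset(rs); returns the effective address used."""
    rs = (instr >> 21) & 0x1F                    # base register (bits 25-21)
    rt = (instr >> 16) & 0x1F                    # destination register (bits 20-16)
    offset = sign_extend16(instr)                # lower 16 bits, sign-extended
    address = (regs[rs] + offset) & 0xFFFFFFFF   # ALU add gives the memory address
    regs[rt] = data_memory[address]              # memory read, then register write
    return address

# lw $t1, -4($t2): opcode 100011, rs = $t2 (10), rt = $t1 (9)
lw_instr = (0b100011 << 26) | (10 << 21) | (9 << 16) | (-4 & 0xFFFF)
regs = [0] * 32
regs[10] = 0x1000
data_memory = {0x0FFC: 0xDEADBEEF}
addr = execute_lw(lw_instr, regs, data_memory)
print(hex(addr), hex(regs[9]))  # 0xffc 0xdeadbeef
```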
R-format + Load/Store Datapath

Fetch Decode Execute


Animating the Datapath:
R-type Instruction
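Instruction: add rd, rs, rt
[Figure: R-format + load/store datapath (register file with RN1, RN2, WN, WD, RD1, RD2; 16-to-32-bit sign extend; ALUSrc mux; ALU with 3-bit Operation and Zero; data memory with ADDR, WD, RD, MemRead, MemWrite; MemtoReg mux; RegWrite), animated for an R-type instruction]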
Animating the Datapath:
Load Instruction
Instruction: lw rt, offset(rs)
[Figure: R-format + load/store datapath (register file, 16-to-32-bit sign extend, ALUSrc mux, ALU, data memory, MemtoReg mux), animated for a load instruction]
Animating the Datapath:
Store Instruction
Instruction: sw rt, offset(rs)
[Figure: R-format + load/store datapath (register file, 16-to-32-bit sign extend, ALUSrc mux, ALU, data memory, MemtoReg mux), animated for a store instruction]
MIPS Datapath II: Single-Cycle
[Figure: single-cycle datapath with instruction fetch added: PC, instruction memory, PC adder, register file, sign extend, ALUSrc mux, ALU, data memory, MemtoReg mux]
• A separate adder is needed, as ALU operations and the PC increment occur in the same clock cycle
• A separate instruction memory is needed, as the instruction read and the data read occur in the same clock cycle
Adding instruction fetch
Branch Datapath Actions
Instruction: beq $t1, $t2, offset
1. Fetch instruction and increment PC
2. Read registers (e.g., $t1 and $t2) from the Register File
3. ALU subtracts $t1 - $t2. Adder sums PC + 4
plus sign-extended lower 16 bits of offset
shifted left two bits => branch target address
4. ALU’s Zero output directs PC+4 or BTA to be
written as new PC
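The branch decision in steps 3 and 4 can be sketched as follows. This is a simplified software model under stated assumptions: `next_pc_for_beq` and `regs` are hypothetical names, and the two adders and the ALU, which work in parallel in hardware, are evaluated one after another here.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value):
    """Sign-extend the low 16 bits of `value` to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def next_pc_for_beq(pc, instr, regs):
    """Return the new PC for beq rs, rt, offset."""
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    pc_plus_4 = (pc + 4) & MASK32                        # first adder
    zero = ((regs[rs] - regs[rt]) & MASK32) == 0         # ALU subtract sets Zero
    # second adder: PC + 4 plus the sign-extended offset shifted left two bits
    branch_target = (pc_plus_4 + (sign_extend16(instr) << 2)) & MASK32
    return branch_target if zero else pc_plus_4          # Zero steers the PC mux

# beq $t1, $t2, 3 with the instruction located at 0x00400000
beq_instr = (0b000100 << 26) | (9 << 21) | (10 << 16) | 3
regs = [0] * 32
regs[9] = regs[10] = 42
taken = next_pc_for_beq(0x00400000, beq_instr, regs)
regs[10] = 43
not_taken = next_pc_for_beq(0x00400000, beq_instr, regs)
print(hex(taken), hex(not_taken))  # 0x400010 0x400004
```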
R-format + Load/Store + Branch DP

Fetch Decode Execute


MIPS Datapath III: Single-Cycle
[Figure: single-cycle datapath with branch capability added: shift-left-2 unit, branch-target adder, and a new multiplexor controlled by PCSrc, alongside the PC, instruction memory, register file, sign extend, ALU, and data memory]
• New multiplexor (PCSrc): the instruction address is either PC+4 or the branch target address
• Extra adder needed as both adders operate in each cycle
Adding branch capability and another multiplexor


Important note: in a single-cycle implementation data cannot be stored
during an instruction – it only moves through combinational logic
Question: is the MemRead signal really needed?! Think of RegWrite…!
Datapath Executing add
[Figure: complete single-cycle datapath (PC, instruction memory, PC+4 adder, branch adder with <<2, PCSrc mux, register file, sign extend, ALUSrc mux, ALU, data memory, MemtoReg mux) with the path used by add rd, rs, rt highlighted]
Datapath Executing lw
[Figure: complete single-cycle datapath with the path used by lw rt, offset(rs) highlighted]
Datapath Executing sw
[Figure: complete single-cycle datapath with the path used by sw rt, offset(rs) highlighted]
Datapath Executing beq
[Figure: complete single-cycle datapath with the path used by beq r1, r2, offset highlighted]
Control Overview
• Single-cycle implementation
– Datapath: combinational logic, I-mem, regs, D-mem, PC
• Last three written at end of cycle
– Need control – just combinational logic!
– Inputs:
• Instruction (I-mem out)
• Zero (for beq)
– Outputs:
• Control lines for muxes
• ALUop
• Write-enables
Control Overview
• Fast control
– Divide up work on “need to know” basis
– Logic with fewer inputs is faster
• E.g.
– Global control need not know which ALUop
ALU Control
• Assume the control line values in table
ALU Control
• Plan to control ALU: main control sends a 2-bit ALUOp control field to the ALU control. Based on ALUOp and the funct field of the instruction, the ALU control generates the 3-bit ALU control field
– Recall from Ch. 4:
ALU control field   Function
000                 and
001                 or
010                 add
110                 sub
111                 slt
[Figure: main control generates the 2-bit ALUOp; the ALU control block combines ALUOp with the 6-bit instruction funct field and sends the 3-bit ALU control input to the ALU]
• ALU must perform
– add for load/stores (ALUOp 00)
– sub for branches (ALUOp 01)
– one of and, or, add, sub, slt for R-type instructions, depending on the instruction’s 6-bit funct field (ALUOp 10)
ALU Control
Instruction Operation Opcode Function
add add 000000 100000
sub sub 000000 100010
and and 000000 100100
or or 000000 100101
slt slt 000000 101010

• ALU-ctrl = f(opcode,function)
But…don’t forget
Instruction Operation Opcode Function
lw add 100011 xxxxxx
sw add 101011 xxxxxx
beq sub 000100 xxxxxx

• To simplify ALU-ctrl
– ALUop (2 bits) = f(opcode (6 bits))
ALU Control

• ALU-ctrl (3 bits) = f(ALUop (2 bits), function (6 bits))
ALU Control
10 add, sub, and, …
00 lw, sw
01 beq

• ALU-ctrl (3 bits) = f(ALUop (2 bits), function (6 bits))
• Requires only five gates plus inverters
ALU Control
Setting ALU Control Bits
Instruction  ALUOp  Instruction    Funct    Desired      ALU control
opcode              operation      field    ALU action   input
LW           00     load word      xxxxxx   add          010
SW           00     store word     xxxxxx   add          010
Branch eq    01     branch eq      xxxxxx   subtract     110
R-type       10     add            100000   add          010
R-type       10     subtract       100010   subtract     110
R-type       10     AND            100100   and          000
R-type       10     OR             100101   or           001
R-type       10     set on less    101010   set on less  111

ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0  Operation
0       0       X   X   X   X   X   X   010
0*      1       X   X   X   X   X   X   110
1       X       X   X   0   0   0   0   010
1       X       X   X   0   0   1   0   110
1       X       X   X   0   1   0   0   000
1       X       X   X   0   1   0   1   001
1       X       X   X   1   0   1   0   111
* Typo in text Fig. 5.15: if it is X then there is potential conflict between line 2 and lines 3-7!
Truth table for ALU control bits
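The truth table translates directly into a small lookup. The sketch below is one possible software rendering, with the hypothetical function name `alu_control`; as in the table, only F3-F0 of the funct field are examined when ALUOp1 is 1.

```python
def alu_control(alu_op, funct):
    """Return the 3-bit ALU control value for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:          # lw / sw: add to form the address
        return 0b010
    if alu_op == 0b01:          # beq: subtract to compare
        return 0b110
    # ALUOp1 = 1 (R-type): decode the low four funct bits, F5 and F4 are don't-cares
    r_type = {
        0b0000: 0b010,  # add
        0b0010: 0b110,  # sub
        0b0100: 0b000,  # and
        0b0101: 0b001,  # or
        0b1010: 0b111,  # slt
    }
    return r_type[funct & 0xF]

print(format(alu_control(0b10, 0b101010), "03b"))  # 111
```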
Designing the Main Control
R-type:                opcode  rs     rt     rd     shamt  funct
                       31-26   25-21  20-16  15-11  10-6   5-0

Load/store or branch:  opcode  rs     rt     address
                       31-26   25-21  20-16  15-0

• Observations about MIPS instruction format


– opcode is always in bits 31-26
– two registers to be read are always rs (bits 25-21) and rt (bits 20-16)
– base register for load/stores is always rs (bits 25-21)
– 16-bit offset for branch equal and load/store is always bits 15-0
– destination register for loads is in bits 20-16 (rt) while for R-type
instructions it is in bits 15-11 (rd) (will require multiplexor to select)
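Because the fields sit at fixed bit positions, decoding is just shifting and masking. The following sketch shows this together with the destination-register multiplexor from the last observation; `Fields`, `decode`, and `write_register` are hypothetical names chosen for illustration.

```python
from collections import namedtuple

Fields = namedtuple("Fields", "opcode rs rt rd shamt funct imm")

def decode(instr):
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return Fields(
        opcode=(instr >> 26) & 0x3F,  # bits 31-26
        rs=(instr >> 21) & 0x1F,      # bits 25-21
        rt=(instr >> 16) & 0x1F,      # bits 20-16
        rd=(instr >> 11) & 0x1F,      # bits 15-11 (R-type only)
        shamt=(instr >> 6) & 0x1F,    # bits 10-6  (R-type only)
        funct=instr & 0x3F,           # bits 5-0   (R-type only)
        imm=instr & 0xFFFF,           # bits 15-0  (load/store/branch offset)
    )

def write_register(fields, reg_dst):
    """Multiplexor: rd for R-type (reg_dst = 1), rt for loads (reg_dst = 0)."""
    return fields.rd if reg_dst else fields.rt

add_fields = decode(0x012A4020)   # add $t0, $t1, $t2
lw_fields = decode(0x8D490008)    # lw $t1, 8($t2)
print(write_register(add_fields, reg_dst=1))  # 8
print(write_register(lw_fields, reg_dst=0))   # 9
```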
Datapath with WriteReg Control
Datapath with Control I
[Figure: Datapath with Control I. MIPS Datapath III plus an ALU control block (inputs: Instruction [5-0] and ALUOp) and a new multiplexor, controlled by RegDst, that selects Instruction [20-16] or Instruction [15-11] as the Write register. Control signals shown: RegDst, RegWrite, ALUSrc, PCSrc, MemRead, MemWrite, MemtoReg, ALUOp]
Adding control to the MIPS Datapath III (and a new multiplexor to select field to specify destination register): what are the functions of the 9 control signals?
Control Signals
Signal name: effect when deasserted / effect when asserted
• RegDst
– Deasserted: the register destination number for the Write register comes from the rt field (bits 20-16)
– Asserted: the register destination number for the Write register comes from the rd field (bits 15-11)
• RegWrite
– Deasserted: none
– Asserted: the register on the Write register input is written with the value on the Write data input
• ALUSrc
– Deasserted: the second ALU operand comes from the second register file output (Read data 2)
– Asserted: the second ALU operand is the sign-extended lower 16 bits of the instruction
• PCSrc
– Deasserted: the PC is replaced by the output of the adder that computes the value of PC + 4
– Asserted: the PC is replaced by the output of the adder that computes the branch target
• MemRead
– Deasserted: none
– Asserted: data memory contents designated by the address input are put on the Read data output
• MemWrite
– Deasserted: none
– Asserted: data memory contents designated by the address input are replaced by the value on the Write data input
• MemtoReg
– Deasserted: the value fed to the register Write data input comes from the ALU
– Asserted: the value fed to the register Write data input comes from the data memory

Effects of the seven control signals


Recall: MIPS Instr. Format
MIPS Instruction Bits - Rules
Global Control
• Global control outputs
– ALU-ctrl - see above
– ALU src - R-format, beq vs. ld/st
– MemRead - lw
– MemWrite - sw
– MemtoReg - lw
– RegDst - lw dst in bits 20:16, not 15:11
– RegWrite - all but beq and sw
– PCSrc - beq taken
Global Control
• Global control outputs
– Replace PCsrc with
• Branch beq
• PCSrc = Branch * Zero
• What are the inputs needed to determine
above global control signals?
– Just Op[5:0]
Control Signals Needed
Datapath with Control II
[Figure: Datapath with Control II. The control unit takes Instruction [31-26] and generates RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; PCSrc selects between PC+4 and the branch target; the ALU control block takes ALUOp and Instruction [5-0]]
MIPS datapath with the control unit: input to control is the 6-bit instruction opcode field, output is seven 1-bit signals and the 2-bit ALUOp signal
Global Control
Instruction Opcode RegDst ALUSrc
rrr 000000 1 0
lw 100011 0 1
sw 101011 x 1
beq 000100 x 0
??? others x x
• RegDst = ~Op[0]
• ALUSrc = Op[0]
• RegWrite = ~Op[3] * ~Op[2]
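The simplified equations can be checked against the four instruction classes with a short script. This is only a sanity check of the slide's equations for the opcodes listed; the expected RegWrite values are taken from the earlier bullet "all but beq and sw", `None` marks a don't-care, and the helper names are hypothetical.

```python
OPCODES = {"rrr": 0b000000, "lw": 0b100011, "sw": 0b101011, "beq": 0b000100}

# (RegDst, ALUSrc, RegWrite) expected per instruction; None = don't-care (x)
EXPECTED = {
    "rrr": (1, 0, 1),
    "lw": (0, 1, 1),
    "sw": (None, 1, 0),
    "beq": (None, 0, 0),
}

def bit(op, i):
    """Return bit i of the 6-bit opcode."""
    return (op >> i) & 1

def simplified_control(op):
    reg_dst = 1 - bit(op, 0)                           # RegDst = ~Op[0]
    alu_src = bit(op, 0)                               # ALUSrc = Op[0]
    reg_write = (1 - bit(op, 3)) & (1 - bit(op, 2))    # RegWrite = ~Op[3] * ~Op[2]
    return reg_dst, alu_src, reg_write

def matches(actual, expected):
    """Compare signal tuples, skipping don't-care positions."""
    return all(e is None or a == e for a, e in zip(actual, expected))

for name, op in OPCODES.items():
    print(name, simplified_control(op), matches(simplified_control(op), EXPECTED[name]))
```

These equations only hold for the four opcodes considered here; any other opcode would need the fuller control logic.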
Datapath Control (Finalized)
[Figure: Datapath with Control II (cont.): MIPS datapath with the control unit (input Instruction [31-26]), the ALU control block (inputs ALUOp and Instruction [5-0]), and the PCSrc multiplexor]
• PCSrc cannot be set directly from the opcode: zero test outcome is required

Determining control signals for the MIPS datapath based on instruction opcode
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1
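One way to experiment with this table is to encode it as a lookup keyed by opcode and then derive PCSrc from Branch and the ALU's Zero output, as stated earlier (PCSrc = Branch * Zero). The dictionary layout and the name `control_signals` are illustrative assumptions; `None` stands for a don't-care (X).

```python
MAIN_CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),       # R-format
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),       # lw
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),       # sw
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),       # beq
}

def control_signals(opcode, zero):
    """Return the control signals for `opcode`, adding PCSrc = Branch AND Zero."""
    signals = dict(MAIN_CONTROL[opcode])       # copy so the table is not mutated
    signals["PCSrc"] = signals["Branch"] & zero
    return signals

print(control_signals(0b000100, zero=1)["PCSrc"])  # 1
print(control_signals(0b100011, zero=1)["PCSrc"])  # 0
```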
Control Signals:
R-Type Instruction
[Figure: single-cycle datapath annotated with control signal values (shown in blue) for an R-type instruction: RegDst = 1, ALUSrc = 0, MemtoReg = 0, RegWrite = 1, MemRead = 0, MemWrite = 0, PCSrc = 0; the ALU Operation value depends on funct]
Control Signals:
lw Instruction
[Figure: single-cycle datapath annotated with control signal values (shown in blue) for lw: RegDst = 0, ALUSrc = 1, MemtoReg = 1, RegWrite = 1, MemRead = 1, MemWrite = 0, PCSrc = 0, ALU Operation = 010]
Control Signals:
sw Instruction
[Figure: single-cycle datapath annotated with control signal values (shown in blue) for sw: RegDst = X, ALUSrc = 1, MemtoReg = X, RegWrite = 0, MemRead = 0, MemWrite = 1, PCSrc = 0, ALU Operation = 010]
Control Signals:
beq Instruction
[Figure: single-cycle datapath annotated with control signal values (shown in blue) for beq: RegDst = X, ALUSrc = 0, MemtoReg = X, RegWrite = 0, MemRead = 0, MemWrite = 0, ALU Operation = 110, PCSrc = 1 if Zero = 1]
Global Control
• More complex with entire MIPS ISA
– Need more systematic structure
– Want to share gates between control signals
• Common solution: PLA
– MIPS opcode space designed to minimize PLA
inputs, minterms, and outputs
• Refer to MIPS Opcode map
PLA
• In AND-plane, &
selected inputs to get
minterms
• In OR-plane, | selected
minterms to get outputs
• E.g.
Datapath Extension: Jump Instr.
Instruction: j address
1. Fetch instruction and increment PC
2. Read address from immediate field of instr.
3. Jump target address (JTA) has these bits:
• Bits 31-28: Upper four bits of PC+4
• Bits 27-02: Immediate field of Jump instr.
• Bits 01-00: Zero (00 in binary)
4. Mux controlled by Jump Control Bit selects JTA or the output of the branch mux (PC+4 or branch target address) as new PC
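The composition of the jump target address in step 3 can be written as a single expression. The sketch below uses the hypothetical function name `jump_target` and treats the PC and the instruction as plain Python integers.

```python
def jump_target(pc, instr):
    """Compose the 32-bit jump target address for a j instruction at `pc`."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper = pc_plus_4 & 0xF0000000        # bits 31-28: upper four bits of PC+4
    field = instr & 0x03FFFFFF            # 26-bit immediate field (bits 25-0)
    return upper | (field << 2)           # bits 27-2 from the field, bits 1-0 zero

# j with opcode 000010 and a 26-bit field of 0x0100004
j_instr = (0b000010 << 26) | 0x0100004
print(hex(jump_target(0x00400000, j_instr)))  # 0x400010
print(hex(jump_target(0x90000000, j_instr)))  # 0x90400010
```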
Control Signals; Add Jumps
Datapath Extension: Jump Instr.

• Bits 31-28: Upper four bits of (PC + 4)
• Bits 27-02: Immediate field of jump instruction
• Bits 01-00: Zero (00 in binary) - Word alignment
Control Signals w/Jumps
Datapath with Control III
Jump:  opcode  address
       31-26   25-0
[Figure: Datapath with Control III. Composing the jump target address: Instruction [25-0] is shifted left 2 (26 to 28 bits) and combined with PC+4 [31-28] to form Jump address [31-0]; a new multiplexor with the additional control bit Jump selects it as the next PC. The rest of the datapath and control unit is unchanged]
MIPS datapath extended to jumps: control unit generates new Jump control bit
Datapath Extension: Jump Instr.
Datapath Executing j
[Figure: single-cycle datapath executing j: I[25:0] is shifted left 2 (26 to 28 bits) and concatenated with PC+4[31-28] to form the 32-bit jump address, which the multiplexor controlled by Jump selects as the next PC; the control unit takes op I[31:26] and the ALU control takes funct I[5:0]]
Single-cycle Implementation
Notes
• The steps are not really distinct as each instruction completes in
exactly one clock cycle – they simply indicate the sequence of
data flowing through the datapath
• The operation of the datapath during a cycle is purely
combinational – nothing is stored during a clock cycle
• Therefore, the machine is stable in a particular state at the start of
a cycle and reaches a new stable state only at the end of the cycle
• Very important for understanding single-cycle computing:
Load Instruction Steps
lw $t1, offset($t2)
1. Fetch instruction and increment PC
2. Read base register from the register file: the base register ($t2)
is given by bits 25-21 of the instruction
3. ALU computes sum of value read from the register file and
the sign-extended lower 16 bits (offset) of the instruction
4. The sum from the ALU is used as the address for the data
memory
5. The data from the memory unit is written into the register file:
the destination register ($t1) is given by bits 20-16 of the
instruction
Branch Instruction Steps
beq $t1, $t2, offset
1. Fetch instruction and increment PC
2. Read two registers ($t1 and $t2) from the register file
3. ALU performs a subtract on the data values from the
register file; the value of PC+4 is added to the sign-
extended lower 16 bits (offset) of the instruction shifted left
by two to give the branch target address
4. The Zero result from the ALU is used to decide which
adder result (from step 1 or 3) to store in the PC
Implementation: ALU Control Block
ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0  Operation
0       0       X   X   X   X   X   X   010
0*      1       X   X   X   X   X   X   110
1       X       X   X   0   0   0   0   010
1       X       X   X   0   0   1   0   110
1       X       X   X   0   1   0   0   000
1       X       X   X   0   1   0   1   001
1       X       X   X   1   0   1   0   111
* Typo in text Fig. 5.15: if it is X then there is potential conflict between line 2 and lines 3-7!
Truth table for ALU control bits

[Figure: ALU control block. Inputs ALUOp1, ALUOp0 and funct bits F3-F0 (from F (5-0)); outputs Operation2, Operation1, Operation0]
ALU control logic


Implementation: Main Control Block
Signal name  R-format  lw  sw  beq
Inputs:
Op5          0         1   1   0
Op4          0         0   0   0
Op3          0         0   1   0
Op2          0         0   0   1
Op1          0         1   1   0
Op0          0         1   1   0
Outputs:
RegDst       1         0   x   x
ALUSrc       0         1   1   0
MemtoReg     0         1   x   x
RegWrite     1         1   0   0
MemRead      0         1   0   0
MemWrite     0         0   1   0
Branch       0         0   0   1
ALUOp1       1         0   0   0
ALUOp0       0         0   0   1
Truth table for main control signals

[Figure: main control PLA. Inputs Op5-Op0 form the product terms R-format, lw, sw, beq; outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0]
Main control PLA (programmable logic array): the principle underlying PLAs is that any logical expression can be written as a sum-of-products
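The sum-of-products idea can be mimicked in software: an AND-plane forms one product term per instruction class from the opcode bits and their complements, and an OR-plane combines those terms into each output. The sketch below follows the truth table for main control signals; the function name `main_control_pla` is hypothetical, and don't-care outputs (x) are arbitrarily taken as 0.

```python
def main_control_pla(op):
    """Model the main control as AND-plane product terms feeding an OR-plane."""
    b = [(op >> i) & 1 for i in range(6)]   # b[i] = Op<i>
    n = [1 - x for x in b]                  # n[i] = inverted Op<i>

    # AND-plane: one product term (minterm) per instruction class
    r_format = n[5] & n[4] & n[3] & n[2] & n[1] & n[0]   # 000000
    lw = b[5] & n[4] & n[3] & n[2] & b[1] & b[0]         # 100011
    sw = b[5] & n[4] & b[3] & n[2] & b[1] & b[0]         # 101011
    beq = n[5] & n[4] & n[3] & b[2] & n[1] & n[0]        # 000100

    # OR-plane: each output is the OR of the terms in which it is 1
    return {
        "RegDst": r_format,
        "ALUSrc": lw | sw,
        "MemtoReg": lw,
        "RegWrite": r_format | lw,
        "MemRead": lw,
        "MemWrite": sw,
        "Branch": beq,
        "ALUOp1": r_format,
        "ALUOp0": beq,
    }

print(main_control_pla(0b100011)["ALUSrc"], main_control_pla(0b000100)["Branch"])  # 1 1
```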
Single-Cycle Design Problems
• Assuming a fixed-period clock, executing every instruction in one clock cycle implies:
– CPI = 1
– cycle time determined by length of the longest instruction path (load)
• but several instructions could run in a shorter clock cycle: waste of time
• consider if we have more complicated instructions like floating point!
– resources used more than once in the same cycle need to be duplicated
• waste of hardware and chip area
Example: Fixed-period clock vs. variable-period clock in a single-cycle implementation
• Consider a machine with an additional floating point unit. Assume functional unit delays as follows
– memory: 2 ns, ALU and adders: 2 ns, FPU add: 8 ns, FPU multiply: 16 ns, register file access (read or write): 1 ns
– multiplexors, control unit, PC accesses, sign extension, wires: no delay
• Assume instruction mix as follows
– all loads take same time and comprise 31%
– all stores take same time and comprise 21%
– R-format instructions comprise 27%
– branches comprise 5%
– jumps comprise 2%
– FP adds and subtracts take the same time and together comprise 7%
– FP multiplies and divides take the same time and together comprise 7%
• Compare the performance of (a) a single-cycle implementation using a fixed-period clock with (b) one using a variable-period clock where each instruction executes in one clock cycle that is only as long as it needs to be (not really practical but pretend it’s possible!)
Solution
Instruction  Instr.  Register  ALU    Data  Register  FPU      FPU      Total
class        mem.    read      oper.  mem.  write     add/sub  mul/div  time (ns)
Load word    2       1         2      2     1         -        -        8
Store word   2       1         2      2     -         -        -        7
R-format     2       1         2      0     1         -        -        6
Branch       2       1         2      -     -         -        -        5
Jump         2       -         -      -     -         -        -        2
FP mul/div   2       1         -      -     1         -        16       20
FP add/sub   2       1         -      -     1         8        -        12

• Clock period for fixed-period clock = longest instruction time = 20 ns.


• Average clock period for variable-period clock = 8 × 31% + 7 × 21% + 6 × 27% + 5 × 5% + 2 × 2% + 20 × 7% + 12 × 7% = 8.1 ns
• Therefore, performance(var-period) / performance(fixed-period) = 20 / 8.1 ≈ 2.5
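The comparison can be recomputed directly from the per-class total times and the instruction mix given in the example. The sketch below is a plain arithmetic check; the dictionary name `classes` and the variable names are illustrative only.

```python
# (total time in ns, fraction of the instruction mix) per instruction class
classes = {
    "load": (8, 0.31),
    "store": (7, 0.21),
    "r_format": (6, 0.27),
    "branch": (5, 0.05),
    "jump": (2, 0.02),
    "fp_mul_div": (20, 0.07),
    "fp_add_sub": (12, 0.07),
}

# Fixed-period clock: every instruction takes as long as the slowest one
fixed_period = max(time for time, _ in classes.values())

# Variable-period clock: weighted average of the per-class times
variable_period = sum(time * share for time, share in classes.values())

speedup = fixed_period / variable_period
print(f"fixed = {fixed_period} ns, variable = {variable_period:.2f} ns, "
      f"ratio = {speedup:.2f}")
```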
Fixing the problem with single-cycle designs
• One solution: a variable-period clock with different cycle times for each instruction class
– infeasible, as implementing a variable-speed clock is technically difficult
• Another solution:
– use a smaller cycle time…
– …have different instructions take different numbers of cycles by breaking instructions into steps and fitting each step into one cycle
– feasible: multicycle approach!
Problems
• Can we make a datapath operate in one cycle?
– All instructions executed in CPI = 1
– Increases efficiency of software in MIPS
Problems with single-cycle datapath
– Propagation delay for 1-5 components
– No phased execution: Must settle in 1 clock cycle
– Maximum delay = Load instruction (5 components)
– Increases clock cycle time
– Decreased Performance tcpu = IC * CPI * tcyc
Conclusions
• Can we make a datapath operate in one cycle?
– Yes – “Some design required”
• Do we want a single-cycle datapath?
– No! It increases cycle time (larger tcyc => larger tcpu)
• Build up the datapath w/ different instructions
– ALU operations, Load, Store, Branch
– Can add new instructions (Jump)
– New instructions = Read further to satisfy your
curiosity
