You are on page 1of 60

Designing a Single Cycle

Datapath
The Big Picture: The Performance Perspective
CPI
° Performance of a machine is determined by:
• Instruction count
• Clock cycle time
• Clock cycles per instruction Inst. Count Cycle Time

° Processor design (datapath and control) will determine:


• Clock cycle time
• Clock cycles per instruction
° Single cycle processor - one clock cycle per instruction
• Advantages: Simple design, low CPI
• Disadvantages: Long cycle time, which is limited by
the slowest instruction.
How to Design a Processor: step-by-
step
1. Analyze instruction set => datapath requirements
• the meaning of each instruction is given by register
transfers
R[rd] <– R[rs] + R[rt];
• datapath must include storage element for ISA
registers
• datapath must support each register transfer
2. Select set of datapath components and establish
clocking methodology
3. Design datapath to meet the requirements
4. Analyze implementation of each instruction to determine
setting of control points that effects the register transfer.
5. Design the control logic
MIPS Instruction set
MIPS operands
Name Example Comments
$s0-$s7, $t0-$t9, $zero, Fast locations for data. In MIPS, data must be in registers to perform
32 registers $a0-$a3, $v0-$v1, $gp, arithmetic. MIPS register $zero alw ays equals 0. Register $at is
$fp, $sp, $ra, $at reserved for the assembler to handle large constants.
Memory[0], Accessed only by data transfer instructions. MIPS uses byte addresses, so
232 memory Memory[4], ..., sequential w ords differ by 4. Memory holds data structures, such as arrays,
words Memory[4294967292] and spilled registers, such as those saved on procedure calls.

MIPS assembly language


Category Instruction Example Meaning Comments
add add $s1, $s2, $s3 $s1 = $s2 + $s3 Three operands; data in registers

Arithmetic subtract sub $s1, $s2, $s3 $s1 = $s2 - $s3 Three operands; data in registers

add immediate addi $s1, $s2, 100 $s1 = $s2 + 100 Used to add constants
load word lw $s1, 100($s2) $s1 = Memory[$s2 + 100] Word from memory to register
store word sw $s1, 100($s2) Memory[$s2 + 100] = $s1 Word from register to memory
Data transfer load byte lb $s1, 100($s2) $s1 = Memory[$s2 + 100] Byte from memory to register
store byte sb $s1, 100($s2) Memory[$s2 + 100] = $s1 Byte from register to memory
load upper immediate lui $s1, 100 $s1 = 100 * 2
16 Loads constant in upper 16 bits

branch on equal beq $s1, $s2, 25 if ($s1 == $s2) go to Equal test; PC-relative branch
PC + 4 + 100

branch on not equal bne $s1, $s2, 25 if ($s1 != $s2) go to Not equal test; PC-relative
PC + 4 + 100
Conditional
branch set on less than slt $s1, $s2, $s3 if ($s2 < $s3) $s1 = 1; Compare less than; for beq, bne
else $s1 = 0

set less than slti $s1, $s2, 100 if ($s2 < 100) $s1 = 1; Compare less than constant
immediate else $s1 = 0

jump j 2500 go to 10000 Jump to target address


Uncondi- jump register jr $ra go to $ra For switch, procedure return
tional jump jump and link jal 2500 $ra = PC + 4; go to 10000 For procedure call
Review: The MIPS Instruction
Formats
° All MIPS instructions are 32 bits long. The three
instruction formats are:
31 26 21 16 11 6 0
• R-type op rs rt rd shamt funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
• I-type 31 26 21 16 0
op rs rt immediate
• J-type 6 bits 5 bits 5 bits 16 bits
31 26 0
op target address
° The different fields
6 bits are: 26 bits
• op: operation of the instruction
• rs, rt, rd: the source and destination register specifiers
• shamt: shift amount
• funct: selects the variant of the operation in the “op” field
• address / immediate: address offset or immediate value
• target address: target address of the jump instruction
CP
U
° We will build a CPU to implementOrg our subset of the MIPS ISA
aniz
• Memory Reference Instructions:
-
-
Load Word (LW)
Store Word (SW)
atio
n
• Arithmetic and Logic Instructions:
- ADD, SUB, AND, OR, SLT
• Branch and Jump Instructions:
- Branch if equal (BEQ)
- Jump unconditional (J)
° These basic instructions exercise a majority of the necessary datapath and control
logic for a more complete implementation

6
Register Transfer Logic (RTL)

° RTL gives the meaning of the instructions


° All instructions start by fetching the instruction
op | rs | rt | rd | shamt | funct = MEM[ PC ]
op | rs | rt | Imm16 = MEM[ PC ]

inst Register Transfers


addu R[rd] <– R[rs] + R[rt]; PC <– PC + 4
subu R[rd] <– R[rs] – R[rt]; PC <– PC + 4
ori R[rt] <– R[rs] + zero_ext(imm16); PC <–
PC + 4
load R[rt] <– MEM[ R[rs] + sign_ext(imm16)]; PC <– PC + 4
store MEM[ R[rs] + sign_ext(imm16) ] <– R[rt]; PC <– PC + 4
beq if ( R[rs] == R[rt] ) then PC <– PC + 4 + sign_ext(imm16)] << 00
else PC <– PC + 4
Step 1: Requirements of the Instruction
Set
° Memory
• instruction & data
° Registers (32 x 32)
• read rs
• read rt
• write rt or rd
° PC
° Extender (sign extend or zero extend)
° Add and sub register or extended immediate
° Add 4 or shifted extended immediate to PC
Gen
eric
° Use program counterImp(PC) to supply instruction
address lem
ent
° Get instruction from
atio
memory
n
° Read registers
° Use instruction to decide exactly what to do

9
CP
U
° Single-cycle CPU (CPI Imp= 1)
lem
• All instructions execute in a single, long clock cycle
ent
atio
° Multi-cycle CPU (CPIns = n)
• Instructions can take a different number of short clock
cycles to execute

10
Abs
trac
t/
Sim
plifi
ed
Data
Vie
Register w
#
PC Address Instruction Registers ALU Address
Instruction Register #
memory Data
Register # memory

Data

° Two types of functional units:


• Elements that operate on data values (combinational)
• Elements that contain state (sequential)

11
Stat
e
Ele
° Unclocked vs clocked
me
° Clocks
nts used in synchronous logic

falling edge

cycle time
rising edge

12
Unc
lock
° Set-reset latched
Stat
• Output depends on present inputs and also on past
inputs
e
Ele
me
nt

13
Lat
che
° Output is equal to the sstored value inside the
element and
(don't need to ask forFlip
permission to look at the
-
value)
flop
° Change of state (value) s is based on the clock

° Latches: whenever the inputs change, and the


clock is asserted

° Flip-flop: state changes only on a clock edge


(edge-triggered methodology)

14
D-
latc
° Two inputs: h
• the data value to be stored (D)
• the clock signal (C) indicating when to read & store D

° Two outputs:
• the value of the internal state (Q) and it's complement

C
D
Q

C
_
Q
D Q

15
D
flip-
° Output changes only on the
flopclock edge
(Re
gist
er)
D D Q D Q Q
D D
latch latch _ _
C C Q Q

16
Our
Imp
lem
° An edge triggered methodology
ent
° Typical execution: atio
• Read contents of somenstate elements,
• Send values through some combinational logic
• Write results to one or more state elements

State State
element Combinational logic element
1 2

Clock cycle

17
Reg
iste
r
° Built using D flip-flips
File

Read register
number 1
Register 0
Register 1 M
u Read data 1 Read register
x number 1 Read
Register n – 1 data 1
Register n Read register
number 2
Read register Register file
Write
number 2 register
Read
Write data 2
M data Write
u Read data 2
x

18
Reg
iste
° Note: We still use the rreal clock to determine when
to write File

Write

C
0
Register 0 Read register
1 D number 1 Read
data 1
n-to-1 C Read register
Register number number 2
decoder Register 1
D Register file
n– 1 Write
register
n Read
Write data 2
data Write
C
Register n – 1
D
C
Register n
Register data D

19
Sim
ple
Imp
° Include the functional units we need for each
instruction lem
ent
° Think of this as a puzzle!
atio 
n
Instruction
address

PC
Instruction Add Sum
MemWrite
Instruction
memory

Address Read
data 16 32
Sign
a. Instruction memory b. Program counter c. Adder extend
Write Data
data memory

ALU control MemRead


5 Read 3
register 1
Read
data 1 a. Data memory unit b. Sign-extension unit
Register 5 Read
numbers register 2 Zero
Registers Data ALU ALU
5 Write result
register
Read
Write data 2
Data data

RegWrite

a. Registers b. ALU

20
Inst
ruct
ion
° Identify which components each instruction type
Ord
would use and in what order: ALU-Type, LW, SW,
BEQ erin
g
ALU control MemWrite
5 Read 3 ALU control
register 1 5 Read 3
Instruction Instruction Read register 1
data 1 Read
address address Register 5 Read data 1
Register 5 Read Address Read
numbers register 2 Zero numbers register 2 Zero data 16 32
PC PC Registers Data ALU ALU Registers Data ALU ALU Sign
Instruction Add Sum 5 Write 5 Write extend
Instruction Add Sum result register
result
Data
register Read Write
Instruction Instruction Read data 2 data memory
Write
memory memory Write data 2 Data data
Data data RegWrite
MemRead
RegWrite
a. Registers b. ALU
a. Instruction memory b. Program counter c. Adder a. Instruction memory b. Program counter c. Adder
a. Data memory unit b. Sign-extension unit
a. Registers b. ALU

ALU-Type LW SW BEQ
(ADD R3, R2, R1) (LW R2, 4(R1)) (SW R2, 8(R1)) (BEQ R3, R1, displace)
1. PC 1. PC 1. PC 1. PC
2. I-Mem 2. I-Mem 2. I-Mem 2. I-Mem
3. Registers 3. Base. Reg. 3. Base. Reg 3. Register Access
4. ALU 4. ALU 4. ALU 4. Compare
5. WB to Reg. 5. Read Mem 5. Write Mem 5. If Zero,
6. WB to Reg. Update PC = PC+disp

21
Fet
ch
° Required operations Co
mp PC and reading instruction from memory
• Taking address from
• Incrementing PCone
to point at next instruction
nts
° Components
• PC register
• Instruction Memory / Cache
• Adder to increment PC value

22
Fet
ch
° PC value serves as address
Dat to instruction memory
apa by 4 using the adder
while also being incremented
th
° Instruction word is returned by memory after some
delay
° New PC value is clocked into PC register at end of
clock cycle

23
Fet
ch
° The PC and adder operationDat is shown
apa
• The PC doesn’t update until the end of the current cycle
th
° The instruction being readExa out from the instruction
memory mpl

e
We have shown “assembly” syntax and the field by field
machine code breakdown

24
Mo
difi
ed
° Support for branch instruction added
Fet
ch
Dat
apa
th

25
Mo
difi
° Mux provides a path ed
for branch target address
Fet
ch
Exa
mpl
e

26
Dec
ode
° Opcode and func. field are decoded to produce other control
signals
° Execution of an ALU instruction (ADD $3,$1,$2) requires
reading 2 register values and writing the result to a third
° REGWrite is an enable signal indicating the write data should
be written to the specified register

27
ALU
Dat
apa
° ALU takes inputs from register file and performs
theth
add, sub, and, or, slt operations
° Result is written back to dest. register

28
Me
mor
° Operands are read from y register file while offset is sign
extended Acc
° ess
ALU calculates effective address
Dat
° apa
Memory access is performed
° th back to register
If LW, read data is written

29
Me
mor
° LW and SW yrequire:
Acc
• Sign extension
ess unit for address offset
• ALU to compute (add) base address + offset
• Data memory

30
Bra
nch
Datrequires…
° BEQ
•apa
ALU for comparison (examine ‘zero’ output)
th
• Sign extension unit for branch offset
• Adder to add PC and offset
- Need a separate adder since ALU is used to perform
comparison

31
Co
mbi
nindatapaths into one
° Combine all
g
Dat
apa
We can have
ths multiple options for certain
inputs

° Use muxes to select appropriate input for a


given instruction

° Generate control bits to control mux

32
ALU
Src
Mux
° Mux controlling second input to ALU
• ALU instruction provides Read Register 2 data to the 2nd input
of ALU
• LW/SW uses 2nd input of ALU as an offset to form effective
address

33
Me
mto
Reg
° Mux controlling writeback value to register file
•Mux
ALU instructions use the result of the ALU
• LW uses the read data from data memory

34
PCS
rc
Mux
° Next instruction can either be PC+4, or the branch
target address, PC+Offset

35
Reg
Dst
Mux destination register ID fields for ALU and LW
° Different
instructions

36
Sin
gle
Cyc
le
CP
U
Dat
apa
th

37
Con
trol
° We now have data path in place, but how do we
control which path an instruction will take?
° Single-Cycle Design:
• Instruction takes exactly one clock cycle
• Datapath units used only once per cycle
• Writable state updated at end of cycle

° What must be “controlled”?


• Muxes
• Writeable state elements: RF, Mem
• ALU

38
Con
trol
° Single-Cycle Design: everything
happens in one clock cycle
• until next falling edge of clock,
processor just one big combinational
circuit!!!
• control is just a combinational circuit
(output, just function of inputs)
° outputs? control points in datapath
° inputs? the current instruction!
(opcode, funct control everything)
39
Dat
apa
th +
Con
trol

40
Defi
nin
° Most g control signals are a function of the opcode
Con
(i.e. LW/SW, R-Type, Branch, Jump)
trol

41
Defi
nin
g field only present in R-type instruction
° Funct
•Con
Funct controls ALU only
trol
° To simplify control, define Main and ALU control
separately

42
ALU
Con
trol
° ALU Control needs to know what instruction type it is:
• R-Type (op. depends on func. code)
• LW/SW (op. = ADD)
• BEQ (op. = SUB)
° Let main control unit produce ALUOp[1:0] to indicate
instruction type, then use function bits if necessary to tell the
ALU what to do

43
ALU
Con
trol
ALUcon

A
A Zero
L
U Result
B Note: We don’t use NOR.
Ignore MSB in ALUCon.
ALUCon ALU function Instruction supported
0000 AND R-format (AND)
0001 OR R-format (OR)
0010 Add R-format (Add), lw, sw
0110 Subtract R-format (Sub), beq
0111 Set on less than R-format (Slt)
1100 NOR R-format (Nor)

44
ALU
Con
° ALUControl[2:0] istrol
a function of ALUOp[1:0] and Func[5:0]
Trut
h
Tabl
Instruc. Instruction e
Desired ALUOp[1:0] Func[5:0] ALUControl
Operation ALU Action
LW Load Word Add 00 X 010
SW Store Word Add 00 X 010
Branch BEQ Subtract 01 X 110
R-Type AND And 10 100100 000
R-Type OR Or 10 100101 001
R-Type Add Add 10 100000 010
R-Type Sub Subtract 10 100010 110
R-Type SLT Set on less 10 101010 111
than

45
Simplified ALUControl
Truth Table
° We can simplify using don’t cares
° Can turn into gates
Funct Field ALUCont.

ALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0
0 0 X X X X X X 010
X 1 X X X X X X 110
1 X X X 0 0 0 0 010
1 X X X 0 0 1 0 110
1 X X X 0 1 0 0 000
1 X X X 0 1 0 1 001
1 X X X 1 0 1 0 111

° ALUControl2 = ALUOp0 OR (ALUOp1 AND F1)


° ALUControl1 = ALUOp1 NAND F2 any of them 0.
° ALUControl0 = ALUOp1 AND (F3 OR F0)

46
Dat
apa
th +
Con
trol

48
Mai
n
Con are function of opcode
° Main control signals
trol
Sig
nals

Could generate each control signal Simpler for humans to design if we decode
by writing full truth table of the 6-bit opcode the opcode and then use instruction signals
to generate desired control signals

49
The “Truth Table” for the Main
Control
RegDst
func
ALUSrc ALU ALUctr
op Main 6
6
:
Control ALUop Control 3
(Local)
3

op 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010


R-type ori lw sw beq jump
RegDst 1 0 0 x x x
ALUSrc 0 1 1 1 0 x
MemtoReg 0 0 1 x x x
RegWrite 1 1 1 0 0 0
MemWrite 0 0 0 1 0 0
Branch 0 0 0 0 1 0
Jump 0 0 0 0 0 1
ExtOp x 0 1 1 x x
ALUop (Symbolic) “R-type” Or Add Add Subtract xxx
ALUop <2> 1 0 0 0 0 x
ALUop <1> 0 1 0 0 0 x
ALUop <0> 0 0 0 0 1 x
Mai
n
Con
trol
Sig
nal
Log
ic

51
Rev
iew
– R-
typ PCSrc

Add
e 1
M
u
4
Inst ALU
Add result
x
0
ruct RegWrite Shift
left 2

Instruction [25– 21]


ion Read
PC Read
address Instruction [20– 16]
s register 1
Read
Read
data 1
MemWrite
ALUSrc MemtoReg
Instruction register 2 Zero
1 Read ALU ALU
[31– 0] Write data 2 1 Read
M result Address 1
u register M data
Instruction Instruction [15– 11] x u M
memory Write x u
0 data Registers x
0
Write Data 0
RegDst data memory
Instruction [15– 0] 16 Sign 32
extend ALU MemRead
control
Instruction [5– 0]

ALUOp
Rev
iew

Lw PCSrc

Add
Inst 1
M
u
4 ruct ALU
Add result
x
0

ion RegWrite Shift


left 2

Instruction [25– 21] Read


Read register 1 Read MemWrite
PC data 1
address Instruction [20– 16] Read MemtoReg
ALUSrc
Instruction register 2 Zero
1 Read ALU ALU
[31– 0] Write data 2 1 Read
M result Address 1
u register M data
Instruction Instruction [15– 11] x u M
memory Write x u
0 data Registers x
0
Write Data 0
RegDst data memory
Instruction [15– 0] 16 Sign 32
extend ALU MemRead
control
Instruction [5– 0]

ALUOp
Review – Branch Instructions

PCSrc

1
Add M
u
x
4 ALU 0
Add result
RegWrite Shift
left 2

Instruction [25– 21] Read


Read register 1 Read MemWrite
PC data 1
address Instruction [20– 16] Read MemtoReg
ALUSrc
Instruction register 2 Zero
1 Read ALU ALU
[31– 0] Write data 2 1 Read
M result Address 1
u register M data
Instruction Instruction [15– 11] x u M
memory Write x u
0 data Registers x
0
Write Data 0
RegDst data memory
Instruction [15– 0] 16 Sign 32
extend ALU MemRead
control
Instruction [5– 0]

ALUOp
Jump Instruction Implementation

What changes are


needed to support
JAL?

55
Ju
mp
Inst
° JAL (Jump and Link) for function calls
ruct
° Howiondo we add JAL to data path?
• Place PC+4 (return address) into $ra by
• 1. Extend RegDst mux to include 31 ($ra)
• 2. Extend MemtoReg mux at write data input to have PC+4
as input
Executing this instruction requires Setting control signals
accordingly to execute the jump and write pc+4 to Ra.

56
Clocking Methodology - Negative Edge Triggered
Clk
Setup Hold Setup Hold
Don’t Care

. . . .
. . . .
. . . .

° All storage elements are clocked by the same clock edge


° Cycle Time = hold time + Longest Delay Path + Setup + Clock
Skew
Computing Clock Cycle Time
Assume The following component access time : Memory 20 ns.,
Reg File 5 ns, ALU 10ns. Adders 2 ns. Control 3 ns and signExt
1 ns. Muxes =0 ns.

How many Nano seconds each instruction need:


Add, ORI, SW, BEQ, and LW

ADD needs 40 ns (IM + RF + ALU + RF) = 20 +5 +10 +5


ORI needs 40 ns (IM +RF + ALU +RF) = 20 +5 +10 +5
SW needs 55 ns (IM +RF+ALU+DM) = 20+5+10+20
BEQ needs 35, (IM + RF +ALU) = 20 +5 +10
LW needs 60ns (IM + RF + ALU + DM+ RF) = 20+5+10+20 +5
Therefore LW determines the clock CycleTtime which must be at
least 60ns for the worst instruction.
An abstract view of the critical path - load instruction
Critical Path (Load Operation) =
PC’s Clk-to-Q +
Ideal Instruction Memory’s Access Time +
Instruction Register File’s Access Time +
Instruction ALU to Perform a 32-bit Add +
Memory
Rd Rs Rt Imm Data Memory Access Time +
5 5 5 16 Setup Time for Register File Write +
Instruction Clock Skew
Address
A Data
Rw Ra Rb 32
Next Address

32 Address
32 Ideal
32 32-bit

ALU
Data
PC

Registers Data In Memory


B

Clk Clk
32
Clk

Worst case delay for load is much longer than needed for all
other instructions, yet this sets the cycle time.
Summary
° 5 steps to design a processor
• 1. Analyze instruction set => datapath requirements
• 2. Select set of datapath components & establish clock
methodology
• 3. Design datapath meeting the requirements
• 4. Analyze implementation of each instruction to determine setting
of control points that effects the register transfer.
• 5. Design the control logic
° MIPS makes it easier
• Instructions same size
• Source registers always in same place
• Immediates same size, location
• Operations always on registers/immediates
° Single cycle datapath => CPI=1, CCT => long

You might also like