3/1/2015

Lecture 16, 17 & 18: Single Cycle Processor Design
March 2, 4, 6, 2015
Prof. R. Iris Bahar
© 2015 R.I. Bahar
Portions of these slides taken from Professors S. Reda and D. Patterson

Lab #3
• Lab #3 has been posted
• Due Friday, March 6 by 6pm (demo and report)
• Requires downloading MARS, a MIPS simulator
• Remember, this lab is to be done alone. All code should be written by you alone.

Processor organization (microarchitecture)
• Multiple implementations for a single architecture:
  - Single-cycle: each instruction executes in a single cycle
  - Multi-cycle: each instruction is broken up into a series of shorter steps
  - Pipelined: each instruction is broken up into a series of steps; multiple instructions execute at once
  - Superscalar: multiple instructions fetched, decoded and executed simultaneously
• Abstraction layers, top to bottom: Application, Algorithm, Programming Language, Operating System/Virtual Machine, Instruction Set Architecture (ISA), Microarchitecture, Register-Transfer Level (RTL), Circuits, Devices, Physics

Introduction
• CPU performance factors
  - Instruction count (determined by ISA and compiler)
  - CPI and cycle time (determined by CPU hardware)
• We will examine a number of MIPS implementations
  - A simplified single-cycle version
  - A more realistic pipelined version
• Simple subset, shows most aspects
  - Memory reference: lw, sw
  - Arithmetic/logical: add, sub, and, or, slt
  - Control transfer: beq, j


Instruction Execution
• PC → instruction memory, fetch instruction
• Register numbers → register file, read registers
• Depending on instruction class
  - Use ALU to calculate
    - Arithmetic result
    - Memory address for load/store
    - Branch target address
  - Access data memory for load/store
  - PC ← target address or PC + 4

Single-Cycle MIPS Processor
• Datapath
• Control
• Execution steps, in order: Fetch instruction @ PC, Decode instruction, Fetch operands, Execute instruction, Store result, Update PC

Architectural state
• Determines everything about a processor:
  - PC
  - 32 registers
  - Memory
[Figure: state elements: PC register (PC', PC), instruction memory (A, RD), register file (A1, A2, A3, WD3, WE3, RD1, RD2), data memory (A, WD, WE, RD)]

Single-Cycle Datapath: fetch
• First consider executing lw
  lw $s1, 4($s2)
• STEP 1: Fetch instruction
• How would fetch differ for other instructions?
[Figure: PC drives the instruction memory address input A; the read data RD is Instr]
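As a minimal sketch of STEP 1, the fetch can be modeled as an array lookup indexed by the PC. The names `IMEM` and `fetch`, and the two-instruction program, are hypothetical and chosen only for illustration. Note that this sketch never inspects the instruction it reads.

```python
# Hypothetical instruction memory: one 32-bit instruction word per list entry.
IMEM = [
    0x8E510004,  # lw $s1, 4($s2)
    0xAE510004,  # sw $s1, 4($s2)
]

def fetch(pc, imem=IMEM):
    """Return (Instr, PC + 4) for a byte-addressed, word-aligned PC."""
    if pc % 4 != 0:
        raise ValueError("PC must be word aligned")
    instr = imem[pc // 4]              # instruction memory: A = PC, RD = Instr
    pc_plus4 = (pc + 4) & 0xFFFFFFFF   # address of the next sequential instruction
    return instr, pc_plus4
```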


Single-Cycle Datapath: register read
lw $s1, 4($s2)
• STEP 2: Read source operands from register file
[Figure: Instr[25:21] drives register file address input A1; the operand appears on RD1]

Single-Cycle Datapath: immediate
lw $s1, 4($s2)
• STEP 3: Sign-extend the immediate
[Figure: Instr[15:0] passes through Sign Extend to produce SignImm]
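STEP 2 and STEP 3 can be sketched in a few lines of Python. The function names and the list-based register file `regs` are assumptions made for this example; the bit positions 25:21 and 15:0 are the ones labeled in the figures.

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate to 32 bits (returned as an unsigned int)."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:               # bit 15 set: the immediate is negative
        return imm16 | 0xFFFF0000
    return imm16

def read_operands(instr, regs):
    """Return (SrcA, SignImm) for an I-type instruction such as lw."""
    rs = (instr >> 21) & 0x1F        # Instr[25:21] selects the base register
    src_a = regs[rs]                 # register file read port RD1
    sign_imm = sign_extend16(instr & 0xFFFF)   # Instr[15:0]
    return src_a, sign_imm
```

For lw $s1, 4($s2), encoded as 0x8E510004, the rs field is 18 ($s2) and the immediate is 4.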

Single-Cycle Datapath: generate address
lw $s1, 4($s2)
• STEP 4: Compute the memory address
• Control values shown: ALUControl2:0 = 010
[Figure: ALU adds SrcA (RD1) and SrcB (SignImm) to produce ALUResult, which drives the data memory address input A]

Single-Cycle Datapath: memory read
lw $s1, 4($s2)
• STEP 5: Read data from memory and write it back to the register file
• Control values shown: RegWrite = 1, ALUControl2:0 = 010
[Figure: data memory ReadData returns to register file input WD3; Instr[20:16] drives A3]
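A rough model of STEP 4 and STEP 5 for lw follows. The dictionary-based data memory `dmem`, keyed by byte address, and the function name `execute_lw` are assumptions of this sketch, not part of the slides.

```python
def execute_lw(instr, regs, dmem):
    """Compute the lw address, read memory, and write the result into rt."""
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F          # Instr[20:16] is the destination for lw
    imm = instr & 0xFFFF
    sign_imm = imm - 0x10000 if imm & 0x8000 else imm
    alu_result = (regs[rs] + sign_imm) & 0xFFFFFFFF   # STEP 4: ALU adds
    read_data = dmem[alu_result]       # STEP 5: data memory read
    regs[rt] = read_data               # RegWrite = 1: write back through WD3
    return alu_result
```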


Single-Cycle Datapath: PC increment
lw $s1, 4($s2)
• STEP 6: Determine the address of the next instruction
• Control values shown: RegWrite = 1, ALUControl2:0 = 010
[Figure: an adder computes PCPlus4 = PC + 4, which becomes PC']

Single-Cycle Datapath: sw
sw $s1, 4($s2)
• All steps are the same as lw except for STEP 5
• STEP 5: Write data in rt to memory
• Control values shown: RegWrite = 0, ALUControl2:0 = 010, MemWrite = 1
[Figure: Instr[20:16] drives register file input A2; RD2 is WriteData, connected to data memory input WD]
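The sw case and the PC increment can be sketched together. As before, `execute_sw` and the dictionary `dmem` keyed by byte address are hypothetical names used only for this illustration.

```python
def execute_sw(instr, pc, regs, dmem):
    """Store rt to memory at rs + SignImm and return the next PC (PC + 4)."""
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F          # Instr[20:16] selects the data to store
    imm = instr & 0xFFFF
    sign_imm = imm - 0x10000 if imm & 0x8000 else imm
    address = (regs[rs] + sign_imm) & 0xFFFFFFFF   # same address step as lw
    dmem[address] = regs[rt]           # MemWrite = 1; no register is written
    return (pc + 4) & 0xFFFFFFFF       # STEP 6: PCPlus4
```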

Single-Cycle Datapath: R-type instructions
and $s1, $s2, $s3
• STEP 2: Read operands from rs and rt
• STEP 5: Write ALU result into register file
  - Write to rd (instead of rt)
• Control values shown: RegWrite = 1, RegDst = 1, ALUSrc = 0, ALUControl2:0 = varies, MemWrite = 0, MemtoReg = 0
[Figure: the ALUSrc mux selects RD2 or SignImm as SrcB; the RegDst mux selects Instr[20:16] or Instr[15:11] as WriteReg4:0; the MemtoReg mux selects ALUResult or ReadData as Result]

Single-Cycle Datapath: beq
beq $s1, $s2, TARGET_ADDR
• STEP 3: Sign-extend TARGET_ADDR (in immediate field)
• STEP 4a: Determine if (rs == rt) (subtract and check if the result is 0)
• STEP 4b: Calculate branch target address:
  - BTA = (sign-extended TARGET_ADDR << 2) + (PC + 4)
• STEP 6: Update PC with next address
• Control values shown: RegWrite = 0, RegDst = x, ALUSrc = 0, ALUControl2:0 = 110, Branch = 1, MemWrite = 0, MemtoReg = x
[Figure: SignImm is shifted left by 2 and added to PCPlus4 to form PCBranch; the PCSrc mux selects PCPlus4 or PCBranch as PC']
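The beq steps can be checked with a small sketch. It assumes that PCSrc is the AND of Branch and the ALU Zero flag, which is the usual arrangement for this datapath but is not spelled out on the slide; the function name is hypothetical.

```python
def next_pc_beq(instr, pc, regs):
    """Return the next PC for a beq instruction at address pc."""
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    imm = instr & 0xFFFF
    sign_imm = imm - 0x10000 if imm & 0x8000 else imm      # STEP 3
    zero = ((regs[rs] - regs[rt]) & 0xFFFFFFFF) == 0       # STEP 4a: subtract, test for 0
    pc_plus4 = (pc + 4) & 0xFFFFFFFF
    pc_branch = (pc_plus4 + (sign_imm << 2)) & 0xFFFFFFFF  # STEP 4b: BTA
    branch = True                      # Branch = 1 for beq
    pc_src = branch and zero           # assumed: PCSrc = Branch AND Zero
    return pc_branch if pc_src else pc_plus4               # STEP 6
```

With beq $s1, $s2 and an immediate of 3 at PC = 0x40, a taken branch lands at 0x44 + 12 = 0x50.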


Complete Single Cycle Processor
[Figure: complete single-cycle datapath]

ALU Control
• ALU used for
  - Load/Store: F = add
  - Branch: F = subtract
  - R-type: F depends on funct field

ALU control   Function
0000          AND
0001          OR
0010          add
0110          subtract
0111          set-on-less-than
1100          NOR
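The function table can be exercised with a small behavioral model. This is only a sketch: the function name `alu`, the 32-bit width, and the signed comparison for set-on-less-than are assumptions consistent with MIPS slt, not details given on the slide.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit unsigned value as two's complement."""
    return x - (1 << 32) if x & 0x80000000 else x

def alu(control, a, b):
    """Return (result, zero) for the 4-bit ALU control encodings in the table."""
    if control == 0b0000:
        result = a & b                      # AND
    elif control == 0b0001:
        result = a | b                      # OR
    elif control == 0b0010:
        result = a + b                      # add
    elif control == 0b0110:
        result = a - b                      # subtract
    elif control == 0b0111:
        result = 1 if to_signed(a) < to_signed(b) else 0   # set-on-less-than
    elif control == 0b1100:
        result = ~(a | b)                   # NOR
    else:
        raise ValueError("unsupported ALU control code")
    result &= MASK32                        # keep the result to 32 bits
    return result, result == 0
```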

ALU Control
• Assume 2-bit ALUOp derived from opcode
  - Combinational logic derives ALU control
• Define additional ALU control encodings to expand its functionality

opcode   ALUOp   Operation          funct    ALU function       ALU control
lw       00      load word          XXXXXX   add                0010
sw       00      store word         XXXXXX   add                0010
beq      01      branch equal       XXXXXX   subtract           0110
R-type   10      add                100000   add                0010
R-type   10      subtract           100010   subtract           0110
R-type   10      AND                100100   AND                0000
R-type   10      OR                 100101   OR                 0001
R-type   10      set-on-less-than   101010   set-on-less-than   0111

The Main Control Unit
• Control signals derived from instruction

Format       31:26      25:21   20:16   15:11   10:6    5:0
R-type       0          rs      rt      rd      shamt   funct

Format       31:26      25:21   20:16   15:0
Load/Store   35 or 43   rs      rt      address
Branch       4          rs      rt      address

• Field usage:
  - 31:26: opcode
  - 25:21 (rs): always read
  - 20:16 (rt): read, except for load
  - 20:16 (rt) or 15:11 (rd): write for R-type and load
  - 15:0: sign-extend and add
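The ALUOp table above maps directly onto a lookup. The sketch below is one way to write that combinational logic in Python; the names `FUNCT_TO_ALU_CONTROL` and `alu_control` are made up for this example.

```python
# funct field (Instr[5:0]) to 4-bit ALU control, used only when ALUOp = 10.
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}

def alu_control(alu_op, funct):
    """Derive the 4-bit ALU control from the 2-bit ALUOp and the funct field."""
    if alu_op == 0b00:         # lw, sw: funct is a don't-care, ALU adds
        return 0b0010
    if alu_op == 0b01:         # beq: funct is a don't-care, ALU subtracts
        return 0b0110
    if alu_op == 0b10:         # R-type: operation chosen by funct
        return FUNCT_TO_ALU_CONTROL[funct]
    raise ValueError("ALUOp 11 is not defined in the table")
```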


Single Cycle processor with Control
[Figure: single-cycle datapath with control unit]

Main decoder (without jumps)

Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemRead  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         1       0       0       0        0         0         10
lw           100011  1         0       1       0       1        0         1         00
sw           101011  0         X       1       0       0        1         X         00
beq          000100  0         X       0       1       0        0         X         01
addi         001000  1         0       1       0       0        0         0         00
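The main decoder is a pure function of the opcode, so it can be sketched as a table lookup. In this illustration the signal names are written as Python-friendly strings, `None` stands in for a don't-care (X), and `main_decoder` is a hypothetical helper name.

```python
X = None  # don't-care entry from the table

SIGNALS = ("RegWrite", "RegDst", "ALUSrc", "Branch",
           "MemRead", "MemWrite", "MemtoReg", "ALUOp")

MAIN_DECODER = {
    0b000000: (1, 1, 0, 0, 0, 0, 0, 0b10),  # R-type
    0b100011: (1, 0, 1, 0, 1, 0, 1, 0b00),  # lw
    0b101011: (0, X, 1, 0, 0, 1, X, 0b00),  # sw
    0b000100: (0, X, 0, 1, 0, 0, X, 0b01),  # beq
    0b001000: (1, 0, 1, 0, 0, 0, 0, 0b00),  # addi
}

def main_decoder(opcode):
    """Return the control signals for a 6-bit opcode as a dict."""
    return dict(zip(SIGNALS, MAIN_DECODER[opcode]))
```

For example, `main_decoder(0b100011)["MemtoReg"]` is 1, because lw writes the memory read data, not the ALU result, into the register file.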

Implementing Jumps

Format   31:26   25:0
Jump     2       address

• Jump uses word address
• Update PC with concatenation of
  - Top 4 bits of old PC
  - 26-bit jump address
  - 00
• Need an extra control signal decoded from opcode

Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp1:0  Jump
j            000010  0         X       X       X       0         X         XX        1

Datapath and control with jumps
[Figure: single-cycle datapath and control extended with the jump path]
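The concatenation can be sketched as follows. One assumption to flag: this example takes the top 4 bits from PC + 4, following the usual MIPS definition, whereas the slide says "old PC". The two differ only when PC + 4 crosses a 256 MB boundary. The function name is hypothetical.

```python
def jump_target(instr, pc):
    """Build the 32-bit jump target from a j instruction at address pc."""
    addr26 = instr & 0x03FFFFFF            # Instr[25:0], a word address
    pc_plus4 = (pc + 4) & 0xFFFFFFFF
    upper4 = pc_plus4 & 0xF0000000         # assumed source of the top 4 bits
    return upper4 | (addr26 << 2)          # appending 00 converts words to bytes
```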


Processor performance
Program Execution Time
= (# instructions)(cycles/instruction)(seconds/cycle)
= # instructions x CPI x Tc
• CPI = 1
• What is Tc?

Critical path
• Tc is limited by the critical path (lw)
[Figure: single-cycle datapath with control unit, shown with the control values for lw]
• Single-cycle critical path:
Tc = tpcq_PC + tmem + max(tRFread, tsext + tmux) + tALU + tmem + tmux + tRFsetup

Critical path delay

Element               Parameter   Delay (ps)
Register clock-to-Q   tpcq_PC     30
Register setup        tsetup      20
Multiplexer           tmux        25
ALU                   tALU        200
Memory read           tmem        250
Register file read    tRFread     150
Register file setup   tRFsetup    20

Tc = tpcq_PC + 2tmem + tRFread + tmux + tALU + tRFsetup
   = [30 + 2(250) + 150 + 25 + 200 + 20] ps
   = 925 ps

Performance Issues
• Longest delay determines clock period
  - Critical path: load instruction
  - Instruction memory → register file → ALU → data memory → register file
• Not feasible to vary period for different instructions
• Violates design principle
  - Making the common case fast
• Eventually, we will improve performance by pipelining
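The arithmetic above can be reproduced with a short script. It uses the simplified critical-path formula with the delays from the table; the instruction count in the usage note is a made-up figure for illustration, not a number from the lecture.

```python
# Delays in picoseconds, copied from the critical path delay table.
DELAYS_PS = {
    "tpcq_PC": 30,
    "tsetup": 20,
    "tmux": 25,
    "tALU": 200,
    "tmem": 250,
    "tRFread": 150,
    "tRFsetup": 20,
}

def cycle_time_ps(d):
    """Tc = tpcq_PC + 2*tmem + tRFread + tmux + tALU + tRFsetup."""
    return (d["tpcq_PC"] + 2 * d["tmem"] + d["tRFread"]
            + d["tmux"] + d["tALU"] + d["tRFsetup"])

def execution_time_s(num_instructions, cpi, tc_ps):
    """Execution time = # instructions x CPI x Tc, converted to seconds."""
    return num_instructions * cpi * tc_ps * 1e-12

tc = cycle_time_ps(DELAYS_PS)
```

With a hypothetical program of 100 billion instructions and CPI = 1, `execution_time_s(100 * 10**9, 1, tc)` comes to roughly 92.5 seconds.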


Summary
• Single-cycle processor design is simple
• Plenty of room for improvement:
  - Pipelining
  - Superscalar
• In Lab #4 you will design, implement and boot your first processor!
