You are on page 1of 48

Digital Design &

Computer Architecture
Sarah Harris & David Harris

Chapter 7:
Microarchitecture
Chapter 7 :: Topics
• Introduction
• Performance Analysis
• Single­Cycle Processor
• Multicycle Processor
• Pipelined Processor
• Advanced Microarchitecture

2 Digital Design & Computer Architecture Microarchitecture


Introduction
• Microarchitecture: how to
implement an architecture in
hardware
• Processor:
– Datapath: functional blocks
– Control: control signals

3 Digital Design & Computer Architecture Microarchitecture


Microarchitecture
• Multiple implementations for a single
architecture:
– Single­cycle: Each instruction executes in a
single cycle
– Multicycle: Each instruction is broken up into
series of shorter steps
– Pipelined: Each instruction broken up into
series of steps & multiple instructions
execute at once

4 Digital Design & Computer Architecture Microarchitecture


Processor Performance
• Program execution time
Execution Time = (#instructions)(cycles/instruction)(seconds/cycle)

• Definitions:
– CPI: Cycles/instruction
– clock period: seconds/cycle
– IPC: instructions/cycle = IPC
• Challenge is to satisfy constraints of:
– Cost
– Power
– Performance
5 Digital Design & Computer Architecture Microarchitecture
RISC­V Processor
• Consider subset of RISC­V instructions:
– R­type ALU instructions:
• add, sub, and, or, slt
– Memory instructions:
• lw, sw
– Branch instructions:
• beq

6 Digital Design & Computer Architecture Microarchitecture


Architectural State Elements
Determines everything about a processor:
– Architectural state:
• 32 registers
• PC
• Memory

7 Digital Design & Computer Architecture Microarchitecture


RISC­V Architectural State Elements

CLK CLK
CLK
WE3 WE
PCNext PC A1 RD1
A RD 5 32
32 32 32 32
A RD
Instruction 32 32
A2 RD2 Data
Memory 5 32

5
A3 Memory
Register
WD3 WD
32 File 32

8 Digital Design & Computer Architecture Microarchitecture


Chapter 7: Microarchitecture

Single­Cycle
RISC­V Processor
Single­Cycle RISC­V Processor
• Datapath
• Control

10 Digital Design & Computer Architecture Microarchitecture


Example Program
• Design datapath
• View example program executing

Example Program:
Address Instruction Type Fields Machine Language
imm11:0 rs1 f3 rd op
0x1000 L7: lw x6, -4(x9) I 111111111100 01001 010 00110 0000011 FFC4A303
imm11:5 rs2 rs1 f3 imm4:0 op
0x1004 sw x6, 8(x9) S 0000000 00110 01001 010 01000 0100011 0064A423
funct7 rs2 rs1 f3 rd op
0x1008 or x4, x5, x6 R 0000000 00110 00101 110 00100 0110011 0062E233
imm12,10:5 rs2 rs1 f3 imm4:1,11 op
0x100C beq x4, x4, L7 B 1111111 00100 00100 000 10101 1100011 FE420AE3

11 Digital Design & Computer Architecture Microarchitecture


Single­Cycle RISC­V Processor
• Datapath: start with lw instruction
• Example: lw x6, -4(x9)
lw rd, imm(rs1)

I-Type
31:20 19:15 14:12 11:7 6:0
imm11:0 rs1 funct3 rd op
12 bits 5 bits 3 bits 5 bits 7 bits

12 Digital Design & Computer Architecture Microarchitecture


Single­Cycle Datapath: lw fetch
STEP 1: Fetch instruction

CLK CLK
CLK
WE3 WE
PCNext PC Instr A1 RD1
A RD
0x1000

0xFFC4A303

A RD
Instruction
A2 RD2 Data
Memory
A3 Memory
Register
WD3 WD
File

Address Instruction Type Fields Machine Language


imm11:0 rs1 f3 rd op
0x1000 L7: lw x6, -4(x9) I 111111111100 01001 010 00110 0000011 FFC4A303

13 Digital Design & Computer Architecture Microarchitecture


Single­Cycle Datapath: lw Reg Read
STEP 2: Read source operand (rs1) from RF

CLK CLK
CLK
19:15 WE3 0x2004 WE
PCNext PC Instr A1 RD1
A RD 9
0x1000

0xFFC4A303

A RD
Instruction
A2 RD2 Data
Memory
A3 Memory
Register
WD3 WD
File

Address Instruction Type Fields Machine Language


imm11:0 rs1 f3 rd op
0x1000 L7: lw x6, -4(x9) I 111111111100 01001 010 00110 0000011 FFC4A303

14 Digital Design & Computer Architecture Microarchitecture


Single­Cycle Datapath: lw Immediate
STEP 3: Extend the immediate

CLK CLK
CLK
19:15 WE3 0x2004 WE
PCNext PC Instr A1 RD1
A RD 9
0x1000

0xFFC4A303

A RD
Instruction
A2 RD2 Data
Memory
A3 Memory
Register
WD3 WD
File

ImmExt
31:20
Extend 0xFFFFFFFC
0xFFC

Address Instruction Type Fields Machine Language


imm11:0 rs1 f3 rd op
0x1000 L7: lw x6, -4(x9) I 111111111100 01001 010 00110 0000011 FFC4A303

15 Digital Design & Computer Architecture Microarchitecture


Single­Cycle Datapath: lw Address
ALUControl2:0 Function

STEP 4: Compute the memory 000


001
add
subtract

address 010 and


011 or
ALUControl2:0
000 101 SLT
CLK CLK
CLK
19:15 WE3 0x2004 SrcA WE
PCNext PC Instr A1 RD1
A RD 9
ALUResult

ALU
0x1000

0xFFC4A303

A RD
Instruction
A2 RD2 SrcB 0x00002000 Data
Memory
A3 Memory
Register
WD3 WD
File

ImmExt
31:20
Extend 0xFFFFFFFC
0xFFC

Address Instruction Type Fields Machine Language


imm11:0 rs1 f3 rd op
0x1000 L7: lw x6, -4(x9) I 111111111100 01001 010 00110 0000011 FFC4A303

16 Digital Design & Computer Architecture Microarchitecture


Single­Cycle Datapath: lw Mem Read
STEP 5: Read data from memory and write it
back to register file
RegWrite ALUControl2:0
. 000
CLK CLK
CLK
19:15 WE3 0x2004 SrcA WE
PCNext PC Instr A1 RD1
A RD 9 ReadData
ALUResult

ALU
0x1000

0xFFC4A303

A RD
Instruction
A2 RD2 SrcB 0x00002000 Data 10
Memory 11:7
A3 Memory
6 Register
WD3 WD
File

ImmExt
31:20
Extend 0xFFFFFFFC
0xFFC

Address Instruction Type Fields Machine Language


imm11:0 rs1 f3 rd op
0x1000 L7: lw x6, -4(x9) I 111111111100 01001 010 00110 0000011 FFC4A303

17 Digital Design & Computer Architecture Microarchitecture


Single­Cycle Datapath: PC Increment
STEP 6: Determine address of next instruction

RegWrite ALUControl2:0
1 000
CLK CLK
CLK
19:15 WE3 0x2004 SrcA WE
PCNext PC Instr A1 RD1
A RD 9 ReadData
ALUResult

ALU
0x1004
0x1000

0xFFC4A303

A RD
Instruction
A2 RD2 SrcB 0x00002000 Data 10
Memory 11:7
A3 Memory
6 Register
WD3 WD
File

ImmExt
31:7
Extend 0xFFFFFFFC
PCPlus4 0xFFC
+
+

Address Instruction Type Fields Machine Language


imm11:0 rs1 f3 rd op
0x1000 L7: lw x6, -4(x9) I 111111111100 01001 010 00110 0000011 FFC4A303
18 Digital Design & Computer Architecture Microarchitecture
Chapter 7: Microarchitecture

Single­Cycle
Datapath: Other
Instructions
Single­Cycle Datapath: sw
• Immediate: now in {instr[31:25], instr[11:7]}
• Add control signals: ImmSrc, MemWrite
RegWrite ImmSrc ALUControl2:0 MemWrite
0 1 000 1
CLK CLK
CLK
19:15 WE3 0x2004 SrcA WE
PCNext PC Instr A1 RD1
A RD 9 ReadData
ALUResult

ALU
0x1008
0x1004

0x0064A423

10 A RD
Instruction 24:20
A2 RD2 SrcB 0x0000200C Data
Memory 11:7 6
A3 Memory
Register WriteData
WD3 WD
File

ImmExt
31:7
Extend 0x00000008
PCPlus4 0x008
+
+

Address Instruction Type Fields Machine Language


imm11:5 rs2 rs1 f3 imm4:0 op
0x1004 sw x6, 8(x9) S 0000000 00110 01001 010 01000 0100011 0064A423

20 Digital Design & Computer Architecture Microarchitecture


Single­Cycle Datapath: Immediate
ImmSrc ImmExt Instruction Type
0 {{20{instr[31]}}, instr[31:20]} I­Type
1 {{20{instr[31]}}, instr[31:25], instr[11:7]} S­Type

I-Type
31:20 19:15 14:12 11:7 6:0
imm11:0 rs1 funct3 rd op
12 bits 5 bits 3 bits 5 bits 7 bits

S-Type
31:25 24:20 19:15 14:12 11:7 6:0
imm11:5 rs2 rs1 funct3 imm4:0 op
7 bits 5 bits 5 bits 3 bits 5 bits 7 bits

21 Digital Design & Computer Architecture Microarchitecture


Single­Cycle Datapath: R­type
• Read from rs1 and rs2 (instead of imm)
• Write ALUResult to rd
RegWrite ImmSrc ALUSrc ALUControl2:0 MemWrite ResultSrc
1 x 0 011 0 0
CLK CLK 14
CLK
19:15 WE3 6 SrcA WE
PCNext PC Instr A1 RD1 0
A RD 5 ReadData
ALUResult

ALU
0x100C 14
0x1008

0x0062E233

10 A RD 1
Instruction 24:20 14
A2 RD2 0 SrcB Data
Memory 11:7 6
A3 1 10 Memory
4 Register WriteData
WD3 WD
14 File

ImmExt
31:7
Extend 0x00000008
PCPlus4
+
+

Result
4

Address Instruction Type Fields Machine Language


funct7 rs2 rs1 f3 rd op
0x1008 or x4, x5, x6 R 0000000 00110 00101 110 00100 0110011 0062E233

22 Digital Design & Computer Architecture Microarchitecture


Single­Cycle Datapath: beq
Calculate target address: PCTarget = PC + imm

PCSrc RegWrite ImmSrc ALUSrc ALUControl2:0 MemWrite ResultSrc


1 0 10 0 001 0 x
CLK CLK
CLK
19:15 WE3 14 SrcA Zero WE
0 PCNext PC Instr A1 RD1 1 0
A RD 4 ReadData
ALUResult

ALU
1 0x1000
0x100C

0xFE420AE3

14 A RD 1
Instruction 24:20
A2 RD2 0 SrcB 0 Data
Memory 11:7 4
A3 1 14 Memory
Register WriteData
WD3 WD
File

PCTarget

+
ImmExt 0x1000
31:7
Extend 0xFFFFFFF4
PCPlus4 0xFFA
+

Result
4
0x1010

Address Instruction Type Fields Machine Language


imm12,10:5 rs2 rs1 f3 imm4:1,11 op
0x100C beq x4, x4, L7 B 1111111 00100 00100 000 10101 1100011 FE420AE3
23 Digital Design & Computer Architecture Microarchitecture
Single­Cycle Datapath: ImmExt
ImmSrc1:0 ImmExt Instruction Type
00 {{20{instr[31]}}, instr[31:20]} I­Type
01 {{20{instr[31]}}, instr[31:25], instr[11:7]} S­Type
10 {{19{instr[31]}}, instr[31], instr[7], B­Type
instr[30:25], instr[11:8], 1’b0}
I-Type
31:20 19:15 14:12 11:7 6:0
imm11:0 rs1 funct3 rd op
12 bits 5 bits 3 bits 5 bits 7 bits

S-Type
31:25 24:20 19:15 14:12 11:7 6:0
imm11:5 rs2 rs1 funct3 imm4:0 op
7 bits 5 bits 5 bits 3 bits 5 bits 7 bits

B-Type
31:25 24:20 19:15 14:12 11:7 6:0
imm12,10:5 rs2 rs1 funct3 imm4:1,11 op
7 bits 5 bits 5 bits 3 bits 5 bits 7 bits

24 Digital Design & Computer Architecture Microarchitecture


Single­Cycle RISC­V Processor
PCSrc
Control
ResultSrc
Unit
MemWrite
6:0
op ALUControl2:0
14:12
funct3 ALUSrc
30
funct75 ImmSrc1:0
Zero RegWrite

CLK CLK
CLK
19:15 WE3 SrcA Zero WE
0 PCNext PC Instr A1 RD1 0
A RD ReadData
ALUResult

ALU
1 A RD 1
Instruction 24:20
A2 RD2 0 SrcB Data
Memory 11:7
A3 1 Memory
Register WriteData
WD3 WD
File

PCTarget

+
ImmExt
31:7
Extend
PCPlus4
+

Result
4

25 Digital Design & Computer Architecture Microarchitecture


Chapter 7: Microarchitecture

Single­Cycle
Control
Single­Cycle Control
High­Level View Low­Level View
Zero PCSrc
Branch
PCSrc
Control
ResultSrc ResultSrc
Unit
MemWrite Main MemWrite
Instr 6:0
op6:0 Decoder ALUSrc
op ALUControl2:0 5
14:12 ImmSrc1:0
funct3 ALUSrc
30 RegWrite
funct75 ImmSrc1:0
Zero Zero RegWrite ALUOp1:0

ALU
funct32:0 ALUControl2:0
Decoder
funct75

27 Digital Design & Computer Architecture Microarchitecture


Single­Cycle Control: Main Decoder
op Instr. RegWrite ImmSrc ALUSrc MemWrite ResultSrc Branch ALUOp

3 lw 1 00 1 0 1 0 00

35 sw 0 01 1 1 X 0 00

51 R­type 1 XX 0 0 0 0 10

99 beq 0 10 0 0 X 1 01

Zero PCSrc
Branch

ResultSrc
Main MemWrite
op6:0 Decoder ALUSrc
5
ImmSrc1:0
RegWrite

ALUOp1:0

ALU
funct32:0 ALUControl2:0
Decoder
funct75

28 Digital Design & Computer Architecture Microarchitecture


Review: ALU
ALUControl2:0 Function A B
N N
000 add
001 subtract
3 ALUControl
010 and ALU
011 or N
Result
101 SLT

29 Digital Design & Computer Architecture Microarchitecture


Review: ALU
A B
ALUControl2:0 Function N N

000 add
001 subtract
N
010 and

ALUControl0
011 or N

Cout
101 SLT +
[N-1] Sum

ZeroExt N N N N N
101 011 010 001 000
3 ALUControl
N
Result

30 Digital Design & Computer Architecture Microarchitecture


Single­Cycle Control: ALU Decoder
Zero PCSrc
Branch

ResultSrc
Main MemWrite
op6:0 Decoder ALUSrc
5
ImmSrc1:0
RegWrite

ALUOp1:0

ALU
funct32:0 ALUControl2:0
Decoder
funct75

31 Digital Design & Computer Architecture Microarchitecture


Single­Cycle Control: ALU Decoder
ALUOp funct3 op5 , funct75 Instruction ALUControl2:0
00 x x lw, sw 000 (add)
01 x x beq 001 (subtract)
10 000 00, 01, 10 add 000 (add)
000 11 sub 001 (subtract)
010 x slt 101 (set less than)
110 x or 011 (or)
111 x Slt 010 (and)

ALUOp1:0

op5
ALU
funct32:0 ALUControl2:0
Decoder
funct75

32 Digital Design & Computer Architecture Microarchitecture


Example: and
op Instruct RegWrite ImmSrc ALUSrc MemWrite ResultSrc Branch ALUOp

51 R­type 1 XX 0 0 0 0 10

PCSrc
Control
ResultSrc
Unit
MemWrite
6:0
op ALUControl2:0
14:12
funct3 ALUSrc
30
funct75 ImmSrc1:0
Zero RegWrite

CLK CLK
0 CLK 1 xx 0 010 0 0
19:15 WE3 SrcA Zero WE
0 PCNext PC Instr A1 RD1 0
A RD ReadData
ALUResult

ALU
1 A RD 1
Instruction 24:20
A2 RD2 0 SrcB Data
Memory 11:7
A3 1 Memory
Register WriteData
WD3 WD
File

PCTarget

+
ImmExt
31:7
Extend
PCPlus4
+

Result
4

and x5, x6, x7


33 Digital Design & Computer Architecture Microarchitecture
Chapter 7: Microarchitecture

Extending the
Single­Cycle
Processor
Extended Functionality: I­Type ALU
Enhance the single­cycle processor to handle I­Type
ALU instructions: addi, andi, ori, and slti
• Similar to R­type instructions
• But second source comes from immediate
• Change ALUSrc to select the immediate
• And ImmSrc to pick the correct immediate

35 Digital Design & Computer Architecture Microarchitecture


Extended Functionality: I­Type ALU
op Instruct. RegWrite ImmSrc ALUSrc MemWrite ResultSrc Branch ALUOp

3 lw 1 00 1 0 1 0 00

35 sw 0 01 1 1 X 0 00

51 R­type 1 XX 0 0 0 0 10

99 beq 0 10 0 0 X 1 01

19 I­type 1 00 1 0 0 0 10

36 Digital Design & Computer Architecture Microarchitecture


Extended Functionality: addi
op Instruct. RegWrite ImmSrc ALUSrc MemWrite ResultSrc Branch ALUOp

19 I­type 1 00 1 0 0 0 10

PCSrc
Control
ResultSrc
Unit
MemWrite
6:0
op ALUControl2:0
14:12
funct3 ALUSrc
30
funct75 ImmSrc1:0
Zero RegWrite

CLK CLK
0 CLK 1 00 1 000 0 0
19:15 WE3 SrcA Zero WE
0 PCNext PC Instr A1 RD1 0
A RD ReadData
ALUResult

ALU
1 A RD 1
Instruction 24:20
A2 RD2 0 SrcB Data
Memory 11:7
A3 1 Memory
Register WriteData
WD3 WD
File

PCTarget

+
ImmExt
31:7
Extend
PCPlus4
+

Result
4

addi x5, x6, -33


37 Digital Design & Computer Architecture Microarchitecture
Extended Functionality: jal
Enhance the single­cycle processor to handle jal
• Similar to beq
• But jump is always taken
– PCSrc should be 1
• Immediate format is different
– Need a new ImmSrc of 11
• And jal must compute PC+4 and store in rd
– Take PC+4 from adder through ResultMux

38 Digital Design & Computer Architecture Microarchitecture


Extended Functionality: jal
Zero
Branch PCSrc
Jump

ResultSrc1:0
Main MemWrite
Decoder
op6:0 ALUSrc
5
ImmSrc1:0
RegWrite

ALUOp1:0

ALU
funct32:0 ALUControl2:0
Decoder
funct75

39 Digital Design & Computer Architecture Microarchitecture


Extended Functionality: ImmExt
ImmSrc1:0 ImmExt Instruction Type
00 {{20{instr[31]}}, instr[31:20]} I­Type
01 {{20{instr[31]}}, instr[31:25], instr[11:7]} S­Type
10 {{19{instr[31]}}, instr[31], instr[7], instr[30:25], B­Type
instr[11:8], 1’b0}
11 {{12{instr[31]}}, instr[19:12], instr[20], J­Type
instr[30:21], 1’b0}

I-Type B-Type
31:20 19:15 14:12 11:7 6:0 31:25 24:20 19:15 14:12 11:7 6:0
imm11:0 rs1 funct3 rd op imm12,10:5 rs2 rs1 funct3 imm4:1,11 op
12 bits 5 bits 3 bits 5 bits 7 bits 7 bits 5 bits 5 bits 3 bits 5 bits 7 bits

S-Type J-Type
31:25 24:20 19:15 14:12 11:7 6:0 31:12 11:7 6:0

imm11:5 rs2 rs1 funct3 imm4:0 op imm20,10:1,11,19:12 rd op


7 bits 5 bits 5 bits 3 bits 5 bits 7 bits 20 bits 5 bits 7 bits

40 Digital Design & Computer Architecture Microarchitecture


Extended Functionality: jal
op Instruct. RegWrite ImmSrc ALUSrc MemWrite ResultSrc Branch ALUOp Jump
3 lw 1 00 1 0 10 0 00 0
35 sw 0 01 1 1 XX 0 00 0
51 R­type 1 XX 0 0 01 0 10 0
99 beq 0 10 0 0 XX 1 01 0
19 I­type 1 00 1 0 01 0 10 0
111 jal 1 11 X 0 10 0 XX 1

41 Digital Design & Computer Architecture Microarchitecture


Extended Functionality: jal
op Instruct. RegWrite ImmSrc ALUSrc MemWrite ResultSrc Branch ALUOp Jump
111 jal 1 11 X 0 10 0 XX 1

PCSrc
Control
ResultSrc1:0
Unit
MemWrite
6:0
op ALUControl2:0
14:12
funct3 ALUSrc
30
funct75 ImmSrc1:0
Zero RegWrite

CLK CLK
CLK
19:15 WE3 SrcA Zero WE
0 PCNext PC Instr A1 RD1
A RD ReadData 00
ALUResult

ALU
1 A RD 01
Instruction 24:20 10
A2 RD2 0 SrcB Data
Memory 11:7
A3 1 Memory
Register WriteData
WD3 WD
File

PCTarget

+
ImmExt
31:7
Extend
PCPlus4
+

Result
4

42 Digital Design & Computer Architecture Microarchitecture


Chapter 7: Microarchitecture

Single­Cycle
Performance
Processor Performance
Program Execution Time
= (#instructions)(cycles/instruction)(seconds/cycle)
= # instructions x CPI x TC

44 Digital Design & Computer Architecture Microarchitecture


Single­Cycle Processor Performance
PCSrc
Control
ResultSrc1:0
Unit
MemWrite
6:0
op ALUControl2:0
14:12
funct3 ALUSrc
30
funct75 ImmSrc1:0
Zero RegWrite

CLK CLK
CLK
19:15 WE3 SrcA Zero WE
0 PCNext PC Instr A1 RD1
A RD ReadData 00
ALUResult

ALU
1 A RD 01
Instruction 24:20 10
A2 RD2 0 SrcB Data
Memory 11:7
A3 1 Memory
Register WriteData
WD3 WD
File

PCTarget

+
ImmExt
31:7
Extend
PCPlus4
+

Result
4

TC limited by critical path (lw)


45 Digital Design & Computer Architecture Microarchitecture
Single­Cycle Processor Performance
• Single­cycle critical path:
Tc_single = t pcq_PC + t mem + max[t RFread, t dec + t ext + t mux] + t ALU + t mem + t mux + t RFsetup

• Typically, limiting paths are:


– memory, ALU, register file
– So, Tc_single = tpcq_PC + tmem + tRFread + tALU + tmem + tmux + tRFsetup
= tpcq_PC + 2tmem+ tRFread + tALU + tmux + tRFsetup

46 Digital Design & Computer Architecture Microarchitecture


Single­Cycle Performance Example
Element Parameter Delay (ps)
Register clock­to­Q tpcq_PC 40
Register setup tsetup 50
Multiplexer tmux 30
AND­OR gate tAND-OR 20
ALU tALU 120
Decoder (Control Unit) tdec 25
Extend unit text 35
Memory read tmem 200
Register file read tRFread 100
Register file setup tRFsetup 60
Tc_single = tpcq_PC + 2tmem + tRFread + tALU + tmux + tRFsetup
= (40 + 2*200 + 100 + 120 + 30 + 60) ps = 750 ps
47 Digital Design & Computer Architecture Microarchitecture
Single­Cycle Performance Example
Program with 100 billion instructions:

Execution Time = # instructions x CPI x TC


= (100 × 109)(1)(750 × 10­12 s)
= 75 seconds

48 Digital Design & Computer Architecture Microarchitecture

You might also like